How We Score Apps: The BAR Score Rubric
Last updated April 21, 2026 · Authored by Quincy Halverson, MS
This page is the working rubric every Best App Rankings leaderboard, single-app review, and head-to-head comparison is built against. We publish the rubric in full because a 100-point score is only as defensible as the procedure that produced it. If you want to know why we ranked one app ahead of another on a leaderboard, this document should answer the question.
Every app on this site is scored against five weighted criteria. The weights are fixed across all eight categories — calorie, nutrition, fitness, sleep, AI, wellness, productivity, finance — so scores remain comparable across the leaderboards. The weights are reviewed annually by Quincy; the next scheduled review is September 2026.
The 100-Point Rubric
| Criterion | Weight | What we measure |
|---|---|---|
| Accuracy | 30% | How well the app does the thing it claims to do, measured against an external benchmark where one is available (USDA reference values for calorie trackers, sleep-lab polysomnography for sleep apps, validated assessment instruments for wellness apps). |
| Features | 25% | Depth of capability vs the category baseline. Database breadth, integration count, niche features, premium feature parity. |
| UX | 20% | Speed of common workflows, friction-of-correction, accessibility, and absence of dark patterns. |
| Price | 15% | Annual cost in USD at the most-common upgrade tier, normalized against feature parity. We compute "dollars per usable feature" rather than scoring on headline cost. |
| Support | 10% | Customer support responsiveness, documentation depth, community ecosystem, vendor stability. |
The composite is the weighted sum, rounded to one decimal. Each criterion is scored 0–100 by Tamsin or the relevant category specialist, with Quincy verifying inter-rater agreement on a random 30% sample. We do not curve-grade across rankings.
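The composite described above can be sketched in a few lines. The weights come from the rubric table; the sub-scores in the usage example are hypothetical, not scores of any real app.

```python
# Weights from the BAR rubric table (fixed across all categories).
WEIGHTS = {
    "accuracy": 0.30,
    "features": 0.25,
    "ux": 0.20,
    "price": 0.15,
    "support": 0.10,
}

def composite(sub_scores: dict) -> float:
    """Weighted sum of 0-100 sub-scores, rounded to one decimal."""
    return round(sum(WEIGHTS[c] * sub_scores[c] for c in WEIGHTS), 1)

# Hypothetical example app:
score = composite({"accuracy": 80, "features": 70, "ux": 90,
                   "price": 60, "support": 70})
print(score)  # → 75.5
```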
How We Score Accuracy
Accuracy is the highest-weighted criterion because every other claim depends on it. An app with the cleanest UX in its category is not useful if its core data is wrong. Where an external benchmark exists, we anchor accuracy to it:
- Calorie trackers: Mean absolute percentage error (MAPE) against USDA-weighed reference meals. The Dietary Assessment Initiative's March 2026 six-app validation study (240 weighed meals) is the primary source.
- Sleep apps: Concordance with polysomnography on sleep-stage classification, where third-party sleep-lab validation studies exist.
- Fitness apps: Concordance with chest-strap heart rate monitors and validated activity trackers on the relevant metrics.
- AI apps: Task-specific benchmarks where they exist; reproducibility across multiple test runs.
- Wellness, productivity, finance: Where no external benchmark exists, we score on data-quality dimensions: source verification, freshness, error rate on a sampled audit.
The accuracy sub-score is anchored to the benchmark performance. For calorie trackers: 100 − (MAPE × 4), capped at 100, floored at 0. ±5% MAPE earns 80 points; ±15% earns 40; ±25% or worse earns zero. The same proportional anchoring is used in the other categories with category-appropriate scaling.
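The calorie-tracker anchor above is mechanical enough to write down directly. This sketch computes MAPE against reference values and applies the stated 100 − (MAPE × 4) formula with its cap and floor; the helper names and sample numbers are ours.

```python
def mape(logged, reference):
    """Mean absolute percentage error of logged kcal vs weighed reference kcal."""
    errors = [abs(l - r) / r * 100 for l, r in zip(logged, reference)]
    return sum(errors) / len(errors)

def accuracy_sub_score(mape_pct):
    """100 - (MAPE x 4), capped at 100, floored at 0."""
    return max(0.0, min(100.0, 100 - mape_pct * 4))

# Anchor points stated in the rubric:
assert accuracy_sub_score(5) == 80   # ±5% MAPE earns 80 points
assert accuracy_sub_score(15) == 40  # ±15% earns 40
assert accuracy_sub_score(25) == 0   # ±25% or worse earns zero
```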
How We Score Features
Features is scored on depth-of-capability against the category baseline. We define a category-specific feature inventory (database size for calorie, nutrient panel breadth for nutrition, integration count for fitness, sleep-stage detection granularity for sleep, etc.) and score apps on coverage, depth, and parity-with-best-in-class.
Feature parity scoring penalizes apps that lock high-frequency features behind premium tiers when category competitors include them on the free tier. It rewards apps that include niche but high-value features (clinician-facing reports, GLP-1 protein floors, heart-rate-zone training, etc.) when competitors do not.
How We Score UX
UX is scored on speed of the four most-common category workflows, friction-of-correction (taps required to fix a mis-logged item), accessibility (VoiceOver/TalkBack support, font scaling, WCAG 2.2 AA color contrast), and absence of dark patterns. Apps that interrupt the core workflow with upgrade prompts more than once per session lose points. Apps that hide cancel buttons on subscription paywalls lose points.
How We Score Price
We compute the annual cost in USD at the most-common upgrade tier and divide by the count of category-relevant features actually delivered. The resulting "dollars per usable feature" is the basis for the price sub-score.
We deliberately do not score "free" apps as 100 on price. A free app with an ad-loaded UX and a thin database is not actually free; it is paid for in time and accuracy. The price sub-score reflects value, not headline cost.
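The metric behind the price sub-score is a simple ratio. A minimal sketch, assuming annual pricing and an integer feature count; the rubric specifies the metric but not the curve that maps it onto 0–100, so no scoring curve is shown here.

```python
def dollars_per_feature(annual_usd: float, usable_features: int) -> float:
    """Annual cost at the most-common upgrade tier divided by the
    count of category-relevant features actually delivered."""
    if usable_features == 0:
        return float("inf")  # no usable features: worst possible value
    return annual_usd / usable_features

# A hypothetical $79.99/yr app delivering 16 category-relevant features:
print(round(dollars_per_feature(79.99, 16), 2))  # → 5.0
```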
How We Score Support
Support is scored on response time to a tester-submitted support ticket, depth of the published documentation, community ecosystem (active user forums, recipe/template sharing where relevant), and vendor stability indicators (release cadence, financial-stability signals where public).
Test Cadence
Apps move. Pricing changes; databases improve; AI models get retrained. Our re-test schedule:
- Top-3 apps in any active leaderboard: re-tested quarterly.
- Apps ranked 4 through 10: re-tested every six months.
- Single-app reviews not in a current leaderboard: re-tested every 12 months at minimum.
- Vendor-announced major release (a new AI model rollout, a database overhaul): triggers an out-of-cycle re-test within 30 days.
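The cadence rules above reduce to a small lookup. A sketch, with day counts approximating the stated intervals (quarterly ≈ 90 days, six months ≈ 182, twelve months = 365); the function name is ours, and the out-of-cycle 30-day trigger for major releases is handled separately.

```python
def retest_interval_days(rank, in_leaderboard: bool) -> int:
    """Maximum days between scheduled re-tests, per the cadence above."""
    if in_leaderboard and rank is not None:
        if rank <= 3:
            return 90    # top-3 in an active leaderboard: quarterly
        if rank <= 10:
            return 182   # ranks 4-10: every six months
    return 365           # single-app reviews: every 12 months at minimum

assert retest_interval_days(2, True) == 90
assert retest_interval_days(7, True) == 182
assert retest_interval_days(None, False) == 365
```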
Every page on the site carries a "last updated" date in the byline. If you see a date older than the cadence above, please contact us; we treat lapses as a quality issue.
Quality Control
Every ranked piece on Best App Rankings carries a sign-off from at least two of three named contributors. Tamsin runs the daily-use protocol and drafts the leaderboard. Quincy verifies the rubric application and signs off on the score. Dr. Iwasaki-Trent reviews the medical framing on any leaderboard, review, or comparison that touches clinical territory.
A piece does not ship until all required sign-offs are reflected in the published version. Citations are independently verified before publication; every numerical claim must trace to a primary source, and unsupported claims are removed.
Conflict of Interest Policy
Best App Rankings does not currently maintain affiliate accounts with any of the apps reviewed on the site. We have not been offered, nor have we accepted, any compensation in exchange for placement, ranking, or favorable framing. None of our editorial team members hold equity, advisory roles, or paid relationships with any of the apps on a BAR leaderboard.
If we adopt affiliate links in the future for a subset of apps, we will disclose it in real time on the relevant page and in our footer. We will not silently switch revenue models.
Questions About This Methodology
Questions, corrections, or proposed methodological refinements should go to editor@bestapprankings.com. We treat reasoned methodological criticism as a contribution to the rubric and credit external contributors when their suggestion is adopted.