This page is for engineers, mechanics, and skeptics who want to understand exactly how a RepairVerdict score is produced. No marketing. Full specification.
A RepairVerdict score is the output of a three-layer system: a deterministic 13-stage scoring pipeline (Layer 1), a structured reasoning protocol (Layer 2), and a disciplined prose renderer (Layer 3). The score is decided entirely in Layer 1 — the language model that renders the prose cannot alter it.
The pipeline is version-controlled and regression-tested. Every score carries a permanent audit_trail_id that stores the full input payload, every adjustment, every floor and ceiling that fired, and the engine version that produced it. Any score from any past Verdict can be reproduced identically.
Every score is derived from specific, verifiable sources. The engine does not rely on general-knowledge inference for any safety-relevant finding.
Owner-submitted complaints indexed by make, model, model year, and component. Used in Stage 5 failure-mode matching to determine whether a known pattern has manifested on the customer's vehicle.
Open and resolved safety recalls by VIN and campaign number. Cross-referenced against the vehicle at scoring time to flag unaddressed recalls, estimate coverage, and apply the multi-recall ceiling when warranted.
Preliminary evaluations and engineering analyses that precede formal recall campaigns. Surfaced as leading indicators for vehicles in the active investigation window — distinguished clearly from confirmed recalls.
Regional fair-market value for the specific trim, mileage, and condition profile. Used in Stage 13 to compute the repair-to-value ratio (RVR) and determine whether total repair cost is within an economically defensible range.
Per-platform and per-engine expected-mileage curves built from documented reliability outcomes. Maps actual mileage to a Mileage Index (MI) that adjusts scoring bands fairly across vehicle age and usage patterns.
Documented platform-specific failure patterns with trigger conditions for 'manifested' vs. 'in window only.' Each pattern is linked to the NHTSA complaint or recall data that documents it. Examples: Honda 1.5T fuel dilution, GM AFM lifter, BMW N63 oil consumption, Subaru head-gasket era.
Pure math. Same vehicle inputs produce the same numeric score every time. The language model is not involved until Stage 13 is complete. It renders the score the pipeline produced — it does not produce the score.
VIN decoded via NHTSA vPIC. Platform tier, engine code, transmission type, and in-service date resolved. The platform string (e.g., Toyota 2GR-FKS, Ford Coyote, BMW N63) unlocks per-platform calibration in downstream stages.
Per-component reliability baselines from a 40-platform / 93-engine reference table. Toyota's 2GR-FKS V6 starts higher than a first-generation BMW N63 V8 at identical mileage — the math reflects documented platform outcomes.
Vehicle checked against 730+ documented platform failure patterns. Symptom-text matching distinguishes 'manifested' from 'in window only' — the difference determines whether the pattern is an active finding or a monitoring flag.
Documented full service: +3 nudge. Documented with gaps: +2. Sparse or absent records reduce confidence without silently changing the score — confidence tier is disclosed in the output.
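The service-history rule above can be sketched as a small lookup. The nudge values (+3, +2) come from the text; the function name, the input labels, and the confidence-tier labels ("high", "medium", "low") are hypothetical placeholders, since the page does not name the tiers.

```python
from typing import NamedTuple


class ServiceAdjustment(NamedTuple):
    nudge: int        # points added to the score
    confidence: str   # hypothetical tier label, disclosed alongside the score


def service_history_adjustment(record_quality: str) -> ServiceAdjustment:
    """Map a service-record quality label to a score nudge and a confidence tier.

    Assumption: sparse or absent records leave the score unchanged (nudge 0)
    and only lower the disclosed confidence tier.
    """
    if record_quality == "full":
        return ServiceAdjustment(nudge=3, confidence="high")
    if record_quality == "gaps":
        return ServiceAdjustment(nudge=2, confidence="medium")
    if record_quality in ("sparse", "absent"):
        return ServiceAdjustment(nudge=0, confidence="low")
    raise ValueError(f"unknown record quality: {record_quality!r}")
```

Returning the confidence tier next to the nudge keeps the two effects separate: missing records change what is disclosed, not the number itself.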
The Mileage Index (MI) compares actual mileage to expected for the vehicle's age. Severe-winter climate at high age applies an upper-bound cap — a 200k-mile Buffalo vehicle and a 200k-mile Phoenix vehicle do not warrant the same ceiling.
Weighted sum across components produces the provisional score. The critical-component drag rule prevents healthy peripherals from masking a single failing system — clean interior and new tires cannot offset a confirmed catastrophic engine pattern.
Salvage rebuilt: 78 maximum (or 65 if active structural findings). Lemon buyback: 60 maximum with any Major+ finding. Junk title: 35 maximum. Clean title is uncapped.
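A minimal sketch of the title caps listed above. The cap values are taken from the text; the function names and title labels are hypothetical, and the sketch assumes a lemon buyback with no Major+ finding is left uncapped, which the text does not state explicitly.

```python
from typing import Optional


def title_cap(title: str,
              structural_findings: bool = False,
              major_plus_finding: bool = False) -> Optional[int]:
    """Return the maximum score allowed for a title status, or None if uncapped."""
    if title == "clean":
        return None
    if title == "salvage_rebuilt":
        return 65 if structural_findings else 78
    if title == "lemon_buyback":
        # Assumption: the 60 cap applies only when a Major+ finding is present.
        return 60 if major_plus_finding else None
    if title == "junk":
        return 35
    raise ValueError(f"unknown title status: {title!r}")


def apply_title_cap(score: int, title: str, **flags: bool) -> int:
    """Clamp a provisional score to the cap for its title status."""
    cap = title_cap(title, **flags)
    return score if cap is None else min(score, cap)
```

For example, `apply_title_cap(90, "salvage_rebuilt")` yields 78, while the same score on a clean title passes through unchanged.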
Six dignity floors lift well-maintained vehicles to defensible minimums. Eight ceilings cap upside when real problems exist. Both directions are explicit conditions — no black-box adjustments.
Six self-tests catch edge cases the aggregation can miss, including single-finding domination, RVR misalignment, platform-reputation enforcement, and band-edge validity. Failures are corrected, not rationalized.
The score is a weighted composite across five factor categories, clamped to [0, 100] before floors and ceilings apply. Exact weights are not published to prevent gaming — but the categories and their roles are:
Line items from the shop's inspection, weighted by severity tier (Critical → Major → Minor → Informational). A single Critical finding applies a hard ceiling regardless of other factor scores.
Open recall count, complaint density for the vehicle's make/model/year, and active ODI investigation status. Multiple unaddressed safety recalls cap the score at 85. A confirmed critical recall pattern caps lower.
Mileage relative to age, compared with the expected figure for the specific platform and engine. Vehicles well below expected mileage for their age score higher here; vehicles above expected mileage score proportionally lower, with climate adjustments applied.
Total estimated repair cost divided by regional fair-market vehicle value. RVR > 0.75 suppresses the score regardless of mechanical condition — a $6,000 repair on a $7,500 vehicle is a different decision than on a $22,000 vehicle.
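The repair-to-value arithmetic above can be worked through directly. The 0.75 threshold and the dollar figures come from the text; the function names are hypothetical, and how strongly the score is suppressed once the threshold is crossed is not specified on this page, so the sketch only reports whether the rule fires.

```python
RVR_THRESHOLD = 0.75  # from the text: RVR above this suppresses the score


def repair_to_value_ratio(repair_cost: float, market_value: float) -> float:
    """Total estimated repair cost divided by regional fair-market value."""
    if market_value <= 0:
        raise ValueError("market_value must be positive")
    return repair_cost / market_value


def rvr_suppresses_score(repair_cost: float, market_value: float) -> bool:
    """True when the ratio exceeds the suppression threshold."""
    return repair_to_value_ratio(repair_cost, market_value) > RVR_THRESHOLD


# The two cases from the text: the same $6,000 repair on different vehicles.
print(repair_to_value_ratio(6000, 7500))   # 0.8, above the threshold
print(rvr_suppresses_score(6000, 7500))    # True
print(rvr_suppresses_score(6000, 22000))   # False (ratio is about 0.27)
```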
Warranty coverage, active Customer Satisfaction Programs, and documented recall-repair completion reduce the effective repair burden. Confirmed coverage that addresses a Major finding removes its downward weight from the aggregation.
Raw aggregation can produce scores that are technically correct but editorially indefensible. Floors and ceilings are the calibration layer that makes scores meaningful across the full distribution of real vehicles.
All floors require: no Critical findings, clean or rebuilt title within cap, mileage within 1.2× expected for age, and documented service. Tier 7 requires the Major finding to be documented as repaired.
Ceilings are applied after aggregation and override floor claims. A vehicle cannot claim Strong (93+) with any active non-Informational finding — no service-history nudge can override this gate.
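The ordering described above (floors lift, then ceilings cap and win any conflict) can be sketched as follows. This is an illustration under stated assumptions rather than the engine's code: the function names are hypothetical, scores are assumed to be integers, and the Strong gate is modeled as a ceiling of 92, the highest score below the 93 threshold.

```python
from typing import Iterable, Optional

STRONG_MIN = 93  # lower edge of the Strong band, from the text


def apply_floors_and_ceilings(raw_score: int,
                              floors: Iterable[int],
                              ceilings: Iterable[int]) -> int:
    """Lift the score to the highest qualifying floor, then cap it at the
    lowest ceiling that fired. Ceilings run last, so they override floors."""
    score = max([raw_score, *floors])
    score = min([score, *ceilings])
    return score


def strong_gate_ceiling(has_active_non_informational_finding: bool) -> Optional[int]:
    """Ceiling that keeps a vehicle out of Strong when any active
    non-Informational finding exists; None when the gate does not fire."""
    return STRONG_MIN - 1 if has_active_non_informational_finding else None
```

With a raw score of 70, a floor of 80, and a ceiling of 75, the result is 75: the floor lifts the score, then the ceiling overrides it. A raw 95 with the 85 multi-recall cap mentioned earlier on this page comes out at 85.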
The 0–100 numeric score maps to one of seven bands. Band labels are used identically in the customer-facing Verdict and the internal engine — no translation between internal and external representations.
| Score | Band | Technical meaning |
|---|---|---|
| 93–100 | Strong | Zero non-Informational findings. All ceilings clear. All active baseline conditions within expected range. |
| 85–92 | Healthy | Minor wear items present; no active Major or Critical findings. Maintenance-category work only. |
| 76–84 | Sound | Ordinary maintenance deferred or minor service items. No active failure patterns. Confidence in continued operation. |
| 68–75 | Watching | Items trending toward failure. No imminent or active findings, but at least one component requires a service window within 3–6 months. |
| 58–67 | Needs Attention | Active concern(s) present. At least one Minor+Active or Major finding. Repair planning is appropriate now. |
| 45–57 | Repair Window | Multiple active concerns or one severe finding. Repair cost is material relative to vehicle value. Decision-point territory. |
| < 45 | Replacement Window | Critical or multiple Major+Active findings. RVR likely elevated. Substantial repair, replacement, or exit decision warranted. |
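The band table above translates into a simple threshold lookup. The thresholds and labels are taken from the table; the function name and the descending-list layout are illustrative choices, not the engine's actual representation.

```python
# (lower bound, band label), ordered from highest band to lowest.
BANDS = [
    (93, "Strong"),
    (85, "Healthy"),
    (76, "Sound"),
    (68, "Watching"),
    (58, "Needs Attention"),
    (45, "Repair Window"),
    (0, "Replacement Window"),
]


def band_for(score: int) -> str:
    """Return the band label for a score in [0, 100]."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    raise AssertionError("unreachable: the last band starts at 0")
```

Checking the band edges (92 vs. 93, 44 vs. 45) is a quick way to confirm the lookup matches the table.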
Being explicit about scope limits is part of the methodology. Overpromising what a scoring system can deliver is how trust erodes.
The scoring pipeline and the language model prompt are versioned independently. The current production state is engine v3.8.14 and prompt version 41. Both versions are stored in the audit trail for every Verdict.
A change to the pipeline that would alter an existing Verdict score in a direction the calibration review disagrees with cannot ship. The regression suite re-runs against a curated set of historical audit scenarios on every engine change. The prompt changelog is committed with a header explaining what failed in production and how the fix addresses it — every section earned its place by failing a test first.
The full prompt-section index and a version history of significant revisions are documented on the How it works page.
The failure-mode database the engine reads from, with its 730+ documented patterns, is public. Browse the failure-mode encyclopedia for the entry behind every deduction.
The same pipeline described on this page — applied to your specific car, your specific quote.