The full methodology, in plain language.
A deterministic 13-stage scoring engine. A 2,103-line versioned prompt with 31 sections. Six pre-render questions and a 40-point self-check before a single word reaches the customer. Honest analysis grounded in NHTSA data, reproducible for life. Here is exactly what goes into one Verdict.
Your three steps
- 1
You tell us about your car
Vehicle, mileage, ownership history, the issue or the whole-vehicle question, the quote you got. About 5–10 minutes — paste a PDF or photos of the inspection if you have one.
- 2
We run the math
Deterministic 13-stage scoring pipeline. NHTSA recall + complaint pull. Regional market value. Cost-to-value math. Per-platform reliability lookup. Output: a score, three category rollups, and a permanent audit trail.
- 3
You get a clear answer
Plain language. Recommended action. The math behind it. The dealer-coverage opportunities your shop didn't mention. Delivered to your email when ready.
Three layers, three different jobs
A single LLM prompt is not reliable enough for a $129 product that has to stand up to a master mechanic. We split the work across three layers, each with one clear job. The language model only touches the last one — and only after the math is finished and the reasoning is fixed.
Deterministic pipeline
The 13-stage scoring engine. Pure math. Same inputs, same score — every time. Versioned, regression-tested, idempotent. The score is decided here, not by the language model.
Structured reasoning
The six pre-render questions and the ten-step generation protocol. The model thinks through the situation, the dominant story, and its own failure modes — before writing a single customer-facing sentence.
Disciplined prose
The language model renders the Verdict against a 2,103-line prompt with 31 sections. Rules at the top (§0–§0.25) cannot bend. A 40-point self-check runs before output ships.
Why three layers. If the model gets creative on the prose, the score stays correct — the deterministic pipeline already decided it. If the model wants to invent a TSB number, the discipline rules block it before output. The structure removes most ways an “AI” product can quietly mislead a customer.
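The division of labor can be sketched in a few lines of Python. Everything here is illustrative — `ScoredVehicle`, `deterministic_score`, and `render_prose` are stand-ins, not the engine's actual API — but the property is the real one: the score is frozen before any prose is written.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScoredVehicle:
    """Output of the deterministic layer. frozen=True: the prose layer
    receives it read-only and cannot rewrite the score."""
    score: int
    audit_trail: tuple  # every adjustment that fired, immutable

def deterministic_score(inputs: dict) -> ScoredVehicle:
    # Stand-in for the 13-stage engine: a pure function of its inputs,
    # so the same payload always produces the same score.
    base = inputs["baseline"] - 2 * inputs["major_findings"]
    return ScoredVehicle(score=base, audit_trail=(("baseline", inputs["baseline"]),))

def render_prose(scored: ScoredVehicle, llm_text: str) -> str:
    # The language model contributes wording only; the number it reports
    # is the number the pipeline already decided.
    return f"{llm_text} (score: {scored.score})"
```

However creative the prose layer gets, mutating `scored.score` raises an exception — the separation is enforced by construction, not by convention.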
What the language model actually reads
Every Verdict is rendered against a single versioned prompt. Not a paragraph in a system message — a structured document with sections, examples, edge-case handlers, and a self-check. Every rule you see below is the answer to a specific mistake the engine made in a test run before the rule was written.
Versioned in git. Every change carries a changelog header explaining what failed in production and how the fix prevents it.
From §0 OUTPUT DISCIPLINE (read first, every time) to §31 mid-generation self-monitoring. Each section governs one specific behavior.
Iterated through real-world scenarios with a former shop owner reviewing the output — years of frontline experience checking that every finding lands where a skilled mechanic would land.
The 13-stage deterministic pipeline
Pure math. Same vehicle inputs go in, same numeric score comes out — every time, repeatable forever. The language model is not invited until Stage 13 is finished. The score it renders is the score the pipeline produced.
- §1–3
Vehicle identity
VIN decoded via NHTSA vPIC. Platform tier, engine code, transmission, in-service date resolved. The platform string (e.g., Toyota 2GR-FKS, Ford Coyote, BMW N63) is what unlocks per-platform calibration downstream.
- §4
Component baselines
Per-component reliability baselines from a 40-platform / 93-engine reference table. Toyota's 2GR-FKS V6 starts higher than a first-gen BMW N63 V8 at the same mileage — the math knows.
- §5
Failure-mode matching
Vehicle is checked against a 270-pattern database of documented platform issues: Honda 1.5T fuel dilution, GM AFM lifter, BMW N63 oil consumption, Subaru head-gasket era, Toyota oil-gel V6. Symptom-text matching distinguishes "manifested" from "in window only" — the difference matters for the ceiling rules.
- §6
Service-history weighting
Documented full service is rewarded (+3 nudge). Documented-with-gaps is rewarded less (+2). Sparse or absent records reduce confidence, never silently change the score.
- §7–9
Mileage cluster + climate context
The Mileage Index (MI) compares actual miles to expected for age. Severe-winter climate at high age caps the upper bound — a 200k-mile Buffalo vehicle and a 200k-mile Phoenix vehicle do not deserve the same ceiling.
- §10
Aggregation
Weighted sum across components produces the provisional score. The critical-component drag rule prevents healthy peripherals from masking a single failing system — a clean cabin and great tires cannot offset a confirmed catastrophic engine pattern.
- §11
Title-status caps
Salvage-rebuilt caps at 78 (65 with active structural findings). Lemon buyback caps at 60 with any Major+ finding. Junk title caps at 35. Clean titles are uncapped.
- §12
Floors and ceilings
Six dignity floors (pristine new through baseline-protection 72) lift well-maintained vehicles to fair scores. Eight ceilings (active failure, multi-system concerns, multi-recall, severe corrosion, flood-suspect, modified-with-major, race/track use) cap upside when real concerns exist.
- §13
Adversarial pass
Six self-tests catch edge cases the main aggregation can miss: single-finding-dominated deductions, RVR misalignment, platform-reputation safeguards, band-edge enforcement. Anything that fails here is corrected, not rationalized.
Why deterministic. If a master mechanic disputes a Verdict tomorrow, we pull the audit trail and walk them through every input, every adjustment, every floor and ceiling that fired. The score is not an LLM opinion — it is the output of a calculation we can show our work on.
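Here is a minimal sketch of the Stage-10 idea — weighted aggregation plus the critical-component drag rule. The weights, the critical set, and the thresholds below are illustrative stand-ins, not the engine's calibrated numbers.

```python
def aggregate(component_scores: dict[str, float],
              weights: dict[str, float],
              critical_floor: float = 45.0) -> float:
    """Weighted sum, then the drag rule: a failing critical system
    caps the overall score so healthy peripherals cannot mask it."""
    total_w = sum(weights.values())
    provisional = sum(component_scores[c] * w for c, w in weights.items()) / total_w
    CRITICAL = {"engine", "transmission"}  # assumed critical set (illustrative)
    for c in CRITICAL & component_scores.keys():
        if component_scores[c] < critical_floor:
            # Overall score cannot exceed the failing component by much.
            provisional = min(provisional, component_scores[c] + 10)
    return round(provisional, 1)
```

With these toy weights, a 30-point engine drags a 62.5 weighted average down to 40 — clean cabin and great tires do not buy the engine back.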
Six questions before the first word
§5 of the prompt is a meta-cognition step. Before drafting, the model thinks through six specific questions about the customer, the audit trail, and — critically — its own potential failure modes. None of this thinking appears in the customer's Verdict; it shapes what does.
- Q1
What is this customer's situation?
Owner of 8 years vs. buyer at a used lot vs. professional doing diligence. Same data, different angles. The submission's context_view tells the engine which lens to use.
- Q2
What does the audit trail actually say?
Not just the score — the calibration that produced it. Which tier floor fired? Which ceiling? Which catastrophic override? The dominant story in the math is what the prose has to honor.
- Q3
What would I want to know if this were my car?
Step into the customer's shoes for thirty seconds. The output answers that question directly — it doesn't dance around it.
- Q4
What patterns cluster in the trail?
A vehicle with documented service + clean accidents tells one story. Multiple deferred items + ownership churn + rust-belt registration tells another. A platform issue that has been mitigated tells a third — and that's the story to lead with.
- Q5
What's the emotional valence?
Relief (Strong/Healthy floor cases). Validation (Sound case with mitigation leading). Concern (Watching). Processing bad news (Major Concerns). The headline tone must match — a relief Verdict that reads cautiously feels like withheld information.
- Q6
Where might my own output go wrong?
The metacognitive check. Am I about to manufacture concerns to seem balanced on a clean vehicle? Am I about to bury a documented mitigation under the platform pattern it addresses? Naming the failure mode in advance is how you avoid it.
Ten steps from input to ship
Each step has acceptance criteria. The model does not advance to the next step until the current one passes.
- 1
Parse the input payload. Read the JSON, the audit trail, and the inspection text. Note every floor, ceiling, and catastrophic override that fired.
- 2
Run the §5 six questions. Mentally answer each. Do not skip Q6 — your own failure modes.
- 3
Identify the dominant story in one sentence. Pristine new vehicle? Platform issue fully mitigated? Salvage rebuilt running well? The single-sentence story shapes everything downstream.
- 4
Draft the headline per §11. Verdict statement leads. Score is supporting. Tone reflects the dominant story.
- 5
Plan the three-category rollup. If a category lands meaningfully below the headline band, that category is what the customer should focus on.
- 6
Plan the why-this-score lists. When present, documented mitigation leads 'What helped.' 'What concerns' runs highest-severity-first; if zero, the literal text 'No specific concerns flagged' — no padding.
- 7
Render each section in order: top banner → headline → rollup → why this score → coverage statement → forward projection → RVR if applicable → sub-score detail → sources → freshness → final guardrails. While rendering, run §31 mid-generation monitoring continuously.
- 8
Self-critique pass per §27 — read the output through three sets of eyes (customer, mechanic, lawyer). Two-words test. One-sentence test. Calibration-honor test.
- 9
40-point self-check per §30. Any failure → regenerate the offending section.
- 10
Output only if all 40 points pass and self-critique is clean. Otherwise iterate.
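The step-gating discipline above amounts to a simple loop: each step must pass its acceptance criteria before the protocol advances, with a bounded number of regeneration attempts. This is an illustrative sketch — the names and retry count are assumptions, not the production protocol.

```python
def run_protocol(steps, max_retries=3):
    """steps: list of (name, fn) where fn(context) -> (output, passed).
    The protocol does not advance until the current step passes."""
    context = {}
    for name, step in steps:
        for _ in range(max_retries):
            output, passed = step(context)
            if passed:
                context[name] = output  # later steps see earlier outputs
                break
        else:
            # Exhausted retries without passing: refuse rather than ship.
            raise RuntimeError(f"step {name!r} failed acceptance criteria")
    return context
```

A step that fails its check is simply re-run; only when every step has passed does anything reach the output stage.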
The seven bands
Every Verdict lands in one of seven bands. The label is what we say in the headline; the underlying score is on a 0–100 scale. Same labels customer-facing and engine-internal — no translation games.
| Score | Band | What it means in plain English |
|---|---|---|
| 93–100 | Strong | Pristine condition. Zero non-trivial findings. The car is doing what it was built to do. |
| 85–92 | Healthy | Solid shape with minor wear items. Maintenance, not concern. |
| 76–84 | Sound | Good condition, ordinary maintenance items present. Drive with confidence; address items on the next service visit. |
| 68–75 | Watching | Real items to monitor. Nothing failing today, but a few things are heading there. Plan a service window. |
| 58–67 | Needs Attention | Active concerns the customer should address soon. The math says there is meaningful work pending. |
| 45–57 | Significant Concerns | Multiple active concerns or one severe one. The repair conversation is real and budget needs thought. |
| < 45 | Major Concerns | The vehicle has substantial active problems. Significant repair, replacement, or strategic decision is required. |
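The table maps directly to a lookup function. This sketch mirrors the published thresholds exactly; only the function name is an invention.

```python
def band(score: int) -> str:
    """Map a 0-100 score to its customer-facing band label."""
    if score >= 93: return "Strong"
    if score >= 85: return "Healthy"
    if score >= 76: return "Sound"
    if score >= 68: return "Watching"
    if score >= 58: return "Needs Attention"
    if score >= 45: return "Significant Concerns"
    return "Major Concerns"
```

The same function serves customer-facing copy and engine internals — which is what "no translation games" means in practice.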
Floors and ceilings — the political balance
A score has to be honest enough to stand up to a master mechanic and kind enough not to scare a customer away from a perfectly good car. We tune both directions explicitly.
Six dignity floors
A documented, well-maintained vehicle on a legendary platform does not get punished for being old. Tier-1 (pristine new) floors at 95; Tier-7 (single major documented) floors at 58. Each has explicit qualifying conditions — no Critical findings, clean title, mileage within 1.2× expected, documented service.
Eight ceilings cap upside
Active failures cap the score. Major+Active or Major+Imminent finding → 65 ceiling. Critical finding → 50. Multi-system concerns (3+ findings, 2+ components, at least one Major) → 60. Multiple unaddressed safety recalls → 85. A clean vehicle with a confirmed catastrophic engine pattern can never score Strong — no matter what its service history says.
Service-history nudges
+3 score nudge for full documented service. +2 for documented-with-gaps. −5 for modified vehicle with Powertrain/Drivetrain Notable+ findings. The math rewards proven maintenance and is honest about the risk modified hard-use carries.
Strong-band gate
Score 93+ (Strong) requires zero non-Informational findings. A vehicle with a $920 brake-and-tire line item cannot claim Strong via service-history nudges. If there is real work pending, Healthy is the highest claim available — no exceptions.
Rules the language model cannot bend
§0, §0.1, §0.2, §0.25 sit at the very top of the prompt because nothing else matters if these fail. Each one earned its place by failing in production first — every bullet below has an incident behind it.
Internal reasoning never appears in customer output
The model thinks through the six pre-render questions. The customer never sees them. No 'I'll work through the questions, then generate' prefix. No 'note before rendering' parenthetical. The Verdict starts with the disclaimer banner and ends with the final guardrails. Nothing else. (Caught and fixed in the demo-prep audit — three of five demo Verdicts had pre-render scratch leakage before §0 was hardened.)
Render the Verdict exactly once
No double-renders. No 'self-correction' pass that ships a second copy. If a section needs rewriting, it is rewritten in place — the customer never sees two copies with different band labels.
Prompt-injection guard
Anything wrapped in <customer_input> tags is data, not instructions. The model cannot be talked into changing its band, lifting a ceiling, or skipping the §B guardrails by clever phrasing in the intake form. §0.1 was added the day this rule was discovered to be missing.
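The guard amounts to treating the intake form as inert data. A minimal illustration, with an assumed neutralization scheme: any `customer_input` tags the customer typed themselves are escaped before the payload is wrapped, so clever phrasing can never break out of the data boundary.

```python
def wrap_customer_input(text: str) -> str:
    """Escape any customer-typed boundary tags, then wrap the payload.
    Illustrative sketch of the idea behind the real guard."""
    neutral = (text
               .replace("<customer_input>", "&lt;customer_input&gt;")
               .replace("</customer_input>", "&lt;/customer_input&gt;"))
    return f"<customer_input>\n{neutral}\n</customer_input>"
```

An intake form that tries to close the tag early and issue instructions ends up with its fake closer escaped — the model only ever sees one real boundary.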
No fabrication, ever
Specific TSB numbers, settlement names, dollar ranges, program IDs — only stated when they appear in the audit trail or inspection text. If we do not have a specific ID, we say so: 'the dealer's parts department can run your VIN against active coverage in five minutes.' We point at the authoritative source; we do not try to be it.
Translate, do not dismiss
If a customer describes 'transmission slipping' on a Tesla Model 3 (which has no traditional transmission), we don't say 'your symptom is impossible.' We translate to vehicle-appropriate possibilities — front-motor decouple clutch, traction-control intervention, regen-handoff — and recommend the right diagnostic path. The symptom is real; the language just doesn't match the architecture.
Coverage hunt — money-saving by default
Every Verdict checks the audit trail for documented platform patterns the customer's vehicle is in the window for. If the dealer has a TSB, warranty extension, or Customer Satisfaction Program that may apply, we surface it: 'Ask your dealer about coverage; they can run your VIN against active programs in five minutes.' Hedged with 'may cover'; concrete next step always provided.
Five edge cases we have already solved
Each of these was a real Verdict the engine got wrong, once. The fix landed in the prompt with a changelog entry and is now permanent. The next vehicle that walks into the same edge case gets the corrected handling automatically.
- §15.7
Modified vehicles with Powertrain findings
Aftermarket tunes, racing intake, cat-back exhaust on a vehicle with confirmed engine pattern: the engine applies a −5 nudge and the prose discloses the modification load explicitly. We do not pretend the build does not matter, and we do not punish a stock weekend warrior for cosmetic mods.
- §15.8
Flood / water-damage suspect
Title is clean but the audit trail picked up flood-zone registration history or interior moisture markers in the inspection text. The flood-suspect ceiling caps the score and the prose discloses the signal: 'documented flood-zone history; recommend a pre-purchase inspection that explicitly checks electrical connectors and floor-pan undercoating.'
- §15.9
Race / track use disclosure
Track-day stickers, racing brake-pad receipts, dyno history: the engine treats this as material context — even when the powertrain is otherwise clean — because hard-use cycles change risk meaningfully. Disclosed in plain language, not hidden in a sub-score.
- §15.10
Coverage hunt — standardized format
Every coverage opportunity the engine surfaces follows the same structure: program name, applicable component, what to ask for, who to ask, expected dealer effort. Customers do not have to translate marketing prose into a service-counter conversation — the script is already written.
- §15.11
Score-anomaly detection and graceful refusal
If the engine produces a score that doesn't match the dominant evidence in the audit trail — e.g. a Strong band on a vehicle with multiple Major findings — the §15.11 anomaly detector blocks generation and triggers a regeneration with the discrepancy logged. We would rather refuse to render than ship a Verdict the math itself disagrees with.
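A toy version of that consistency check, using the band labels from the table above. The thresholds here are illustrative; the real detector reads the full audit trail, not just a severity count.

```python
def score_matches_evidence(band_label: str, finding_severities: list[str]) -> bool:
    """Return False when the band contradicts the dominant evidence,
    in which case generation is blocked and regeneration triggered."""
    majors = sum(1 for s in finding_severities if s in ("Major", "Critical"))
    if band_label == "Strong" and majors >= 1:
        return False  # Strong requires zero non-Informational findings
    if band_label == "Healthy" and majors >= 2:
        return False  # multiple Major findings cannot read as Healthy
    return True
```

A `False` here means refuse-to-render: the discrepancy is logged and the Verdict is regenerated rather than shipped.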
Every Verdict is reproducible — for life
Each Verdict carries a permanent audit_trail_id UUID. We store the full input payload, every adjustment, every floor and ceiling that fired, every nudge that applied, and the engine + prompt version that produced it. Years later, if a customer or mechanic disputes a Verdict, we pull the ID and reproduce the exact reasoning that produced the score. Same inputs, same answer — every time.
Most “AI” products cannot do this. The output of a stateless prompt is not reproducible. Ours is — that is the entire point of the deterministic pipeline sitting underneath the model.
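A sketch of what such a record might contain — an assumed schema, not the production one. Hashing a canonical serialization of the inputs makes any silent drift detectable on replay: if the stored hash and the recomputed hash disagree, the inputs changed.

```python
import hashlib
import json

def audit_record(inputs: dict, adjustments: list, engine_version: str) -> dict:
    """Everything needed to replay a Verdict later (illustrative schema)."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return {
        "input_payload": inputs,
        "adjustments": adjustments,        # every nudge, floor, ceiling that fired
        "engine_version": engine_version,  # engine + prompt version pinned
        "input_hash": hashlib.sha256(canonical.encode()).hexdigest(),
    }
```

Because the serialization is canonical (sorted keys), the same inputs hash identically regardless of how the payload was assembled — which is the property "same inputs, same answer" depends on.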
How it was built and tuned
The engine was built and is maintained by a former independent shop owner — someone who spent years at the service counter quoting real customers and knows what an honest answer sounds like when the conversation gets uncomfortable. The math is calibrated to land where a skilled mechanic would land in the same scenario.
Domestic, import, hybrid, EV — Toyota, Honda, Ford, GM, Subaru, BMW, Ram, Tesla, and others. Per-platform reliability baselines for the engines and transmissions customers actually drive.
The pattern database the engine matches against in Stage 5. Each entry is a documented platform issue with the conditions for 'manifested' vs 'in window only.'
§30 — mandatory checks the model runs against its own draft before output ships. Any failure → regenerate the offending section.
The regression suite is permanent. Every engine change re-runs the deterministic test suite plus a curated set of audit scenarios. A change that would alter an existing Verdict in a way the shop-owner review doesn't agree with cannot ship.
How the prompt got to v3.9.2
Every revision earned its place by failing in production first. Honest changelog.
- v3.8.2 (2026-05-08)
Output discipline hardening after a demo-prep review caught pre-render scratch leakage on several Verdicts. Added §0 OUTPUT DISCIPLINE, §0.1 PROMPT-INJECTION GUARD, §0.2 FACTUAL DISCIPLINE, §11.5 audit-trail-label tie-breaker.
- v3.8.3 (2026-05-08)
§0.25 vehicle-symptom-mismatch rewritten — the prior framing dismissed customer symptoms as impossible (e.g. Tesla 'transmission slipping'). The new rule mandates translation into vehicle-appropriate possibilities plus the right diagnostic path. Never dismiss a customer concern.
- v3.8.5–v3.8.10 (2026-05-08 → 2026-05-09)
Tier-7 single-major-documented floor (58 documented / 60 sparse). !hasCritical gate added to Tier 5 floors. Lemon-buyback ceiling tightened to 60 with any Major+ finding. Auto-detect modifications from inspection text. New ceilings: flood-suspect, sparse-multi-finding, modified-with-major, race-track-use. Title-status alias normalization.
- v3.9.0 (2026-05-09)
Five prose-discipline sections added (§15.7–§15.11): modification disclosure, flood-suspect prose, race/track use, standardized coverage-hunt format, score-anomaly detection and graceful refusal. Final round: every edge case the shop-owner review surfaced was either handled or explicitly refused-to-render.
- v3.9.2 (2026-05-09, current)
§11.7 bottom-line actionable callout. §14.5 what-to-do-next action checklist. §16.5 cost summary table. Final prose-polish wave for production launch.
What we deliberately do not claim
- We are not predicting the future. Cars are mechanical systems with real variance. We tell you what the math says today.
- We are not a substitute for a hands-on diagnostic. A qualified mechanic with the car in front of them sees things we cannot.
- We are not a legal opinion. If your situation involves a lemon-law dispute, dealer fraud, or an insurance total-loss negotiation, talk to a lawyer.
- We are not telling you what to do. We give you the math; you make the call. The Verdict is information you did not have before — what you do with it is yours.
- We are not perfect on band edges. Two reasonable mechanics can disagree on whether a 130k vehicle with a verbal repair quote is “Sound” or “Watching.” We are consistent and defensible — not infallible. The score is information; the recommendation is yours.
Why this is independent
Every other voice in your decision is paid by the outcome. The shop is paid to fix things. The dealer is paid to sell you the next car. The forum guy is anonymous. RepairVerdict is the one party with no financial interest in what you decide. We do not sell repairs. We do not refer mechanics. We do not take affiliate fees.
You pay $79 (Quick Verdict) or $129 (Verdict) once. That is our entire revenue from you. Whatever you decide afterward, we are paid the same.
Ready to get yours?
Pick the tier that matches what you have in hand. Both deliver in about 30 minutes by email.