The 10 KPIs we grade every document against
Schema validity, hallucination, identifier presence, numeric consistency. The full list of accuracy KPIs an architect model checks before any JSON is written.
A score is only meaningful if you know what is being scored. Our architect model — the second, stricter pass that re-reads the PDF after the maker has extracted it — grades against ten named KPIs, three of which are flagged critical.
The critical three are schema_valid (the output parses and the required top-level keys are present), no_hallucination (every value is traceable to text in the PDF), and identifier_presence (at least one strong identifier so the record can be matched to a subject). Passing all three is non-negotiable.
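To make the three critical checks concrete, here is a minimal Python sketch. Everything specific in it is an assumption for illustration: the required keys, the identifier field names, and the substring-based traceability test are hypothetical, and this post does not describe how the architect model actually performs these checks.

```python
import json

# Hypothetical values for illustration; the real schema is not given in this post.
REQUIRED_KEYS = {"subject", "fields"}
STRONG_ID_KEYS = {"tax_id", "passport_number"}


def schema_valid(raw_output: str) -> bool:
    """The output parses as a JSON object and has every required top-level key."""
    try:
        doc = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    return isinstance(doc, dict) and REQUIRED_KEYS.issubset(doc)


def leaf_values(node):
    """Yield every non-null scalar nested anywhere inside a parsed JSON document."""
    if isinstance(node, dict):
        for value in node.values():
            yield from leaf_values(value)
    elif isinstance(node, list):
        for item in node:
            yield from leaf_values(item)
    elif node is not None:
        yield node


def no_hallucination(doc: dict, pdf_text: str) -> bool:
    """Naive traceability (assumed): each value appears verbatim in the PDF text."""
    haystack = pdf_text.lower()
    return all(str(value).lower() in haystack for value in leaf_values(doc))


def identifier_presence(doc: dict) -> bool:
    """At least one strong identifier is present and non-empty."""
    subject = doc.get("subject")
    if not isinstance(subject, dict):
        return False
    return any(subject.get(key) for key in STRONG_ID_KEYS)
```

A verbatim substring match is the crudest possible reading of "traceable": it would reject values that were legitimately reformatted (dates, currency amounts), so a real check would need to be more tolerant.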
The remaining seven (field_completeness, identifier_validity, type_fidelity, key_hygiene, source_alignment, unit_normalization, internal_consistency) are graded 0 to 100. The v1 approval threshold was an overall score of at least 85 with every critical KPI at 90 or above.
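The v1 rule can be written down as a short gate. This is a sketch under stated assumptions: `v1_approved` is a hypothetical name, the overall score is taken as an input because this post does not say how per-KPI scores are aggregated, and both thresholds are treated as inclusive minimums.

```python
CRITICAL_KPIS = ("schema_valid", "no_hallucination", "identifier_presence")

OVERALL_MIN = 85   # v1 overall threshold (assumed inclusive)
CRITICAL_MIN = 90  # v1 floor for each critical KPI ("90+")


def v1_approved(overall: float, kpi_scores: dict[str, float]) -> bool:
    """Apply the v1 rule: overall >= 85 and every critical KPI >= 90.

    A critical KPI missing from `kpi_scores` is treated as a failure
    (an assumption; the post does not cover this case).
    """
    if overall < OVERALL_MIN:
        return False
    return all(kpi_scores.get(name, 0) >= CRITICAL_MIN for name in CRITICAL_KPIS)
```

Under this reading, a document with a high overall score is still rejected if a single critical KPI falls below 90, for example `v1_approved(95, {"schema_valid": 100, "no_hallucination": 89, "identifier_presence": 100})` evaluates to `False`.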
That threshold is fine for a demo. It is not fine for a bank underwriting loans on the output, which is why we have since moved to the lending-grade tiered framework documented in a separate post.