How we think about value, maker-checker, and what it took to move from demo-grade to lending-grade quality.
Every lender, fintech, and researcher rebuilds the same brittle PDF pipeline. We built it once, behind a single REST endpoint. Here is the value proposition in numbers.
Schema validity, hallucination, identifier presence, numeric consistency: the full list of accuracy KPIs the Architect model checks before any JSON is written.
A single LLM marking its own homework will always overrate itself. Splitting extraction (Maker) from review (Architect) cuts the error rate by an order of magnitude.
85% overall is a demo bar. For a bank issuing millions in loans, we replaced the single threshold with hard gates, three routing outcomes, and per-field confidence.
Checksums, totals, and date envelopes are not opinions. Treat them as binary preconditions, not weighted features in a score.
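To make the distinction concrete, here is a minimal sketch of hard gates feeding a three-way router. Everything named in it is an assumption for illustration, not our production code: the `Extraction` shape, the outcome names (`AUTO_ACCEPT`, `HUMAN_REVIEW`, `REJECT`), and the 0.95 confidence floor are all hypothetical. A checksum gate would slot in as one more boolean function alongside the two shown.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Route(Enum):
    # Hypothetical names for the three routing outcomes.
    AUTO_ACCEPT = "auto_accept"
    HUMAN_REVIEW = "human_review"
    REJECT = "reject"


@dataclass
class Extraction:
    # Illustrative shape of an extracted statement; amounts are integer cents
    # so that equality checks are exact.
    opening_cents: int
    closing_cents: int
    period_start: date
    period_end: date
    transactions: list[tuple[date, int]]  # (posting date, signed amount)
    field_confidence: dict[str, float] = field(default_factory=dict)


def totals_reconcile(x: Extraction) -> bool:
    """Hard gate: opening balance plus all movements must equal closing."""
    return x.opening_cents + sum(amt for _, amt in x.transactions) == x.closing_cents


def dates_in_envelope(x: Extraction) -> bool:
    """Hard gate: every transaction date falls inside the statement period."""
    return all(x.period_start <= d <= x.period_end for d, _ in x.transactions)


def route(x: Extraction, min_confidence: float = 0.95) -> Route:
    """Gates are binary preconditions; confidence is only consulted afterwards."""
    if not (totals_reconcile(x) and dates_in_envelope(x)):
        return Route.REJECT
    # Per-field confidence: the weakest field decides, not an average.
    # An extraction with no confidence data is never auto-accepted.
    weakest = min(x.field_confidence.values(), default=0.0)
    if weakest < min_confidence:
        return Route.HUMAN_REVIEW
    return Route.AUTO_ACCEPT
```

The point of the design is that a failed gate cannot be outvoted: an extraction with 0.99 confidence on every field is still rejected if its totals do not reconcile, which a weighted score would not guarantee.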
A short, honest account of what we had to add to be defensible in a room with bank risk officers, and of what we still have ahead of us.