Maker / checker: why two models beat one
A single LLM marking its own homework tends to overrate itself. Splitting extraction (Maker) from review (Architect) cuts the error rate by an order of magnitude.
Maker-checker is older than software. Auditors call it segregation of duties; pilots call it cross-check; banks call it dual control. The principle is identical: the person doing the work is not the person verifying the work.
Applied to LLM extraction, the maker is the fast, cheap model that turns the PDF into JSON. The checker, which we call the architect, is the slower, more expensive model that re-reads the same PDF, compares it with the maker's JSON, and decides whether the two agree.
Why not one model? Because a single LLM grading its own output is the AI equivalent of asking a student to mark their own exam. It will be too generous toward its own mistakes, especially the confident ones. Splitting the roles is the cheapest possible debiasing.
Our pipeline gives the maker three attempts. If the architect rejects an attempt, its feedback is passed back in and the maker rewrites. After attempt three, either the JSON has cleared the bar or the document is held for human review. In production this resolves roughly 92% of documents on attempt one and 6% on attempt two, and routes the remaining 2% to humans.
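The retry loop described above can be sketched in a few lines of Python. This is a minimal illustration, not our production code: the names `Verdict`, `Result`, `run_maker_checker`, and the two toy functions are hypothetical, and the maker and architect are plain callables standing in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    approved: bool
    feedback: str = ""  # reviewer notes handed back to the maker on rejection


@dataclass
class Result:
    status: str             # "accepted" or "human_review"
    attempts: int           # number of maker attempts used
    output: Optional[dict]  # last maker output, accepted or not


# Assumed signatures (hypothetical, for illustration only):
#   maker(document, feedback) -> JSON-like dict
#   architect(document, candidate) -> Verdict
Maker = Callable[[str, Optional[str]], dict]
Architect = Callable[[str, dict], Verdict]


def run_maker_checker(document: str, maker: Maker, architect: Architect,
                      max_attempts: int = 3) -> Result:
    feedback: Optional[str] = None
    candidate: Optional[dict] = None
    for attempt in range(1, max_attempts + 1):
        # The maker sees the architect's last feedback (None on attempt one).
        candidate = maker(document, feedback)
        # The architect re-reads the same document alongside the candidate.
        verdict = architect(document, candidate)
        if verdict.approved:
            return Result("accepted", attempt, candidate)
        feedback = verdict.feedback
    # All attempts rejected: hold the document for human review.
    return Result("human_review", max_attempts, candidate)


def toy_maker(document: str, feedback: Optional[str]) -> dict:
    # Stand-in for the cheap model: forgets the currency until told about it.
    out = {"total": 120.0}
    if feedback and "currency" in feedback:
        out["currency"] = "EUR"
    return out


def toy_architect(document: str, candidate: dict) -> Verdict:
    # Stand-in for the expensive model: rejects output missing a field.
    if "currency" not in candidate:
        return Verdict(False, "missing field: currency")
    return Verdict(True)


result = run_maker_checker("invoice text", toy_maker, toy_architect)
print(result.status, result.attempts)  # accepted 2
```

The maker never judges its own output here; only the architect's verdict can end the loop early. If the reported split holds, and assuming the 2% routed to humans used all three attempts, each model runs about 0.92 × 1 + 0.06 × 2 + 0.02 × 3 = 1.10 times per document on average.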