Quality Assurance · May 14, 2026 · 5 min read

Hard gates vs. soft scores: where deterministic code beats LLM judgment

Checksums, totals, and date envelopes are not opinions. Treat them as binary preconditions, not weighted features in a score.

There is a category of correctness check that should never be delegated to a language model: anything with a closed-form, deterministic answer. An NRIC either passes its checksum or it does not. A column of line items either sums to the stated total or it does not. A transaction date either falls inside the report's period or it does not.
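As a rough illustration, each of the three checks above fits in a few lines of TypeScript. This is a minimal sketch, not our production code: the function names are hypothetical, amounts are assumed to be integer cents, dates are assumed to be ISO `YYYY-MM-DD` strings, and the NRIC routine follows the widely circulated (not officially published) checksum algorithm for the S, T, F and G prefixes only.

```typescript
// Hypothetical helpers for illustration; names and signatures are assumptions.

// Weights and check-letter tables from the commonly circulated NRIC algorithm.
const NRIC_WEIGHTS = [2, 7, 6, 5, 4, 3, 2];
const ST_LETTERS = "JZIHGFEDCBA";
const FG_LETTERS = "XWUTRQPNMLK";

function nricChecksumOk(nric: string): boolean {
  const id = nric.trim().toUpperCase();
  // Shape check first: prefix letter, seven digits, check letter.
  if (!/^[STFG]\d{7}[A-Z]$/.test(id)) return false;
  const prefix = id.charAt(0);
  let sum = NRIC_WEIGHTS.reduce(
    (acc, weight, i) => acc + weight * Number(id.charAt(i + 1)),
    0,
  );
  // T and G prefixes carry an offset of 4 in this scheme.
  if (prefix === "T" || prefix === "G") sum += 4;
  const table = prefix === "S" || prefix === "T" ? ST_LETTERS : FG_LETTERS;
  return table.charAt(sum % 11) === id.charAt(8);
}

// Amounts in integer cents, so the comparison is exact (no float rounding).
function totalsMatch(lineItemsCents: number[], statedTotalCents: number): boolean {
  const sum = lineItemsCents.reduce((acc, cents) => acc + cents, 0);
  return sum === statedTotalCents;
}

// ISO YYYY-MM-DD strings sort lexicographically in date order.
// Both period bounds are treated as inclusive (an assumption).
function dateInPeriod(date: string, periodStart: string, periodEnd: string): boolean {
  const iso = /^\d{4}-\d{2}-\d{2}$/;
  if (![date, periodStart, periodEnd].every((d) => iso.test(d))) return false;
  return periodStart <= date && date <= periodEnd;
}
```

None of these functions can return "mostly": each one yields `true` or `false`, which is the whole point.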

Treating these as soft scores in a weighted average is mathematically incoherent: a binary fact has no partial credit to give, and averaging lets high scores elsewhere cancel an outright failure. Worse, it gives the LLM the option to hand-wave past hard errors with confident prose.
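The arithmetic is easy to reproduce. The sketch below uses made-up check names and weights (assumptions for illustration only) to show how a failed binary check disappears inside a weighted average.

```typescript
// Illustrative only: check names, scores, and weights are invented.
interface ScoredCheck {
  name: string;
  score: number;  // 0 to 100
  weight: number; // relative weight
}

function weightedOverall(checks: ScoredCheck[]): number {
  const totalWeight = checks.reduce((acc, c) => acc + c.weight, 0);
  const weightedSum = checks.reduce((acc, c) => acc + c.score * c.weight, 0);
  return weightedSum / totalWeight;
}

const scoredChecks: ScoredCheck[] = [
  { name: "field_extraction", score: 100, weight: 50 },
  { name: "layout_consistency", score: 100, weight: 42 },
  { name: "id_checksum", score: 0, weight: 8 }, // binary check that failed outright
];

// (100*50 + 100*42 + 0*8) / 100 = 92
const overall = weightedOverall(scoredChecks);

// The same data viewed as a gate: one failure is a failure.
const checksumPassed = scoredChecks.every(
  (c) => c.name !== "id_checksum" || c.score === 100,
);
```

Here `overall` is 92 while `checksumPassed` is `false`: the document carries an invalid identifier, yet the headline number looks healthy.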

Our hard gates are written in plain TypeScript, run before the architect ever sees the document, and return a single boolean per gate. If any gate fails, the document is hard_rejected and the failure reason is surfaced to the operator. The LLM is never asked for an opinion on something a checksum already settled.
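A gate runner in this style might look like the following. It is a simplified sketch under stated assumptions rather than our actual code: the `Doc` shape, the gate names, and `runHardGates` are hypothetical, and only the `hard_rejected` status comes from the description above.

```typescript
// Hypothetical document shape; field names are assumptions for this sketch.
interface Doc {
  lineItemsCents: number[];
  statedTotalCents: number;
  transactionDate: string; // ISO YYYY-MM-DD
  periodStart: string;     // ISO YYYY-MM-DD, inclusive
  periodEnd: string;       // ISO YYYY-MM-DD, inclusive
}

// Each gate returns a single boolean.
interface Gate {
  name: string;
  check: (doc: Doc) => boolean;
}

type GateResult =
  | { status: "passed" }
  | { status: "hard_rejected"; failedGates: string[] };

const defaultGates: Gate[] = [
  {
    name: "line_items_sum_to_total",
    check: (d) =>
      d.lineItemsCents.reduce((acc, cents) => acc + cents, 0) === d.statedTotalCents,
  },
  {
    name: "transaction_date_in_period",
    check: (d) => d.periodStart <= d.transactionDate && d.transactionDate <= d.periodEnd,
  },
];

// Runs every gate so the operator sees all failure reasons, not just the first.
function runHardGates(doc: Doc, gates: Gate[] = defaultGates): GateResult {
  const failedGates = gates.filter((g) => !g.check(doc)).map((g) => g.name);
  return failedGates.length === 0
    ? { status: "passed" }
    : { status: "hard_rejected", failedGates };
}

// Usage: the LLM stage is only reached when every gate has passed.
const sampleResult = runHardGates({
  lineItemsCents: [1250, 750],
  statedTotalCents: 2100, // stated total is off by 100 cents
  transactionDate: "2026-02-10",
  periodStart: "2026-01-01",
  periodEnd: "2026-03-31",
});
if (sampleResult.status === "hard_rejected") {
  console.log("Rejected before LLM review:", sampleResult.failedGates.join(", "));
}
```

Collecting every failed gate, rather than stopping at the first, is a design choice in this sketch: it costs nothing for cheap checks and gives the operator the full list of reasons in one pass.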

This is not a sophistication argument; it is a reliability argument. The most expensive errors in production are the ones a deterministic check would have caught instantly, but that were lost inside an overall score of 92 that looked good enough to ship.