v1 LIVE · 99.97% uptime

Government PDFs.
Real-time JSON.

Government and statutory PDFs (credit, tax, employment, company registry) turned into structured JSON over a single REST endpoint. Built for the teams that consume this data: banks, credit reporting agencies, fintechs, researchers, and internal tooling teams.

SOC 2 Type II · RBI compliant · ISO 27001
POST /api/public/v1/parse
{
  "applicant": {
    "name": "Rohan Mehta",
    "pan": "ABCDE1234F",
    "dob": "1989-04-12"
  },
  "credit": {
    "score": 782,
    "bureau": "CIBIL",
    "report_date": "2026-05-08",
    "accounts": [
      { "lender": "HDFC", "type": "credit_card", "limit": 500000, "utilization": 0.18 },
      { "lender": "SBI",  "type": "home_loan",   "outstanding": 4280000, "emi": 38200 }
    ],
    "defaults": []
  },
  "employment": { "employer": "Acme Pvt Ltd", "income": 2400000, "tenure_months": 54 }
}
10 KPIs

every JSON graded by an architect model before it touches the database

3× retry

maker model rewrites until the architect approves — or the doc is rejected

1 API

same REST endpoint for every source, every country, every consumer
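
The maker/architect loop summarized in the stats above can be sketched roughly as follows. This is an illustration only, not Vertitos's implementation: `extract` and `approve` are hypothetical stand-ins for the maker and architect models, the feedback string is invented, and "3× retry" is read here as up to three rewrites after the first attempt.

```python
from typing import Callable, Optional

# From the page. Assumption: 3 rewrites on top of the initial attempt.
MAX_MAKER_RETRIES = 3


def run_maker_loop(
    pdf_bytes: bytes,
    extract: Callable[[bytes, Optional[str]], dict],  # hypothetical maker call
    approve: Callable[[dict], tuple[bool, str]],      # hypothetical architect call
) -> Optional[dict]:
    """Return the approved JSON, or None if the document ends up rejected."""
    feedback: Optional[str] = None
    for _attempt in range(1 + MAX_MAKER_RETRIES):
        # The maker sees the architect's previous feedback (None on the first pass).
        candidate = extract(pdf_bytes, feedback)
        ok, feedback = approve(candidate)
        if ok:
            return candidate
    # Retries exhausted: the caller should treat this as a rejection
    # and write nothing to the database.
    return None
```

Passing the two models in as callables keeps the loop testable with stubs and independent of any particular model provider.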

One API for every government and statutory PDF.

Stop building scrapers. Vertitos ingests the PDFs issued by statutory authorities and regulators, and returns clean JSON — each source typed, namespaced, and traceable to the original document.

Malaysia coverage

CCRIS (BNM), EPF/KWSP, LHDN, SSM. Statutory and regulator-issued PDFs, parsed to source-specific schemas.

Indonesia coverage

SLIK OJK, BPJS, DJP. Statutory and regulator-issued PDFs, parsed to source-specific schemas.

Audit-grade

Every request hashed and logged. Field-level confidence scores. Replay any call.

Multi-page, multi-table

Handles 200-page reports, nested account tables, scanned filings, the lot.

Strict schemas

Validated outputs. A missing field is reported as missing, never filled with a hallucinated value.

Zero infra

REST + bearer key. No SDK lock-in, no queues to manage.
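
As a rough sketch of what "REST + bearer key" could look like from a client, the snippet below builds a request with the Python standard library. Only the path `/api/public/v1/parse` and the bearer scheme come from this page. The host `api.example.com` is a placeholder, and sending the PDF as a raw `application/pdf` body is an assumption; the real API may expect multipart form data or a JSON envelope.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host, not stated on the page
PARSE_PATH = "/api/public/v1/parse"   # path shown on the page


def build_parse_request(pdf_bytes: bytes, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the PDF."""
    # Assumption: the PDF travels as the raw request body.
    return urllib.request.Request(
        url=BASE_URL + PARSE_PATH,
        data=pdf_bytes,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/pdf",
            "Accept": "application/json",
        },
    )


def parse_pdf(pdf_bytes: bytes, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_parse_request(pdf_bytes, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending makes the headers easy to inspect in a unit test without touching the network.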

LENDING-GRADE QA · v2

Hard gates first. Two models second. Three outcomes. No grey area.

Every PDF goes through deterministic hard gates (identifier regex, totals, date envelopes) before any LLM scores it. A Maker extracts. An independent Verifier re-reads the PDF in parallel. An Architect adjudicates against 10 KPIs with per-source thresholds. Each document lands in exactly one of three buckets.

01
Hard gates

Deterministic regex, checksum, sum-of-parts, date-window checks. Failures short-circuit before any model spend.

02
Maker

Gemini 2.5 Flash extracts. Cheap, fast, structured output.

03
Verifier

Independent model re-reads the same PDF. Critical-field disagreement forces review.

04
Architect

Gemini 2.5 Pro scores 10 KPIs, applies per-source thresholds, routes the document.
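
To make step 01 concrete, here is a minimal sketch of three deterministic gates: an identifier regex, a sum-of-parts check, and a date-window check. The flat `doc` shape and the gate names are hypothetical, chosen only for illustration. The PAN pattern (five letters, four digits, one letter) is the standard Indian PAN format. A checksum gate is left out because the page does not say which identifiers carry one.

```python
import re
from datetime import date

# Indian PAN: 5 uppercase letters, 4 digits, 1 uppercase letter.
PAN_PATTERN = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")


def gate_identifier(pan: str) -> bool:
    return PAN_PATTERN.fullmatch(pan) is not None


def gate_sum_of_parts(line_items: list[int], stated_total: int) -> bool:
    # Integer amounts avoid floating-point comparison issues.
    return sum(line_items) == stated_total


def gate_date_window(value: str, start: date, end: date) -> bool:
    try:
        parsed = date.fromisoformat(value)
    except ValueError:
        return False  # not a valid calendar date
    return start <= parsed <= end


def run_hard_gates(doc: dict) -> list[str]:
    """Return the names of failed gates; an empty list means all green."""
    failures = []
    if not gate_identifier(doc["pan"]):
        failures.append("identifier_regex")
    if not gate_sum_of_parts(doc["line_items"], doc["stated_total"]):
        failures.append("sum_of_parts")
    if not gate_date_window(doc["report_date"], doc["period_start"], doc["period_end"]):
        failures.append("date_window")
    return failures
```

Because none of these checks calls a model, a non-empty result can stop the pipeline before any model spend.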

auto_pass
Auto-pass

All hard gates green, KPIs above source threshold, Verifier agrees on critical fields. Writes straight to the database.

review_queue
Human review

Gates pass but KPIs sit between source minimum and target, or Verifier disagrees on a non-critical field. Operator adjudicates in the review queue.

hard_reject
Hard reject

A hard gate fails, no strong identifier is captured, or a critical KPI scores below the floor. JSON withheld. Audit row written.
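
The three buckets above can be read as a small decision function. The sketch below is one interpretation, not the production router. The `Assessment` fields and the `overall_target` parameter are hypothetical names. Two points the page leaves open are resolved by assumption: any Verifier disagreement (critical or not) goes to review, and an overall score below the source minimum is treated as a hard reject.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(str, Enum):
    AUTO_PASS = "auto_pass"
    REVIEW_QUEUE = "review_queue"
    HARD_REJECT = "hard_reject"


@dataclass(frozen=True)
class Assessment:
    gates_passed: bool           # all deterministic hard gates green
    has_strong_identifier: bool  # at least one strong ID captured
    critical_kpis_ok: bool       # every critical KPI at or above its floor
    overall_score: float         # 0.0 to 1.0
    verifier_disagrees: bool     # disagreement on any field


def route(a: Assessment, overall_min: float, overall_target: float) -> Outcome:
    # Hard-reject conditions listed on the page.
    if not (a.gates_passed and a.has_strong_identifier and a.critical_kpis_ok):
        return Outcome.HARD_REJECT
    # Assumption: the page does not say what happens below the source minimum.
    if a.overall_score < overall_min:
        return Outcome.HARD_REJECT
    # Between minimum and target, or any Verifier disagreement: human review.
    if a.verifier_disagrees or a.overall_score < overall_target:
        return Outcome.REVIEW_QUEUE
    return Outcome.AUTO_PASS
```

Checking the reject conditions first means every assessment falls into exactly one bucket.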

The 10 KPIs the Architect scores

Critical KPIs are non-negotiable: any failure routes the document to hard reject, regardless of overall score.

schema_valid · critical

Parses as JSON. Required top-level keys (identifiers, subject_kind, display_name) all present.

field_completeness

Percentage of visible PDF fields actually captured in the output.

identifier_validity · critical

IDs match the regex for their type — NRIC, NIK, NPWP, SSM, PAN, passport.

type_fidelity

Numbers as numbers, dates as ISO YYYY-MM-DD, booleans as booleans. No stringified anything.

key_hygiene

snake_case keys, no duplicates, consistent nesting depth across records.

source_alignment

Fields match what the source type should contain (CCRIS facilities, EPF contribution history, SSM directors, etc.).

no_hallucination · critical

Every value in the JSON is traceable back to text in the PDF.

unit_normalization

Amounts carry currency codes; all units consistent within the document.

internal_consistency · critical

Totals equal the sum of line items. Dates fall inside the document's reporting period.

identifier_presence · critical

At least one strong identifier captured so the record can be matched to a subject.
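
Some of the KPIs above can also be checked mechanically on the client side. As an illustration, the sketch below covers part of key_hygiene (snake_case keys) and part of type_fidelity (ISO dates). The function names are hypothetical, and this is not how the Architect model scores documents; it is only a deterministic approximation of two of the rules.

```python
import re
from datetime import date

SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")
ISO_DATE_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def non_snake_case_keys(value, path=""):
    """Yield paths of keys that are not snake_case, recursing into dicts and lists."""
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            if not SNAKE_CASE.fullmatch(key):
                yield child_path
            yield from non_snake_case_keys(child, child_path)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from non_snake_case_keys(child, f"{path}[{index}]")


def is_iso_date(text: str) -> bool:
    """True only for a real calendar date written as YYYY-MM-DD."""
    # The shape check comes first because date.fromisoformat accepts
    # other ISO 8601 spellings on newer Python versions.
    if not ISO_DATE_SHAPE.fullmatch(text):
        return False
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True
```

Running `non_snake_case_keys` over a response returns the offending key paths, which is usually more useful in a log than a single pass/fail flag.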

Thresholds are tuned per source type

Mistakes on CCRIS move loan decisions. Mistakes on SSM rarely do. Thresholds reflect that asymmetry and are recalibrated weekly against a human-labeled gold set.

Source | Issuer | Overall min | Critical min
CCRIS | BNM (MY) | 95% | 99.5%
SLIK | OJK (ID) | 95% | 99.5%
EPF / KWSP | KWSP (MY) | 92% | 97%
LHDN | LHDN (MY) | 92% | 97%
DJP | DJP (ID) | 92% | 97%
SSM | SSM (MY) | 90% | 97%
BPJS | BPJS (ID) | 90% | 97%
Max maker retries: 3
Verifier model: independent of maker
Audit: every maker, verifier, architect output stored, immutable
Hard reject: JSON withheld, never written
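
For readers who want to mirror the per-source thresholds in their own tooling, the table above maps naturally onto a lookup. The percentages are copied from the table. The dictionary keys, the `Threshold` type, and `meets_minimums` are hypothetical names, and since thresholds are recalibrated weekly, hard-coded values like these would go stale.

```python
from typing import NamedTuple


class Threshold(NamedTuple):
    overall_min: float
    critical_min: float


# Values from the threshold table on this page; keys are illustrative.
THRESHOLDS = {
    "CCRIS": Threshold(0.95, 0.995),
    "SLIK": Threshold(0.95, 0.995),
    "EPF": Threshold(0.92, 0.97),
    "LHDN": Threshold(0.92, 0.97),
    "DJP": Threshold(0.92, 0.97),
    "SSM": Threshold(0.90, 0.97),
    "BPJS": Threshold(0.90, 0.97),
}


def meets_minimums(source: str, overall: float, critical: float) -> bool:
    """Check both scores against the source's minimums.

    Raises KeyError for an unknown source rather than guessing a default.
    """
    threshold = THRESHOLDS[source]
    return overall >= threshold.overall_min and critical >= threshold.critical_min
```

The same pair of scores can pass for one source and fail for another, which is the asymmetry the thresholds are meant to encode: 96% overall and 99% critical clears SSM but not CCRIS.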

One database. One API. Yours to use.

Free tier includes 100 documents/month. No credit card. Production-grade from request one.

Get your API key