The new LLM extraction router shipped with little branch coverage, dropping src/nest below the 80% gate. Add unit tests for routeExtraction (flights/single/union/error paths, deterministic booking-wide fill), the native Ollama format client, the provider factory, the local-router service path with its type-aware text cap, the flat->schema.org mapper's remaining reservation types, and the background import-jobs runner. Also remove the now-unused validate.ts (only its FlatLike type was still referenced; moved to flat-schemas).
After dropping the vendor templates, the model skipped the (often unlabeled) Expedia-style hotel address — making address a required schema field forces it to emit the street-address line, restoring the booking's location/place. Also hint the rental company so a car booking gets a real title instead of the generic fallback.
Now that a capable instruct model (Qwen3-8B, thinking off) reads name/address/dates/legs reliably across formats, the per-vendor template short-circuit distorted more than it fixed: brittle on layout variations and overriding the better model output. Remove the template layer; the model extracts the structure and Schicht 2 backfills the confirmation/total and takes the currency from the document's own symbol (correcting model misreads like ¥→$). Per-type prompts now also ask for address and price/currency.
Code-audit clean-ups: share one normCurrency between the router and the templates, lift the duplicated nearest-day resolver into formatters.resolveDayId, drop two needless as-unknown-as casts at the fillBookingWideFields call sites, restore routeExtraction's doc comment, and give the broker template readable names. Plus recognise ¥/JPY and fall back to a standalone symbol amount, so a Klook-style voucher whose price sits far from any label still yields a cost.
Apply the deterministic confirmation-code and total fill to vendor-template results too (not just model output), and require the captured reference to contain a digit so a bare 'Confirmation'/'Reference' label no longer grabs the next prose word.
The single-shot prompt was unreliable on multi-leg flights and longer
documents, and slow on a CPU host. For the local provider, run a small
router instead:
- deterministic vendor templates first, with no model call at all
- exactly one grammar-enforced call per document via Ollama's native
`format` (flights as a flat array of legs, everything else as one flat
reservation, the type picked from keywords or a union schema)
- booking-wide fields (booking reference, total price, the overnight
arrival day) filled deterministically from the text afterwards, and
dates coerced to ISO so a natural-language date can't slip through
Recommend qwen2.5 in the AI-parsing settings instead of NuExtract.