k1nq/TREK

mirror of https://github.com/mauriceboe/TREK.git synced 2026-06-30 18:46:00 +00:00

Author	SHA1	Message	Date
Maurice	c3b3c278b8	test(llm-parse): cover the extraction router, client factory and import jobs The new LLM extraction router shipped with little branch coverage, dropping src/nest below the 80% gate. Add unit tests for routeExtraction (flights/single/union/error paths, deterministic booking-wide fill), the native Ollama format client, the provider factory, the local-router service path with its type-aware text cap, the flat->schema.org mapper's remaining reservation types, and the background import-jobs runner. Also remove the now-unused validate.ts (only its FlatLike type was still referenced; moved to flat-schemas).	2026-06-28 11:53:19 +02:00
Maurice	76447f4a73	fix(extract): require the hotel address and ask for the rental company After dropping the vendor templates, the model skipped the (often unlabeled) Expedia-style hotel address — making address a required schema field forces it to emit the street-address line, restoring the booking's location/place. Also hint the rental company so a car booking gets a real title instead of the generic fallback.	2026-06-28 11:53:19 +02:00
Maurice	55ff5c03dd	refactor(extract): drop vendor templates, let the model drive with deterministic backfill Now that a capable instruct model (Qwen3-8B, thinking off) reads name/address/dates/legs reliably across formats, the per-vendor template short-circuit distorted more than it fixed: brittle on layout variations and overriding the better model output. Remove the template layer; the model extracts the structure and layer 2 ("Schicht 2") backfills the confirmation/total and takes the currency from the document's own symbol (correcting model misreads like ¥→$). Per-type prompts now also ask for address and price/currency.	2026-06-28 11:53:19 +02:00
Maurice	7bac753ff3	refactor(extract): dedupe currency/day helpers, drop redundant casts, support JPY vouchers Code-audit clean-ups: share one normCurrency between the router and the templates, lift the duplicated nearest-day resolver into formatters.resolveDayId, drop two needless as-unknown-as casts at the fillBookingWideFields call sites, restore routeExtraction's doc comment, and give the broker template readable names. Plus recognise ¥/JPY and fall back to a standalone symbol amount, so a Klook-style voucher whose price sits far from any label still yields a cost.	2026-06-28 11:53:19 +02:00
Maurice	c1d61c98f0	fix(extract): backfill booking code/total and harden the reference match Apply the deterministic confirmation-code and total fill to vendor-template results too (not just model output), and require the captured reference to contain a digit so a bare 'Confirmation'/'Reference' label no longer grabs the next prose word.	2026-06-28 11:53:19 +02:00
Maurice	8f1c99a07a	feat(extract): drive local parsing through a layered extraction router The single-shot prompt was unreliable on multi-leg flights and longer documents, and slow on a CPU host. For the local provider, run a small router instead: (1) deterministic vendor templates first, with no model call at all; (2) exactly one grammar-enforced call per document via Ollama's native `format` (flights as a flat array of legs, everything else as one flat reservation, the type picked from keywords or a union schema); (3) booking-wide fields (booking reference, total price, the overnight arrival day) filled deterministically from the text afterwards, and dates coerced to ISO so a natural-language date can't slip through. Recommend qwen2.5 in the AI-parsing settings instead of NuExtract.	2026-06-28 11:53:19 +02:00

6 Commits
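
Commit c1d61c98f0 describes hardening the reference match so that a bare 'Confirmation'/'Reference' label no longer captures the next prose word. A minimal sketch of that idea follows, assuming a regex-based matcher. The function name `findBookingReference`, the label list, and the pattern itself are hypothetical illustrations and are not taken from the TREK source.

```typescript
// Hypothetical sketch: none of these names or patterns come from the TREK code base.
// A label (confirmation / reference / booking), an optional "code|number|no." suffix,
// an optional ":" or "#", then a candidate token of at least four characters.
// The lookahead (?=[A-Z0-9-]*\d) rejects any candidate that contains no digit,
// so a plain word following the label is never captured.
const REFERENCE_PATTERN =
  /\b(?:confirmation|reference|booking)\b(?:\s+(?:code|number|no\.?))?\s*[:#]?\s*((?=[A-Z0-9-]*\d)[A-Z0-9-]{4,})/i;

function findBookingReference(text: string): string | null {
  const match = REFERENCE_PATTERN.exec(text);
  return match ? match[1] : null;
}

// The first label is followed by prose ("details"), so it is skipped;
// the second label is followed by a token that contains digits.
console.log(findBookingReference("Confirmation details follow. Reference: AB12CD"));
```

Putting the digit requirement inside the pattern, rather than checking the capture afterwards, lets the regex engine move on to a later label in the same document when the first one is followed by ordinary prose.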