perf(extract): cap LLM input at 4000 chars for CPU-only speed

On a GPU-less host, the model's prompt-evaluation time scales with input length and dominates total latency. Booking details sit near the top of a confirmation, so capping the extracted text at 4000 chars (previously 8000) roughly halves extraction time (~50s warm with a capable local 7B model) with no fields lost on real hotel/rental confirmations. The cap is a single constant (`MAX_EXTRACT_CHARS`) and can be raised if a long multi-segment itinerary needs more.
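The cap in the diff is a plain head slice. Here is a hedged sketch of how it could be factored out if the limit were passed in as a parameter instead of hard-coded. `capExtractText` and `DEFAULT_MAX_EXTRACT_CHARS` are hypothetical names and are not part of `LlmParseService`. Unlike the one-liner in the diff, the sketch also avoids leaving half of a surrogate pair at the cut point, because `String.prototype.slice` counts UTF-16 code units rather than characters or model tokens.

```typescript
// Hypothetical default mirroring MAX_EXTRACT_CHARS in the diff.
const DEFAULT_MAX_EXTRACT_CHARS = 4000;

/**
 * Keep only the first `maxChars` UTF-16 code units of the extracted text,
 * where booking details usually sit. Hypothetical helper, not the service's API.
 */
function capExtractText(text: string, maxChars: number = DEFAULT_MAX_EXTRACT_CHARS): string {
  // Reject zero, negative, or fractional caps early.
  if (maxChars < 1 || Math.floor(maxChars) !== maxChars) {
    throw new RangeError(`maxChars must be a positive integer, got ${maxChars}`);
  }
  if (text.length <= maxChars) return text;

  let head = text.slice(0, maxChars);
  // If the cut lands between the two halves of a surrogate pair (e.g. an emoji),
  // drop the dangling high surrogate so the LLM never sees a broken character.
  const last = head.charCodeAt(head.length - 1);
  if (last >= 0xd800 && last <= 0xdbff) head = head.slice(0, -1);
  return head;
}
```

Calling `capExtractText(input.text)` would reproduce the 4000-char behavior. Passing a larger value for a long multi-segment itinerary would leave the default untouched for ordinary confirmations.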
2026-06-27 01:01:47 +00:00 · 2026-06-24 22:44:55 +02:00
parent a5d05cb92e
commit 23d5a5bd9c
1 changed file with 1 addition and 1 deletion
@@ -58,7 +58,7 @@ export class LlmParseService {
        // (rental/insurance docs run 30k+ chars) otherwise overflow the model's
        // context window — truncating the *relevant* head — and balloon CPU
        // inference time. Cap the text so only the useful head reaches the LLM.
-        const MAX_EXTRACT_CHARS = 8000;
+        const MAX_EXTRACT_CHARS = 4000;
        if (input.text.length > MAX_EXTRACT_CHARS) input.text = input.text.slice(0, MAX_EXTRACT_CHARS);
        console.debug(`[DEBUG] Extracted text from ${file.originalName} (${input.text.length} chars):\n`, input.text);
        if (!input.text.trim()) {