Files
TREK/server
Maurice d95d26e493 fix(extract): disable model thinking for grammar-constrained extraction
Hybrid/reasoning models (Qwen3 and similar) default to emitting reasoning tokens, which collide with Ollama's format-grammar constraint — on CPU this produced null/unparseable output and blew the latency budget (qwen3:8b: null or 300s timeouts vs ~20s with thinking off). Send think:false on the /api/chat call; Ollama ignores it for non-thinking models (verified on qwen2.5:7b), so it's safe and unlocks the stronger Qwen3 family.
2026-06-28 11:53:19 +02:00
..
2026-06-18 20:13:30 +02:00
2026-05-06 21:38:40 +02:00
2026-06-18 20:13:30 +02:00
2026-06-16 22:22:45 +02:00
2026-06-16 22:22:45 +02:00
2026-06-16 22:22:45 +02:00
2026-06-16 22:22:45 +02:00
2026-06-16 22:22:45 +02:00
2026-06-16 22:22:45 +02:00