Use fox for AI summaries with cube as fallback #4
AI summaries went through a single Ollama backend (cube, qwen3:14b). We want fox (gemma4:26b) as the primary summarizer — its output is tighter and at least as fast in a 4-article comparison — but a single backend means summaries stall whenever that host is down. This makes fox primary and keeps cube as an automatic fallback.
Solution
The two hosts speak different protocols: cube is Ollama-native (`/api/chat`), while fox is served by llama-swap and only exposes the OpenAI-compatible `/v1/chat/completions` (it 404s on `/api/chat`). So a config-only switch isn't possible. This introduces an `Analyzer` interface with two implementations, `OllamaClient` and a new `OpenAIClient`, sharing the Norwegian prompt and JSON-extraction logic so swapping models never changes what we ask for. A `FallbackAnalyzer` wraps the two: every article tries the primary first and retries on the fallback on any error, so a recovered primary is used again immediately with no cooldown bookkeeping. It does not fall back when the context is cancelled (shutdown).

Backends are configured independently via `AI_*` and `AI_FALLBACK_*` env vars (URL, model, protocol, optional Bearer key); defaults are fox-primary / cube-fallback. The HTTP timeout is raised from 30s to 60s: the comparison showed cube cold-loads taking ~48s, which the old timeout would have killed. Verified end-to-end against the live fox and cube backends.

Known cuts
Follow-ups
Closes #3