Use fox for AI summaries with cube as fallback #3
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
AI summaries currently go through a single Ollama backend (cube, qwen3:14b). We want fox (gemma4:26b, served by llama-swap over an OpenAI-compatible API) as the primary summarizer because its output is tighter and at least as fast, while keeping cube as a fallback so summarization keeps working when fox is unreachable.
The wrinkle is that the two speak different protocols: cube uses Ollama-native /api/chat, fox is OpenAI-only /v1/chat/completions (it 404s on /api/chat). So this is not a config-only switch — the summarizer needs an abstraction over both protocols plus per-request fallback that always prefers fox and retries on cube when fox errors.
Scope: introduce an analyzer abstraction with an OpenAI-compatible client and the existing Ollama client, a fox-primary/cube-fallback wrapper, config for both backends, and tests. Out of scope: circuit-breaking/cooldown on fox failure (per-request retry is enough for now), and any change to the summary prompt itself.