Use fox for AI summaries with cube as fallback #4

Merged
arne merged 1 commit from orbit-3-fox-ai-fallback into main 2026-05-30 01:16:49 +02:00
AI summaries previously went through a single Ollama backend (cube, qwen3:14b). We want fox (gemma4:26b) as the primary summarizer: in a 4-article comparison its output was tighter and it was at least as fast. But a single backend means summaries stall whenever that host is down. This PR makes fox primary and keeps cube as an automatic fallback.

Solution

The two hosts speak different protocols: cube is Ollama-native (/api/chat), while fox is served by llama-swap and exposes only the OpenAI-compatible /v1/chat/completions (it returns 404 on /api/chat), so a config-only switch isn't possible. This PR introduces an Analyzer interface with two implementations, OllamaClient and a new OpenAIClient, which share the Norwegian prompt and JSON-extraction logic so that swapping models never changes what we ask for.

A FallbackAnalyzer wraps the two: every article tries the primary first and, on any error, retries on the fallback. Because the choice is made per request, a recovered primary is used again immediately with no cooldown bookkeeping. The one exception is a cancelled context (shutdown), in which case it does not fall back.

Backends are configured independently via AI_* and AI_FALLBACK_* env vars (URL, model, protocol, optional Bearer key); defaults are fox-primary / cube-fallback. The HTTP timeout is raised from 30s to 60s — the comparison showed cube cold-loads taking ~48s, which the old timeout would have killed. Verified end-to-end against the live fox and cube backends.
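As an illustration of the independent configuration, here is a minimal Go sketch. The description gives only the `AI_` and `AI_FALLBACK_` prefixes, so the suffixes (`URL`, `MODEL`, `PROTOCOL`, `API_KEY`), the protocol values `"openai"` and `"ollama"`, and the default host URLs are all hypothetical placeholders. Only the model names, endpoint paths and the 60s timeout come from the text.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// BackendConfig holds the four settings listed in the description.
type BackendConfig struct {
	URL      string
	Model    string
	Protocol string // "openai" or "ollama" (assumed values)
	APIKey   string // optional Bearer key
}

// httpTimeout is the raised timeout; cube cold-loads took about 48s.
const httpTimeout = 60 * time.Second

// loadBackend reads prefix+suffix variables via lookup and falls back to def
// for anything unset or empty. lookup has the same shape as os.LookupEnv.
func loadBackend(lookup func(string) (string, bool), prefix string, def BackendConfig) BackendConfig {
	get := func(suffix, fallback string) string {
		if v, ok := lookup(prefix + suffix); ok && v != "" {
			return v
		}
		return fallback
	}
	return BackendConfig{
		URL:      get("URL", def.URL),
		Model:    get("MODEL", def.Model),
		Protocol: get("PROTOCOL", def.Protocol),
		APIKey:   get("API_KEY", def.APIKey),
	}
}

// loadFromEnv applies the fox-primary / cube-fallback defaults.
// The URLs are placeholders, not the real hosts.
func loadFromEnv() (primary, fallback BackendConfig) {
	primary = loadBackend(os.LookupEnv, "AI_", BackendConfig{
		URL: "http://fox.example:8080", Model: "gemma4:26b", Protocol: "openai",
	})
	fallback = loadBackend(os.LookupEnv, "AI_FALLBACK_", BackendConfig{
		URL: "http://cube.example:11434", Model: "qwen3:14b", Protocol: "ollama",
	})
	return primary, fallback
}

// chatPath maps a protocol name to the endpoint path given in the description.
func chatPath(protocol string) (string, error) {
	switch protocol {
	case "ollama":
		return "/api/chat", nil
	case "openai":
		return "/v1/chat/completions", nil
	default:
		return "", fmt.Errorf("unknown protocol %q", protocol)
	}
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, keeps the loader testable without mutating the process environment.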

Known cuts

  • Per-request retry only — no circuit-breaker/cooldown, so a hung (not refused) fox costs its full timeout before each fallback. Refused connections fail fast.
  • The summary prompt is unchanged.

Follow-ups

  • Confirm fox/cube are reachable from the deploy host (fismen) over tailscale before this goes live; an older deploy comment noted Ollama wasn't reachable there yet.

Closes #3

Route article summaries to fox (gemma4:26b, OpenAI-compatible API) as the
primary backend and fall back to cube (qwen3:14b, Ollama-native API) per
request when fox is unreachable. Introduces an Analyzer interface over both
protocols with shared prompt/parse logic and a FallbackAnalyzer wrapper.

Closes #3

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
arne merged commit 36f6fa3f14 into main 2026-05-30 01:16:49 +02:00
Reference
arne/news!4