Use fox for AI summaries with cube as fallback

arne commented

2026-05-30 01:15:14 +02:00

Owner

AI summaries went through a single Ollama backend (cube, qwen3:14b). We want fox (gemma4:26b) as the primary summarizer: in a 4-article comparison its output was tighter and it was at least as fast. But a single backend means summaries stall whenever that host is down. This PR makes fox primary and keeps cube as an automatic fallback.

Solution

The two hosts speak different protocols: cube is Ollama-native (/api/chat), while fox is served by llama-swap and only exposes the OpenAI-compatible /v1/chat/completions (it 404s on /api/chat). So a config-only switch isn't possible. This introduces an Analyzer interface with two implementations — OllamaClient and a new OpenAIClient — sharing the Norwegian prompt and JSON-extraction logic so swapping models never changes what we ask for. A FallbackAnalyzer wraps the two: every article tries the primary first and retries on the fallback on any error, so a recovered primary is used again immediately with no cooldown bookkeeping. It does not fall back when the context is cancelled (shutdown).

Backends are configured independently via AI_* and AI_FALLBACK_* env vars (URL, model, protocol, optional Bearer key); defaults are fox-primary / cube-fallback. The HTTP timeout is raised from 30s to 60s — the comparison showed cube cold-loads taking ~48s, which the old timeout would have killed. Verified end-to-end against the live fox and cube backends.

Known cuts

Per-request retry only — no circuit-breaker/cooldown, so a hung (not refused) fox costs its full timeout before each fallback. Refused connections fail fast.
The summary prompt is unchanged.

Follow-ups

Confirm fox/cube are reachable from the deploy host (fismen) over tailscale before this goes live; an older deploy comment noted Ollama wasn't reachable there yet.

Closes #3


arne added 1 commit

2026-05-30 01:15:14 +02:00

Use fox for AI summaries with cube as fallback 87883d653e

Route article summaries to fox (gemma4:26b, OpenAI-compatible API) as the
primary backend and fall back to cube (qwen3:14b, Ollama-native API) per
request when fox is unreachable. Introduces an Analyzer interface over both
protocols with shared prompt/parse logic and a FallbackAnalyzer wrapper.

Closes #3

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

arne merged commit 36f6fa3f14 into main

2026-05-30 01:16:49 +02:00

arne referenced this pull request from a commit

2026-05-30 01:16:51 +02:00

Use fox for AI summaries with cube as fallback (#4)

arne referenced this pull request

2026-05-30 23:56:08 +02:00

Add fox as primary AI summarizer with cube fallback #5

arne referenced this pull request from a commit

2026-05-30 23:59:31 +02:00

Add fox as primary AI summarizer with cube fallback

arne referenced this pull request