Add fox as primary AI summarizer with cube fallback #6

Closed

arne wants to merge 1 commit from orbit-5-fox-primary-cube-fallback into main

arne commented

2026-05-30 23:59:41 +02:00

Owner

Summaries currently run through a single Ollama backend (cube/qwen3 via OLLAMA_URL). With only one backend, summarization stalls whenever that host is down, and the service can't take advantage of fox (gemma4:26b), which produced tighter summaries at comparable latency in a 4-article comparison. This PR makes fox the primary summarizer with cube as an automatic fallback.

Solution

fox is served by llama-swap and speaks only the OpenAI-compatible /v1/chat/completions API (it returns 404 on Ollama's /api/chat), so a config-only switch isn't possible. This PR adds an Analyzer interface with two implementations (OllamaClient, OpenAIClient) that share the Norwegian prompt and the JSON-extraction logic, plus a FallbackAnalyzer that tries the primary first and, on any error, retries the request against the fallback. It does not fall back when the request context is cancelled (e.g. during shutdown).

To avoid a config trap, the existing OLLAMA_URL/OLLAMA_MODEL continue to configure the fallback backend, so the deployed unit, which already sets OLLAMA_URL=cube, needs no change. New AI_URL/AI_MODEL/AI_API variables (defaults http://fox:11434, gemma4:26b, openai) configure the primary. The Ollama client timeout is raised from 30s to 60s to tolerate cold model loads (~48s observed).

This re-ports work originally written against an orphaned repo line (old PR #4, preserved on orbit-main-archive) onto the canonical codebase.

Verification

go build and go vet are clean; all AI/config tests pass. The 4 failing tests on this branch (snapshot/desk OIDC tests) also fail on main and are not introduced here. Both client paths (OpenAI and Ollama) were validated earlier against live fox and cube instances.

Deploy note

Not for deploy until approved. On deploy, summaries switch cube→fox automatically (fox is the default primary); cube remains the fallback via the unit's existing OLLAMA_URL. No unit change required.

Closes #5

arne added 1 commit

2026-05-30 23:59:41 +02:00

Add fox as primary AI summarizer with cube fallback 8e07258349

Route summaries to fox (gemma4:26b, OpenAI-compatible API) as the primary
backend, falling back to the existing Ollama backend (cube via OLLAMA_URL)
per request when fox is unreachable. Adds an Analyzer interface over both
protocols, an OpenAIClient, and a FallbackAnalyzer; OLLAMA_URL/OLLAMA_MODEL
keep configuring the fallback so the deployed unit needs no change.

Re-ports the AI work from the orphaned repo line (old PR #4, preserved on
orbit-main-archive) onto the canonical codebase.

Closes #5

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

arne commented

2026-05-31 10:14:43 +02:00

Author

Owner

Closing without merge: this work is already on main. The deploy-from-git branch (PR #8) was accidentally branched off this one, so its squash merge (06d456b) bundled the full fox/cube AI feature along with the deploy change. main has the complete, wired feature (build + AI tests pass). Tracked by #5.

arne closed this pull request

2026-05-31 10:14:44 +02:00

arne referenced this pull request

2026-05-31 10:14:44 +02:00

Add fox as primary AI summarizer with cube fallback #5