Production migration: arne/marcus/tarald → one container #3
Reference
posta/server#3
What to build
One-time operational cutover: collapse the `arne-msg`, `marcus-msg`, and `tarald-msg` containers on fismen into a single new `posta` container running the multi-tenant daemon.
This is HITL: operator drives, agent assists. The procedure:
1. Create the `posta` Incus container (Alpine 3.21, default profile)
2. Install the `posta-server` binary as `/usr/local/bin/posta-server`
3. Create `/etc/posta/identities.toml` and the `/var/lib/posta/{arne,marcus,tarald}/` directories
4. Stop the old daemons (`service posta stop` per container)
5. Copy `keys.json` + `inbox.db` from each old container to the new container's per-identity directory
6. Write `/etc/init.d/posta` (no `--url`/`--keys`/`--db` flags)
7. Start the new daemon and smoke-test it locally
8. Update the Caddyfile's `arne.posta.no`, `marcus.posta.no`, `tarald.posta.no` blocks to point at the new container's IP; `caddy validate` + reload
Outage window: ~1–3 minutes between step 4 and step 8. Wire-side peers retry on transient failures.
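The stop-and-copy part of the procedure (stopping each old daemon, then moving `keys.json` and `inbox.db` into the new container) could be sketched roughly as below. This is a dry-run illustration under stated assumptions, not the script that was used: `OLD_DIR` (where the legacy containers keep their files), the `./stage` staging directory, and the helper names `run` and `migrate_identity` are all hypothetical. Only `incus exec`, `incus file pull`, and `incus file push` are real Incus subcommands. The sketch also assumes the per-identity target directories already exist in the `posta` container.

```shell
#!/bin/sh
# Dry-run sketch: prints each command unless DRY_RUN=0 is set explicitly.
# ASSUMPTION: OLD_DIR is a guess; the issue does not state where the legacy
# containers store keys.json and inbox.db.
OLD_DIR=${OLD_DIR:-/var/lib/posta}
DRY_RUN=${DRY_RUN:-1}

# run: echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# migrate_identity SLUG: stop the legacy daemon for one identity, then copy
# its two files via a local staging directory into the new container.
migrate_identity() {
    slug=$1
    old="${slug}-msg"
    stage="./stage/${slug}"

    run incus exec "$old" -- service posta stop
    run mkdir -p "$stage"
    for f in keys.json inbox.db; do
        run incus file pull "${old}${OLD_DIR}/${f}" "${stage}/${f}"
        run incus file push "${stage}/${f}" "posta/var/lib/posta/${slug}/${f}"
    done
}
```

Calling `migrate_identity arne` in the default dry-run mode only prints the commands, so the operator can read them before re-running with `DRY_RUN=0`. File sizes and permissions still need a manual check after each copy.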
Acceptance criteria
- `posta` container running on fismen with multi-tenant binary
- All three identities respond on `localhost` AND via Caddy
- Old containers (`arne-msg`, `marcus-msg`, `tarald-msg`) stopped but not deleted
Blocked by
Ready-for-Human Brief
Category: enhancement
Summary: One-time operational cutover collapsing the three legacy single-tenant containers (`arne-msg`, `marcus-msg`, `tarald-msg`) on fismen into a single new `posta` container running the multi-tenant daemon.
Why this is human-only:
The procedure mutates live production. Specifically: it stops three live daemons, copies SQLite databases and Ed25519 keys between Incus containers, writes a new OpenRC init script, edits the Caddyfile, and reloads Caddy with a 1–3 minute public outage. Each step needs a real human to verify the previous step landed cleanly before continuing — these are judgment calls about live state (is the snapshot current, did the smoke test really pass, did Caddy actually reload) that an AFK agent cannot make safely. External access is required (fismen, Incus, the Caddyfile) and rollback decisions depend on what the operator sees.
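One way to keep a human in the loop at every step is a small confirmation gate around each command. The sketch below is illustrative only; the helper names `confirm` and `step` are hypothetical and do not come from the posta repository.

```shell
#!/bin/sh
# confirm PROMPT: ask the operator a yes/no question on stderr.
# Returns 0 only for an explicit y/Y/yes; anything else (or EOF) is a "no".
confirm() {
    printf '%s [y/N] ' "$1" >&2
    read -r answer || return 1
    case "$answer" in
        y|Y|yes) return 0 ;;
        *) return 1 ;;
    esac
}

# step DESCRIPTION COMMAND...: run one command, then wait for the operator
# to confirm that it landed cleanly before the caller moves on.
step() {
    desc=$1
    shift
    echo "==> $desc"
    "$@" || { echo "step failed: $desc" >&2; return 1; }
    confirm "Did '$desc' land cleanly?" || {
        echo "operator stopped after: $desc" >&2
        return 1
    }
}
```

A runbook script could then chain calls such as `step "validate Caddyfile" caddy validate && step "reload Caddy" caddy reload`, so nothing proceeds without the operator's explicit yes.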
Blocked by: #1 (multi-tenant daemon must exist) and ideally #2 (so the operator can use the `identity` CLI as a sanity check post-migration, even though manual file placement also works).
Cutover runbook:
1. Create the Incus container `posta` on fismen (Alpine 3.21, default profile).
2. Build the `posta-server` binary from the latest `main` and push it to `/usr/local/bin/posta-server` in the new container.
3. Write `/etc/posta/identities.toml` with three `[[identity]]` entries (arne, marcus, tarald) using their existing canonical URLs.
4. Create the `/var/lib/posta/{arne,marcus,tarald}/` directories with appropriate ownership/permissions.
5. Run `service posta stop` inside each of `arne-msg`, `marcus-msg`, `tarald-msg`.
6. Copy `keys.json` and `inbox.db` from each old container to the new container's matching per-identity directory. Verify file sizes and permissions after each copy.
7. Write `/etc/init.d/posta` in the `posta` container. The new init script must invoke `posta-server serve` with no `--url`/`--keys`/`--db`/`--name` flags, only `--listen` and `--manifest` (per #1).
8. Smoke-test on `localhost` inside the container (one curl per identity to its declared URL via `/etc/hosts` aliases or `Host:` header).
9. Update the Caddyfile's `arne.posta.no`, `marcus.posta.no`, `tarald.posta.no` site blocks to point at the new container's IP. Run `caddy validate`, then reload.
10. Smoke-test publicly: one `/api/v1/*` call per identity using existing tokens.
Rollback procedure (if any step 7–9 fails):
1. Point the Caddyfile site blocks back at the old containers, then `caddy reload`.
2. Run `service posta start` in each old container.
Acceptance criteria:
- `posta` container is running on fismen with the multi-tenant binary.
- All three identities respond on `localhost` and via the public Caddy URL.
- The old containers (`arne-msg`, `marcus-msg`, `tarald-msg`) are stopped but not deleted.
- The rollback procedure is documented (in `DEPLOY.md` or a runbook in the repo) so a future operator can reverse the cutover without re-deriving the steps.
Out of scope:
- Changes to `posta-server`. This issue is a deploy/migration; code work happens in #1 and #2.
- … `identity add` flow afterwards.
Cutover complete: 2026-05-10
Executed the runbook above. Outage window ~2 minutes; resolved cleanly.
What's running now:
- `posta` container on fismen at `10.228.107.168`, multi-tenant daemon serving all three identities by Host dispatch
- Binary built from `9a656c6` on the `multi-tenant-rollout` branch (PR #7, not yet merged at cutover time; the deployed binary tracks that branch HEAD)
- `/etc/posta/identities.toml` lists `arne`, `marcus`, `tarald`
- `/var/lib/posta/<slug>/{keys.json,inbox.db}`: keys + DB copied from each legacy container, schema migrated v2 → v3 on first open
- `/etc/init.d/posta` invoking `posta-server serve --listen 0.0.0.0:80 --manifest /etc/posta/identities.toml`
Public smoke:
- `https://arne.posta.no/` → 200, actor doc with `name=Arne`, original key `q1lr+YxzHV…`
- `https://marcus.posta.no/` → 200, actor doc with original key `pKr0g7IXW9…`
- `https://tarald.posta.no/` → 200, actor doc with `name=Tarald`, original key `MuKdTmgft…`
- `https://arne.posta.no/api/v1/identity` (no bearer) → 401 (auth path intact)
- `https://arne.posta.no/setup` → 200 (HTML pairing page served)
- `https://arne.posta.no/api/v1/invite/info?invite=pinv_bogus` → 410 (invite path live)
Token survival: existing rows in `auth_tokens` carried over per-DB. `token list --slug=arne` confirms `tui-dev` still active with last-seen 2026-05-09 20:16 UTC, so devices that hold pre-cutover bearers keep authenticating.
Caddyfile change: three `reverse_proxy` IPs swapped from `10.228.107.{201,229,205}:80` to `10.228.107.168:80`. Backup at `/etc/caddy/Caddyfile.bak.posta-multitenant-cutover-20260510-004551`.
Rollback artifacts: `arne-msg`, `marcus-msg`, `tarald-msg` stopped but not deleted; their data dirs intact. Rollback procedure documented in `DEPLOY.md` (flip Caddyfile back from the timestamped backup, restart the three legacy services).
Acceptance criteria:
- `posta` container running on fismen with multi-tenant binary
- … `last_seen_at`)
#6 (decommission legacy containers) is now unblocked, but the brief specifies a ≥7-day stability window before running `incus delete`. Safe-to-run from 2026-05-17 at the earliest, pending a clean log review.
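During the stability window, the public smoke checks listed above can be repeated before anything is deleted. The following is a minimal sketch of such a re-check, not the commands that were actually run: the helper names `fetch_status`, `check`, and `smoke_all` are hypothetical, and it only compares HTTP status codes (it does not inspect the actor document or key material). The URLs and expected codes are taken from the smoke results in this comment.

```shell
#!/bin/sh
# fetch_status URL: print only the HTTP status code of a GET request.
fetch_status() {
    curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$1"
}

# check URL EXPECTED: compare the live status code with the expected one.
check() {
    url=$1
    want=$2
    got=$(fetch_status "$url")
    if [ "$got" = "$want" ]; then
        echo "ok   $want $url"
    else
        echo "FAIL want=$want got=$got $url"
        return 1
    fi
}

# smoke_all: run every check and return non-zero if any of them failed.
smoke_all() {
    rc=0
    for who in arne marcus tarald; do
        check "https://${who}.posta.no/" 200 || rc=1
    done
    check "https://arne.posta.no/api/v1/identity" 401 || rc=1
    check "https://arne.posta.no/setup" 200 || rc=1
    check "https://arne.posta.no/api/v1/invite/info?invite=pinv_bogus" 410 || rc=1
    return $rc
}
```

Running `smoke_all` prints one line per URL and exits non-zero on any mismatch, which makes it easy to pair with the log review before `incus delete`.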