Garbage-collect unreferenced uploads #25

Open
opened 2026-05-13 21:38:44 +02:00 by arne · 0 comments
Owner

Why

`uploadstore` (#22) is append-only by design: every successful
`POST /api/v1/uploads` adds a content-addressed file and nothing ever
removes it. Combined with the lack of quota (filed separately), this
means that when a client uploads an image, decides not to send it, and
never references it again, the file still costs disk space forever.

Scope

A periodic (or on-demand) GC pass that:

  1. Lists files under each identity's directory matching
    `upload-<hex>.<ext>`.
  2. Computes the set of upload URLs referenced by a stored message
    payload — both outbound (sent by this identity) and inbound (sent
    to this identity by a peer, though those URLs typically point at
    the peer's host, not this one).
  3. Deletes files whose URL appears in no stored payload AND whose
    ctime/mtime is older than a grace period (e.g. 7 days, so the
    `POST /uploads` → `POST /messages` window doesn't race the GC).

Cross-identity uploads are out of scope: each identity owns its own
upload directory; GC operates per-identity.
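
The three steps above could be sketched roughly as follows for a single identity directory. This is only an illustration, not the planned implementation: `plan_deletions`, `run_gc`, and the `referenced_names` set are hypothetical names, the regex is one reading of `upload-<hex>.<ext>`, and taking the newer of ctime and mtime is an assumption about how "ctime/mtime" should be interpreted.

```python
import re
import time
from pathlib import Path

# Assumed file-name shape, read off the `upload-<hex>.<ext>` pattern in step 1.
UPLOAD_NAME = re.compile(r"^upload-[0-9a-f]+\.[A-Za-z0-9]+$")
GRACE_SECONDS = 7 * 24 * 3600  # the 7-day example from step 3


def plan_deletions(identity_dir, referenced_names, now=None, grace=GRACE_SECONDS):
    """Return upload files that are unreferenced and older than the grace period."""
    now = time.time() if now is None else now
    doomed = []
    for path in sorted(Path(identity_dir).iterdir()):
        if not path.is_file() or not UPLOAD_NAME.match(path.name):
            continue  # not an upload file; leave it alone
        if path.name in referenced_names:
            continue  # still referenced by a stored payload
        st = path.stat()
        # Assumption: use the newer of ctime/mtime, so a recently
        # written or touched file always survives the pass.
        if now - max(st.st_ctime, st.st_mtime) < grace:
            continue
        doomed.append(path)
    return doomed


def run_gc(identity_dir, referenced_names, dry_run=True):
    """Print the planned deletions and, unless dry_run is set, execute them."""
    doomed = plan_deletions(identity_dir, referenced_names)
    for path in doomed:
        print(("would delete " if dry_run else "deleting ") + str(path))
        if not dry_run:
            path.unlink()
    return doomed
```

Keeping the planning step free of side effects means the dry-run and real runs share the same selection logic, and the `now` parameter lets a test simulate the grace period without waiting for it.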

Acceptance

  • A CLI subcommand `posta-server uploads gc [--dry-run] [--slug X]`
    that prints the planned deletions and (without dry-run) executes
    them.
  • Unit test covering the reference-detection logic with a mix of
    text/v1 and link/v1 stored rows.
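
One possible shape for the reference-detection logic and its test input is sketched below. The payload schemas for text/v1 and link/v1 are not specified in this issue, so the sketch avoids depending on field names and scans the serialized payload for upload URLs instead. The row layout (`type`, `payload`), the field names `body` and `url`, the host names, and the URL path are all made up for illustration, and `referenced_upload_names` is a hypothetical helper.

```python
import json
import re
from urllib.parse import urlsplit

# Assumption: an upload URL ends in the stored file name (`upload-<hex>.<ext>`).
UPLOAD_URL = re.compile(r"https?://[^\s\"'<>]+/(upload-[0-9a-f]+\.[A-Za-z0-9]+)")


def referenced_upload_names(rows, own_host):
    """Collect upload file names referenced by stored payloads on own_host."""
    names = set()
    for row in rows:
        # Serialize the whole payload so the scan is independent of field names.
        text = json.dumps(row["payload"])
        for match in UPLOAD_URL.finditer(text):
            # URLs pointing at a peer's host cannot name a local file; skip them.
            if urlsplit(match.group(0)).netloc == own_host:
                names.add(match.group(1))
    return names


# Hypothetical stored rows mixing both payload types.
rows = [
    {"type": "text/v1",
     "payload": {"body": "see https://posta.example/api/v1/uploads/upload-0a1b.png"}},
    {"type": "link/v1",
     "payload": {"url": "https://posta.example/api/v1/uploads/upload-ff00.jpg"}},
    {"type": "link/v1",
     "payload": {"url": "https://peer.example/api/v1/uploads/upload-dead.gif"}},
]
print(sorted(referenced_upload_names(rows, "posta.example")))
```

A schema-agnostic scan errs on the side of keeping files: any payload that mentions a local upload URL anywhere counts as a reference, which seems the safer failure mode for a GC.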

Related: #22.


Reference
posta/server#25