FTS extraction for posta.link/v1 (alt + name) #27

Open
opened 2026-05-13 21:39:00 +02:00 by arne · 0 comments
Owner

Why

Migration 7 (#21) extracts `$.body` from posta.text/v1 payloads before
indexing, so text messages don't pollute search with wrapper tokens
(`posta`, `kind`, `v1`, `body`). The same pollution applies to
posta.link/v1 (#22): every image message now contributes `posta`,
`link`, `v1`, `kind`, `url`, `mediaType`, `name`, `alt`, `size`
plus the URL hostname/path to the FTS index. So a search for `posta`
still matches every image-attached message.

Scope

Extend the COALESCE chain in the three messages_fts triggers:

```sql
COALESCE(
json_extract(new.payload, '$.body'), -- posta.text/v1
nullif(trim(coalesce(json_extract(new.payload, '$.alt'), '')
|| ' '
|| coalesce(json_extract(new.payload, '$.name'), '')), ''), -- posta.link/v1
new.payload -- legacy + unknown
)
```

`alt` (accessibility text) and `name` (filename hint) are the only
human-readable fields on link/v1. Indexing both gives users
"search for the photo of the dog" if they put it in alt text.

Migration

Same pattern as migration 7: DROP + CREATE triggers, then
`INSERT INTO messages_fts(messages_fts) VALUES('delete-all')` followed
by a bulk re-index from `messages`.

Acceptance

  • A posta.link/v1 row with `alt: "sunset over the fjord"` matches a
    search for `sunset`.
  • The same row does NOT match a search for `posta`, `link`, `mediaType`.
  • Migration test covers the upgrade path against existing FTS rows.

Related: #21 (text extraction precedent), #22 (link/v1 wire kind).

### Why Migration 7 (#21) extracts \`\$.body\` from posta.text/v1 payloads before indexing, so text messages don't pollute search with wrapper tokens (\`posta\`, \`kind\`, \`v1\`, \`body\`). The same pollution applies to posta.link/v1 (#22): every image message now contributes \`posta\`, \`link\`, \`v1\`, \`kind\`, \`url\`, \`mediaType\`, \`name\`, \`alt\`, \`size\` plus the URL hostname/path to the FTS index. So a search for \`posta\` still matches every image-attached message. ### Scope Extend the COALESCE chain in the three messages_fts triggers: \`\`\`sql COALESCE( json_extract(new.payload, '\$.body'), -- posta.text/v1 nullif(trim(coalesce(json_extract(new.payload, '\$.alt'), '') || ' ' || coalesce(json_extract(new.payload, '\$.name'), '')), ''), -- posta.link/v1 new.payload -- legacy + unknown ) \`\`\` \`alt\` (accessibility text) and \`name\` (filename hint) are the only human-readable fields on link/v1. Indexing both gives users \"search for the photo of the dog\" if they put it in alt text. ### Migration Same pattern as migration 7: DROP + CREATE triggers, then \`INSERT INTO messages_fts(messages_fts) VALUES('delete-all')\` followed by a bulk re-index from \`messages\`. ### Acceptance - A posta.link/v1 row with \`alt: \"sunset over the fjord\"\` matches a search for \`sunset\`. - The same row does NOT match a search for \`posta\`, \`link\`, \`mediaType\`. - Migration test covers the upgrade path against existing FTS rows. Related: #21 (text extraction precedent), #22 (link/v1 wire kind).
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
posta/server#27
No description provided.