FTS extraction for posta.link/v1 (alt + name) #27
Labels
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
posta/server#27
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Why
Migration 7 (#21) extracts `$.body` from posta.text/v1 payloads before
indexing, so text messages don't pollute search with wrapper tokens
(`posta`, `kind`, `v1`, `body`). The same pollution applies to
posta.link/v1 (#22): every image message now contributes `posta`,
`link`, `v1`, `kind`, `url`, `mediaType`, `name`, `alt`, `size`
plus the URL hostname/path to the FTS index. So a search for `posta`
still matches every image-attached message.
Scope
Extend the COALESCE chain in the three messages_fts triggers:
```sql
COALESCE(
json_extract(new.payload, '$.body'), -- posta.text/v1
nullif(trim(coalesce(json_extract(new.payload, '$.alt'), '')
|| ' '
|| coalesce(json_extract(new.payload, '$.name'), '')), ''), -- posta.link/v1
new.payload -- legacy + unknown
)
```
`alt` (accessibility text) and `name` (filename hint) are the only
human-readable fields on link/v1. Indexing both gives users
"search for the photo of the dog" if they put it in alt text.
Migration
Same pattern as migration 7: DROP + CREATE triggers, then
`INSERT INTO messages_fts(messages_fts) VALUES('delete-all')` followed
by a bulk re-index from `messages`.
Acceptance
search for `sunset`.
Related: #21 (text extraction precedent), #22 (link/v1 wire kind).