Prune push-diag log lines #17

Open
opened 2026-05-16 22:22:47 +02:00 by arne · 1 comment
Owner

What to build

The slog.Info("push-diag.…") calls scattered across internal/sse/hub.go, internal/web/events.go, and internal/web/push.go were added during a debugging session for the push-keepalive-restart-gap incident and never cleaned up. For each call site, decide: promote to a permanent counter / structured-log channel (if the signal is durable observability), or delete (if it was a one-time investigation aid).

Known sites (non-exhaustive):

  • push-diag.advance_watermark in hub.go
  • push-diag.on_inbound in push.go
  • push-diag.events.subscribe, push-diag.active_peer.post in events.go (may already be deleted by the active-peer migration issue)

Acceptance criteria

  • grep -r push-diag internal/ returns no matches
  • Each removed line either has a justification in the PR description ("one-off, no longer needed") or has been replaced with a documented permanent log call
  • No regression in push-related Go tests

Blocked by

None - can start immediately

## What to build The `slog.Info("push-diag.…")` calls scattered across `internal/sse/hub.go`, `internal/web/events.go`, and `internal/web/push.go` were added during a debugging session for the push-keepalive-restart-gap incident and never cleaned up. For each call site, decide: promote to a permanent counter / structured-log channel (if the signal is durable observability), or delete (if it was a one-time investigation aid). Known sites (non-exhaustive): - `push-diag.advance_watermark` in `hub.go` - `push-diag.on_inbound` in `push.go` - `push-diag.events.subscribe`, `push-diag.active_peer.post` in `events.go` (may already be deleted by the active-peer migration issue) ## Acceptance criteria - [ ] `grep -r push-diag internal/` returns no matches - [ ] Each removed line either has a justification in the PR description ("one-off, no longer needed") or has been replaced with a documented permanent log call - [ ] No regression in push-related Go tests ## Blocked by None - can start immediately
Author
Owner

This was generated by AI during triage.

Agent Brief

Category: enhancement
Summary: Remove the push-diag.* slog.Info lines added during a past debugging session. Default is delete; promote to a documented counter or Debug-level log only when the signal would be useful for operating the system long-term.

Current behavior:
Four slog.Info("push-diag.…") call sites exist across the live-update and push code paths. They were added during the push-keepalive-restart-gap incident and never cleaned up. Two of them are noisy: they fire on every inbound message at Info level.

Desired behavior:
grep -r 'push-diag\.' internal/ returns no matches. For each removed line, the PR description either notes "one-off diagnostic, no longer needed" or — for the small subset worth keeping — describes the durable signal and the replacement (a counter or a Debug-level log call with a non-push-diag name).

Per-site notes:

Two of the four call sites are inside code paths that the active-peer migration deletes (push-diag.events.subscribe in the /events handler, push-diag.active_peer.post in the active-peer POST handler). If that migration lands first they vanish naturally; if this issue lands first the agent removes them outright. Either ordering is fine.

The remaining two — push-diag.on_inbound in the push dispatcher and push-diag.advance_watermark in the hub — are the real decision points. Both fire on every inbound message. At Info level they pollute production logs.

Decision guidance for the agent:

Delete by default. A promotion is justified only if all three are true:

  1. Operating the system in production would be measurably harder without the signal.
  2. The signal is bounded — not "every inbound event" — or aggregated as a counter rather than emitted per-event.
  3. The replacement name does not carry the push-diag prefix.

If a promotion is justified, prefer a counter (e.g. expvar-style or slog at Debug) over an Info log. Per-event Info logs on a hot path are not durable observability.

Acceptance criteria:

  • grep -r 'push-diag\.' internal/ returns no matches
  • Every removed line is accounted for in the PR description: deleted with no replacement, or replaced with a documented non-push-diag signal
  • No new per-event slog.Info calls on the inbound hot path
  • Existing tests still pass

Out of scope:

  • Any other log-cleanup or observability work (only push-diag.* lines)
  • Changes to push-subscription lifecycle or the keepalive mechanism
  • Changes to the live-updates protocol
> *This was generated by AI during triage.* ## Agent Brief **Category:** enhancement **Summary:** Remove the `push-diag.*` `slog.Info` lines added during a past debugging session. Default is delete; promote to a documented counter or `Debug`-level log only when the signal would be useful for operating the system long-term. **Current behavior:** Four `slog.Info("push-diag.…")` call sites exist across the live-update and push code paths. They were added during the push-keepalive-restart-gap incident and never cleaned up. Two of them are noisy: they fire on every inbound message at `Info` level. **Desired behavior:** `grep -r 'push-diag\.' internal/` returns no matches. For each removed line, the PR description either notes "one-off diagnostic, no longer needed" or — for the small subset worth keeping — describes the durable signal and the replacement (a counter or a `Debug`-level log call with a non-`push-diag` name). **Per-site notes:** Two of the four call sites are inside code paths that the active-peer migration deletes (`push-diag.events.subscribe` in the /events handler, `push-diag.active_peer.post` in the active-peer POST handler). If that migration lands first they vanish naturally; if this issue lands first the agent removes them outright. Either ordering is fine. The remaining two — `push-diag.on_inbound` in the push dispatcher and `push-diag.advance_watermark` in the hub — are the real decision points. Both fire on every inbound message. At `Info` level they pollute production logs. **Decision guidance for the agent:** Delete by default. A promotion is justified only if all three are true: 1. Operating the system in production would be measurably harder without the signal. 2. The signal is bounded — not "every inbound event" — or aggregated as a counter rather than emitted per-event. 3. The replacement name does not carry the `push-diag` prefix. If a promotion is justified, prefer a counter (e.g. `expvar`-style or `slog` at `Debug`) over an `Info` log. Per-event `Info` logs on a hot path are not durable observability. **Acceptance criteria:** - [ ] `grep -r 'push-diag\.' internal/` returns no matches - [ ] Every removed line is accounted for in the PR description: deleted with no replacement, or replaced with a documented non-`push-diag` signal - [ ] No new per-event `slog.Info` calls on the inbound hot path - [ ] Existing tests still pass **Out of scope:** - Any other log-cleanup or observability work (only `push-diag.*` lines) - Changes to push-subscription lifecycle or the keepalive mechanism - Changes to the live-updates protocol
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
posta/chat#17
No description provided.