Skip to content

fix(read-path): reach complete threads, dense-second timelines, and all people in the GUI#1418

Open
tlongwell-block wants to merge 10 commits into
mainfrom
feat/read-path-lane1-threads
Open

fix(read-path): reach complete threads, dense-second timelines, and all people in the GUI#1418
tlongwell-block wants to merge 10 commits into
mainfrom
feat/read-path-lane1-threads

Conversation

@tlongwell-block

@tlongwell-block tlongwell-block commented Jul 1, 2026

Copy link
Copy Markdown
Collaborator

What & why

This PR fundamentally fixes how the desktop GUI reaches channel messages, thread replies, and people — removing the "hacky limits" that silently miss messages/people. This is the Tier-1 read-path work: reachability via keyset pagination over the already-merged relay substrate (PR #1321), no schema change, no hot-path rewrite. Two lanes, one PR.

Lane 1 — messages + threads (Eva)

Threads (descendant gap). The thread panel derived replies from the channel cache, and useLoadMissingAncestors only backfilled ancestors. Replies outside the cold-load window were never fetched, so deep/old threads rendered silently incomplete. useThreadReplies closes it via the same cache seam: page get_thread_replies to the floor over the gap-free (created_at, event_id) keyset and merge into the channel cache (dedupe by id, idempotent per root, retry on error). Downstream grouping/ordering/unread unchanged.

  • The backend get_thread_replies keyset-ordered on event_created_at alone with no tiebreak, so a page ending mid-second dropped every remaining tied reply. Composite (event_created_at, event_id) keyset closes it. The query keys on root_event_id (which root rows lack), so results are replies-only (depth >= 1) — root excluded by construction.

Channel timeline (dense-second wall). Ordinary same-second overlap is duplicate-safe (relay until is inclusive), so the timeline doesn't drop messages normally. The one genuinely unreachable case: a single created_at second with more messages than one WS page — until never advances, everything behind it is stuck. get_channel_messages_before gives the pager a composite-keyset fallback that:

  • activates only on a full-page same-second stall (the proven wall),
  • seeds from the max event id already known at the boundary second (advances via id > before_id under created_at DESC, id ASC),
  • drains the entire boundary second before honoring the per-pass budget,
  • leaves the WS live-tail/optimistic/cold-load hot path untouched.

Lane 2 — people search / mentions / members (Wren)

Pages user search so add-users / tag-users / mention interactions reach every match instead of a hacky first-page limit (user_search.rs, useMentions.ts, members/mention/DM UIs, profile/hooks.ts). Empty-query directory browse is now paged end-to-end (relay page→offset on the non-search /query path; desktop emits a next cursor on full raw pages). E2E coverage in channels.spec.ts + mentions.spec.ts.

The blocker a live relay surfaced (and its fix)

Static review passed the thread path, but a dynamic redteam against a real relay caught a production-fatal blocker the mock hid: get_thread_replies built a kindless /query filter, and the relay bridge p-gate (p_gated_filters_authorized) 403s any kindless filter before routing — a kindless filter "could match" a p-gated kind, so the gate demands a #p=self tag the thread read doesn't send. The relay routes the thread query purely off #e + depth_limit (kind doesn't filter it), but the gate runs first. The e2e mock never modeled p-gating, so CI stayed green on a 403 — every thread-backfill call would have failed in prod.

Fixes:

  • messages.rs — give the thread filter the non-p-gated TIMELINE_KINDS (via a shared const + build_thread_replies_filter helper mirroring the channel-messages sibling, so the two p-gate filters can't drift). Guard unit tests assert the filter carries kinds and that every kind is absent from P_GATED_KINDS (the real invariant), plus composite-cursor paging.
  • e2eBridge.ts — model the relay p-gate for /query and WS REQ so a kindless #e read is now rejected in the mock too (faithful port of p_gated_filters_authorized), closing the CI blind spot. Also fix the mock subtree walk to traverse nested replies transitively (frontier BFS for depth > 1).
  • useThreadReplies.ts — clear the fetched-root mark on cancellation unless the paging loop completed, so an interrupted first fetch retries on reopen instead of stranding the feature. Narrow effect deps to primitive channel id/type.
  • Corrected a false "includes root at depth 0" contract at all three doc sites (messages.rs / tauri.ts / types.ts): results are replies-only.

Verification

Live re-verify at the final tip (the gate that caught the blocker): built the relay at this SHA on a clean-room DB, seeded root + 250 dense same-second depth-1 replies + 1 depth-2 grandchild.

  • kindless #e filter → HTTP 403 (blocker reproduced exactly),
  • shipped kinded filter → HTTP 200, 251 rows, 250/250 depth-1 reached, depth-2 reached, root excluded.

Green on the integrated branch: tsc, biome, file-sizes, px-text, cargo check, unit suites (buzz-db 124/0 incl. Postgres with a red-check proving the stub test fails on the pre-fix SQL; desktop 864/0), the dense-second Playwright spec, and all pre-push CI hooks (desktop-test, rust-tests, mobile-test, desktop-tauri-test).

Notes for review

  • Not in scope (deferred): the provable-no-loss relay_seq work (Tier-2) — never implemented (spec-only), named as follow-on, not reopened here.
  • File-size overrides bumped (messages.rs, tauri.ts) for load-bearing bindings/fixes from both lanes. Splitting the central bindings is known debt / a separate PR — no opportunistic refactor here.
  • Pre-existing flakes: scroll-history.spec.ts "preserves user scroll" / "does not teleport" flake on local runners at the clean baseline too (wheel timing); pass under CI's retry policy. Not touched by this change.

Final score: 9/10.

npub12gtutshhh76rx0jx697f32f9tffd4hhp3hx58fp4x6u4uemkm7sqf8f757 and others added 10 commits June 30, 2026 20:59
Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Threads rendered silently incomplete because the desktop assembled them
from the channel cache instead of fetching the reply subtree. This wires
a first-class server-side thread read (`get_thread_replies` over
`thread_metadata`) through the bridge and a new Tauri command, and fixes
a correctness hole in its pagination.

The pagination bug: `get_thread_replies` keyset-ordered on
`event_created_at` alone, with `WHERE event_created_at > $cursor` and no
tiebreak. Thread replies routinely share a `created_at` second (bursty
threads), so once a page ended mid-second the cursor advanced past the
whole second and every remaining tied reply was silently dropped —
reproduced live on a seeded 40-reply thread spanning two seconds
(24 + 16): paging at limit 10 lost 14 of the 24 tied replies. That is
the exact "missed messages" class this work targets.

Fix (Tier-1, no schema change): composite `(event_created_at, event_id)`
keyset. The query orders and compares on the pair
(`WHERE (event_created_at, event_id) > ($ts, $id)
ORDER BY event_created_at ASC, event_id ASC`), and the cursor carries the
last reply's event id as the tiebreak end to end:
- buzz-db: composite cursor decode (8-byte BE seconds + raw event id) and
  row-comparison keyset; a bare 8-byte cursor still works for back-compat.
- buzz-relay bridge: `extract_thread_cursor` reads `thread_cursor` +
  `thread_cursor_id` and encodes the composite bytes.
- desktop: `ThreadCursor { created_at, event_id }` request/response type;
  the `get_thread_replies` command derives the next cursor from the last
  event's `(created_at, id)` transparently — no server-issued token.

Verified end to end against a live relay: paging the seeded thread at
limit 10 returns a set byte-identical to the full-thread oracle (40/40,
no gaps, no duplicates). New DB regression test pins the same-second-tie
case; +5 bridge unit tests cover the composite cursor encoding.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
…the GUI

Second half of the Tier-1 read-path work (the backend thread fetch +
composite keyset landed in the prior commit). This wires the frontend to
those server-side reads and adds a channel-timeline escape hatch, closing
the two remaining "missed messages" gaps in the desktop GUI without a
schema change or a hot-path rewrite.

Threads (descendant gap): the thread panel derives its replies from the
channel cache, and `useLoadMissingAncestors` only backfills *ancestors*
(walking `e`-tags upward). Replies that fell outside the channel
cold-load window were therefore never fetched — deep or old threads
rendered silently incomplete. `useThreadReplies` closes this using the
same cache seam: on thread open it pages `get_thread_replies` to the
floor over the gap-free `(created_at, event_id)` keyset and merges each
event into the channel cache (`mergeMessages`, dedupe by id). Idempotent
per (channel, root); retries on error rather than caching a partial
subtree. All downstream grouping/ordering/unread derivation is unchanged
— the thread simply becomes complete.

Channel timeline (dense-second wall): scrollback pages history over WS
`REQ` with a bare `until` (`created_at`) cursor. Ordinary same-second
overlap is duplicate-safe (relay `until` is inclusive), so the timeline
does not drop messages in the normal case. The one genuinely unreachable
case is a single `created_at` second holding more messages than one WS
page: `until` keeps returning that second's newest slice, the boundary
never advances, and everything behind it is stuck. `get_channel_messages_before`
gives the pager a bridge composite-keyset fallback that advances within a
tied second via `id > before_id` under the relay's `created_at DESC,
id ASC` order. It activates *only* on a full-page same-second stall (the
proven wall), seeds from the max event id already known at the boundary
second (not `baseline[0].id`, which would re-request held rows), and
drains that entire boundary second before honoring the per-pass
row-floor/batch budget — yielding mid-second would strand the tail behind
the same stall with no guaranteed sentinel re-arm. Older history then
resumes on the normal budget. The WS live-tail/optimistic/cold-load hot
path is untouched.

Verification: new Playwright parity test seeds a 450-message dense second
(>2 WS pages) behind the cold-load window and proves the keyset fallback
fires and every row is reachable. It caught a real tail-stranding bug in
the first cut (drain honored the batch budget mid-second, stranding 50 of
450 permanently); the boundary-second-drain invariant is the fix. Green:
tsc, biome, file-sizes, px-text, cargo check, 1399 unit tests, and the
new spec. `tauri.ts`'s existing file-size override is bumped for the two
load-bearing bindings (split is known debt, a separate PR).

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Lane 2 (Wren): page user search so add-users / tag-users interactions
reach every match instead of a hacky first-page limit. Independent of
Lane 1's message/thread read-path (disjoint files); merged to land the
read-path work as one PR.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>

* feat/read-path-people-model:
  test(desktop): cover people search pagination
  fix(desktop): page user search reachability
Keep the read-path branch mergeable so CI materializes the merge ref.
Relay-only change, disjoint from the desktop read-path work.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>

* origin/main:
  fix(relay): enable Redis TLS for rediss:// (ElastiCache) (#1417)
…eads

The channel-timeline dense-second fallback sent its composite keyset
tiebreak under the filter key `n`, but the relay bridge's
`extract_before_id` reads it from `before_id` (crates/buzz-relay/src/api/
bridge.rs). So on the real `/query` path `query.before_id` was never set:
the fallback silently degraded to a bare inclusive `until`, re-returned
the same full dense-second page, re-emitted the same cursor, and
`drainOlderViaKeyset` could spin inside the boundary-second drain loop
instead of breaking the wall — the exact case the escape hatch exists to
fix. Caught by Wren in PR review.

The Playwright spec passed anyway because its mock-store path reimplements
the keyset in JS (never exercising the relay filter field), and config
mode sent the same wrong key. Fix both call sites to send `before_id`, and
add a Rust unit test that pins the field name so the client/relay contract
can't drift silently again — extracting `build_channel_messages_before_filter`
to make the emitted filter testable, mirroring `build_search_messages_filter`.

Verified: 863/863 desktop cargo tests (incl. the 2 new filter tests), tsc,
biome, file-sizes, px-text, 1399 node unit tests, dense-second Playwright.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
The empty-query people directory listing was terminal: it could never page past its first page, so on busy relays not all people were reachable via the DM picker / add-user / tag flows (Tyler's "all people" goal).

Two halves, both fixed:

- Relay: the non-search general query path (POST /query, no `search` field) built an EventQuery but never read the raw `page` extension, so no SQL OFFSET was set. Only `search`-bearing filters route to handle_bridge_search (which has its own page/per_page). Empty kind:0 listings fell through here. New `extract_page_offset` wires page -> (page-1)*limit into query.offset; query_events already applies OFFSET with deterministic `created_at DESC, id ASC` ordering, so offset paging is stable. Only fires when `page` > 1 is explicitly present, so keyset/general queries are untouched.

- Desktop: search_users' empty-query branch called list_user_search_results, which always returned next_cursor: None, so the frontend could never request page 2. Now emits Some(page+1) when the relay returned a full page (raw events.len() >= max), mirroring the existing NIP-50 search branch. Raw length is the correct fullness signal since list_user_search_results dedupes/truncates.

Config-mode e2eBridge delegates empty-query paging to the real relay /query, so this fix makes that path correct too; the mock-store path already paged locally.

Added Rust unit tests pinning the page->offset contract (absent/page-1 -> None, page N -> (N-1)*limit, missing limit -> None) — the durable server-side guard the JS mock structurally can't verify.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
…nings

Rust Lint CI runs clippy with -D warnings, so the pre-existing `.as_ref().map(|c| c.as_slice())` on the Lane 1 thread-cursor path (bridge.rs) was a red gate regardless of provenance. `as_deref()` is the exact equivalent (Option<Vec<u8>> -> Option<&[u8]>).

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
The headline Lane-1 thread-backfill fix (get_thread_replies / useThreadReplies)
was dead against a real relay: the desktop command built a KINDLESS /query
filter, and the relay bridge p-gate (p_gated_filters_authorized) rejects any
kindless filter with 403 before routing — a kindless filter 'could match' a
p-gated kind, so it demands a #p tag we don't send. The relay routes the thread
query purely off #e + depth_limit (kind does not filter it), but the gate still
runs first. The e2e mock didn't model p-gating, so CI stayed green on a 403.

Fixes:
- messages.rs: give get_thread_replies' filter the non-p-gated TIMELINE_KINDS
  (extracted a shared const + build_thread_replies_filter helper mirroring the
  channel sibling, so the two filters can't drift). Guard unit tests assert the
  filter carries kinds and every kind is NOT in P_GATED_KINDS (the real
  invariant), plus composite-cursor paging.
- e2eBridge.ts: model the relay p-gate for /query and WS REQ so a kindless #e
  read is now rejected in the mock too — closes the CI blind spot. Also fix the
  mock subtree walk to traverse nested replies transitively.
- useThreadReplies.ts: clear the fetched-root mark on cancellation unless the
  paging loop completed, so an interrupted first fetch retries on reopen instead
  of silently stranding the feature. Narrow effect deps to primitive channel
  id/type to avoid object-identity churn.
- Correct the false 'includes root at depth 0' contract at all three doc sites
  (messages.rs / tauri.ts / types.ts): the query keys on root_event_id, which
  root rows lack, so results are replies-only (depth >= 1).
- thread.rs: fix the nested root-stub INSERT in insert_thread_metadata (column
  list omitted community_id while binding it, scrambling every placeholder) to
  match the correct production sibling in event.rs. Add depth-2 reachability and
  a direct stub-branch guard (Postgres-gated).
- check-file-sizes.mjs: record load-bearing overrides for messages.rs (fix +
  guard tests crossed 1000 from 995) and tauri.ts (+3 doc-only lines).

Verified: tauri 864/0, buzz-db 124/0 (incl. Postgres, serialized, with a
red-check confirming the stub test fails on the pre-fix SQL), and a live
clean-room relay at this tip: kindless #e -> 403, shipped kinded filter -> 200
with 250/250 dense same-second depth-1 replies + a depth-2 nested reply reached,
root excluded.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant