LittDB: Keymap threading improvements#3645
Conversation
PR SummaryHigh Risk Overview Garbage collection is now two-phase: phase 1 schedules segment key deletions on the manager; phase 2 removes segment files only after the watermark shows those keymap entries are durably deleted. Adds Removes New config knobs: keymap manager channel size, max batch size, flush interval, max buffered deletes, watermark channel size. Reviewed by Cursor Bugbot for commit 1108988. Bugbot is set up for automated code reviews on this repo. Configure here. |
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
| // from under an open iterator would corrupt its snapshot, so we skip GC entirely until every | ||
| // iterator has been closed. | ||
| // GC is suspended while one or more iterators are open. Deleting a segment (or its keymap entries) out | ||
| // from under an open iterator would corrupt its snapshot, so we skip GC until every iterator is closed. |
There was a problem hiding this comment.
Async keymap deletes during iterators
Medium Severity
With an open iterator, doGarbageCollection returns immediately so the control loop no longer schedules new keymap deletes, but the keymap manager still applies deletes already queued. That can remove keymap entries while segment files remain (phase 2 is also skipped), so Get/Exists can miss keys that iterators still read from segments—unlike the prior synchronous GC path, which skipped keymap deletion entirely while iterators were open.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 7b4c308. Configure here.
There was a problem hiding this comment.
Not a real bug. Deleting keymap entries before segment entries is intentional. Once a key disappears from the keymap, that value is effectively "not present" in the DB. It doesn't violate contracts if an iterator can still see a key for a while after it is not visible to standard getter methods.
| if err != nil { | ||
| c.logger.Error("failed to drain keymap writer", "error", err) | ||
| return | ||
| } |
There was a problem hiding this comment.
Close hangs after drain failure
High Severity
If keymapManager.drain() fails during shutdown, handleShutdownRequest returns after logging without sending on req.shutdownCompleteChan. Close() still waits on that channel, so shutdown can block forever even though the table never finished stopping.
Reviewed by Cursor Bugbot for commit 7b4c308. Configure here.
There was a problem hiding this comment.
Another false positive. Close() doesn't wait on that channel alone, it waits via util.Await, which selects on the channel or the error monitor's cancellation.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3645 +/- ##
==========================================
- Coverage 59.12% 58.17% -0.95%
==========================================
Files 2259 2177 -82
Lines 186489 177099 -9390
==========================================
- Hits 110255 103028 -7227
+ Misses 66353 64956 -1397
+ Partials 9881 9115 -766
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
| timer := time.NewTimer(m.maxFlushInterval) | ||
| defer timer.Stop() | ||
|
|
||
| for len(*puts)+len(*deletes) < m.maxBatchSize { |
There was a problem hiding this comment.
The loop checks the batch size at the top, then appends an entire request's key slice. A single scheduleDelete for a segment with 100,000 keys would blow past the 1024 maxBatchSize in one append. This doesn't affect correctness — deleteBatch chunks deletes into deleteBatchSize (10,000) sub-batches — but the put side has no chunking, so a single writeBatch with 100,000+ keys hits the keymap in one giant Put call, which could spike latency.
Consider Cap writeBatch size the same way deleteBatch is capped, deleteBatch chunks into deleteBatchSize sub-batches, but writeBatch doesn't. For consistency and to bound peak latency.
There was a problem hiding this comment.
Change made (required a refactor)
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
There are 3 total unresolved issues (including 2 from previous reviews).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 1108988. Configure here.
|
|
||
| if err := d.keymap.Put(rescued); err != nil { | ||
| return fmt.Errorf("failed to put rescued keys into keymap: %w", err) | ||
| } |
There was a problem hiding this comment.
Repair undoes GC keymap deletes
High Severity
After a crash between two-phase GC phase 1 (durable keymap deletes) and phase 2 (segment file removal), startup repairKeymap treats every key still on disk but missing from the keymap as an async-write orphan and Puts them back. That restores TTL-expired data the collector already removed from the keymap, because deletion watermark state is in-memory only and is reset on reopen.
Reviewed by Cursor Bugbot for commit 1108988. Configure here.
There was a problem hiding this comment.
A well-engineered, thoroughly-tested rework of LittDB keymap threading (async keymap manager, two-phase GC gated by a deletion watermark, crash-time keymap repair). No blocking correctness issues found; a couple of low-severity edge-case observations remain.
Findings: 0 blocking | 4 non-blocking | 2 posted inline
Blockers
- None at the file/PR level.
Non-blocking
- Cursor second-opinion pass produced no output (cursor-review.md is empty), so only the Codex pass and this review contributed external findings.
- Startup now runs repairKeymap on every non-reload open. On a clean shutdown this is cheap (the first key checked is present, so it stops immediately), but it's worth confirming via the existing TestCleanShutdownLeavesKeymapConsistent that the zero-rescue fast path holds for large tables in CI, since the walk does a keymap.Get per missing key.
- 2 suggestion(s)/nit(s) flagged inline on specific lines.
| // could deadlock the seal path). If the channel is full the update is dropped: the watermark is monotonic and | ||
| // the control loop drains it each GC pass, so a later publish delivers an equal-or-newer value. A dropped | ||
| // update only delays file deletion, which is always safe. | ||
| func (m *keymapManager) publishDeletionWatermark(watermark int64) { |
There was a problem hiding this comment.
[suggestion] Codex flagged that a dropped watermark publish can strand GC'd segment files. The drop here is fire-and-forget (correctly, to avoid the seal-path deadlock), and the watermark is monotonic, but the control loop advances deletionScheduledIndex when a delete is scheduled and never reschedules those segments. So if the channel is full when the highest watermark is published and no later segment ever expires (e.g. writes stop on an idle table), keymapDeletionWatermark stays below the true highest and phase-2 leaves those segments' files on disk — they remain counted/readable, then get reclaimed on the next delete that advances the watermark, or on restart.
This is a disk-reclamation delay in an extreme edge case (the channel defaults to 1024 and is drained every GC pass), not data loss or resurrection of deleted keys, so I'd downgrade Codex's "permanently strand" framing to a low-severity note. Still, given the only cost of a fuller-fidelity signal is a small struct, consider publishing the watermark via a single-slot "latest value" (overwrite-on-full) rather than a bounded queue that can drop the newest value, which would close the gap entirely.
| // caller without waiting for the keymap write. The segment data is already crash durable; a crash that | ||
| // loses the scheduled write is repaired from the segment key files on the next startup. A full manager | ||
| // channel backpressures here, which propagates back to Flush(). | ||
| err = f.keymapManager.scheduleWrite(durableKeys) |
There was a problem hiding this comment.
[nit] Behavioral change worth a note (per Codex): Flush() now returns success once the keymap put is scheduled, so a subsequent keymap.Put failure (e.g. a duplicate-key error under DoubleWriteProtection) surfaces asynchronously as an error-monitor panic on the manager goroutine rather than to the Flush() caller that produced it. The DB still enters the panicked state and later calls observe the error, so this isn't silent data loss — just a shift in which call observes the failure. Acceptable given the async design; flagging so the changed error-surfacing contract is intentional.


Describe your changes and provide context
Several changes to the threading of the keymap, with potentially large impacts for some workloads.