Skip to content

LittDB: Keymap threading improvements#3645

Open
cody-littley wants to merge 4 commits into
mainfrom
cjl/litt-fast-flush
Open

LittDB: Keymap threading improvements#3645
cody-littley wants to merge 4 commits into
mainfrom
cjl/litt-fast-flush

Conversation

@cody-littley

Copy link
Copy Markdown
Contributor

Describe your changes and provide context

Several changes to the threading of the keymap, with potentially large impacts for some workloads.

  • Writing to the keymap is now asyncronous and off the main control loop thread
    • this makes it possible for a crash to leave data in a segment but not in the keymap, we now detect and fix this at startup time
    • flush should have lower latency, especially for workloads with many keys being flushed
    • flushing LittDB should now be roughly equivalent to flushing a WAL, since that's basically all a littDB segment is
  • keymap GC now happens off the main control loop thread

@cursor

cursor Bot commented Jun 25, 2026

Copy link
Copy Markdown

PR Summary

High Risk
Changes core persistence, GC ordering, and crash/recovery paths for the keymap and segments; incorrect watermark or repair logic could cause data loss or unreadable keys.

Overview
Moves keymap puts and GC deletes off the disk-table control loop onto a dedicated keymapManager goroutine. Flushes/seals still make segment data crash-durable first, then schedule keymap writes asynchronously (with batching, backpressure, and a deletion watermark channel). Clean shutdown drains the manager so the keymap matches segments without needing repair.

Garbage collection is now two-phase: phase 1 schedules segment key deletions on the manager; phase 2 removes segment files only after the watermark shows those keymap entries are durably deleted. RunGC runs a full pass, syncs the manager, then a files-only pass. Default GC period drops from 5 minutes to 10 seconds.

Adds repairKeymap on open (when not doing a full reload) to rescue a contiguous missing tail of keys from segment key files in one atomic Put. Documents that keymap batches must be crash-atomic for repair correctness.

Removes CacheAwareGet from the public table API and simplifies the cached-table wrapper to Get-only metrics; deletes related integration tests.

New config knobs: keymap manager channel size, max batch size, flush interval, max buffered deletes, watermark channel size.

Reviewed by Cursor Bugbot for commit 1108988. Bugbot is set up for automated code reviews on this repo. Configure here.

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed✅ passedJun 26, 2026, 8:43 PM

// from under an open iterator would corrupt its snapshot, so we skip GC entirely until every
// iterator has been closed.
// GC is suspended while one or more iterators are open. Deleting a segment (or its keymap entries) out
// from under an open iterator would corrupt its snapshot, so we skip GC until every iterator is closed.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Async keymap deletes during iterators

Medium Severity

With an open iterator, doGarbageCollection returns immediately so the control loop no longer schedules new keymap deletes, but the keymap manager still applies deletes already queued. That can remove keymap entries while segment files remain (phase 2 is also skipped), so Get/Exists can miss keys that iterators still read from segments—unlike the prior synchronous GC path, which skipped keymap deletion entirely while iterators were open.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 7b4c308. Configure here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a real bug. Deleting keymap entries before segment entries is intentional. Once a key disappears from the keymap, that value is effectively "not present" in the DB. It doesn't violate contracts if an iterator can still see a key for a while after it is not visible to standard getter methods.

if err != nil {
c.logger.Error("failed to drain keymap writer", "error", err)
return
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Close hangs after drain failure

High Severity

If keymapManager.drain() fails during shutdown, handleShutdownRequest returns after logging without sending on req.shutdownCompleteChan. Close() still waits on that channel, so shutdown can block forever even though the table never finished stopping.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 7b4c308. Configure here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another false positive. Close() doesn't wait on that channel alone, it waits via util.Await, which selects on the channel or the error monitor's cancellation.

@codecov

codecov Bot commented Jun 25, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 73.12500% with 86 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.17%. Comparing base (c528303) to head (1108988).
⚠️ Report is 10 commits behind head on main.

Files with missing lines Patch % Lines
sei-db/db_engine/litt/disktable/keymap_manager.go 72.95% 28 Missing and 15 partials ⚠️
sei-db/db_engine/litt/disktable/disk_table.go 76.81% 7 Missing and 9 partials ⚠️
sei-db/db_engine/litt/disktable/control_loop.go 75.47% 9 Missing and 4 partials ⚠️
sei-db/db_engine/litt/littdb_config.go 64.28% 5 Missing and 5 partials ⚠️
sei-db/db_engine/litt/dbcache/cached_table.go 55.55% 3 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3645      +/-   ##
==========================================
- Coverage   59.12%   58.17%   -0.95%     
==========================================
  Files        2259     2177      -82     
  Lines      186489   177099    -9390     
==========================================
- Hits       110255   103028    -7227     
+ Misses      66353    64956    -1397     
+ Partials     9881     9115     -766     
Flag Coverage Δ
sei-chain-pr 68.77% <73.12%> (?)
sei-db 70.41% <ø> (ø)
sei-db-state-db ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
sei-db/db_engine/litt/disktable/flush_loop.go 47.05% <100.00%> (+1.77%) ⬆️
sei-db/db_engine/litt/table.go 100.00% <ø> (ø)
sei-db/db_engine/litt/dbcache/cached_table.go 57.30% <55.55%> (-5.47%) ⬇️
sei-db/db_engine/litt/littdb_config.go 51.51% <64.28%> (-5.35%) ⬇️
sei-db/db_engine/litt/disktable/control_loop.go 71.33% <75.47%> (+1.98%) ⬆️
sei-db/db_engine/litt/disktable/disk_table.go 66.66% <76.81%> (+2.15%) ⬆️
sei-db/db_engine/litt/disktable/keymap_manager.go 72.95% <72.95%> (ø)

... and 83 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

timer := time.NewTimer(m.maxFlushInterval)
defer timer.Stop()

for len(*puts)+len(*deletes) < m.maxBatchSize {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop checks the batch size at the top, then appends an entire request's key slice. A single scheduleDelete for a segment with 100,000 keys would blow past the 1024 maxBatchSize in one append. This doesn't affect correctness — deleteBatch chunks deletes into deleteBatchSize (10,000) sub-batches — but the put side has no chunking, so a single writeBatch with 100,000+ keys hits the keymap in one giant Put call, which could spike latency.

Consider Cap writeBatch size the same way deleteBatch is capped, deleteBatch chunks into deleteBatchSize sub-batches, but writeBatch doesn't. For consistency and to bound peak latency.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change made (required a refactor)

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 1108988. Configure here.


if err := d.keymap.Put(rescued); err != nil {
return fmt.Errorf("failed to put rescued keys into keymap: %w", err)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repair undoes GC keymap deletes

High Severity

After a crash between two-phase GC phase 1 (durable keymap deletes) and phase 2 (segment file removal), startup repairKeymap treats every key still on disk but missing from the keymap as an async-write orphan and Puts them back. That restores TTL-expired data the collector already removed from the keymap, because deletion watermark state is in-memory only and is reset on reopen.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 1108988. Configure here.

@seidroid seidroid Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A well-engineered, thoroughly-tested rework of LittDB keymap threading (async keymap manager, two-phase GC gated by a deletion watermark, crash-time keymap repair). No blocking correctness issues found; a couple of low-severity edge-case observations remain.

Findings: 0 blocking | 4 non-blocking | 2 posted inline

Blockers

  • None at the file/PR level.

Non-blocking

  • Cursor second-opinion pass produced no output (cursor-review.md is empty), so only the Codex pass and this review contributed external findings.
  • Startup now runs repairKeymap on every non-reload open. On a clean shutdown this is cheap (the first key checked is present, so it stops immediately), but it's worth confirming via the existing TestCleanShutdownLeavesKeymapConsistent that the zero-rescue fast path holds for large tables in CI, since the walk does a keymap.Get per missing key.
  • 2 suggestion(s)/nit(s) flagged inline on specific lines.

// could deadlock the seal path). If the channel is full the update is dropped: the watermark is monotonic and
// the control loop drains it each GC pass, so a later publish delivers an equal-or-newer value. A dropped
// update only delays file deletion, which is always safe.
func (m *keymapManager) publishDeletionWatermark(watermark int64) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[suggestion] Codex flagged that a dropped watermark publish can strand GC'd segment files. The drop here is fire-and-forget (correctly, to avoid the seal-path deadlock), and the watermark is monotonic, but the control loop advances deletionScheduledIndex when a delete is scheduled and never reschedules those segments. So if the channel is full when the highest watermark is published and no later segment ever expires (e.g. writes stop on an idle table), keymapDeletionWatermark stays below the true highest and phase-2 leaves those segments' files on disk — they remain counted/readable, then get reclaimed on the next delete that advances the watermark, or on restart.

This is a disk-reclamation delay in an extreme edge case (the channel defaults to 1024 and is drained every GC pass), not data loss or resurrection of deleted keys, so I'd downgrade Codex's "permanently strand" framing to a low-severity note. Still, given the only cost of a fuller-fidelity signal is a small struct, consider publishing the watermark via a single-slot "latest value" (overwrite-on-full) rather than a bounded queue that can drop the newest value, which would close the gap entirely.

// caller without waiting for the keymap write. The segment data is already crash durable; a crash that
// loses the scheduled write is repaired from the segment key files on the next startup. A full manager
// channel backpressures here, which propagates back to Flush().
err = f.keymapManager.scheduleWrite(durableKeys)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] Behavioral change worth a note (per Codex): Flush() now returns success once the keymap put is scheduled, so a subsequent keymap.Put failure (e.g. a duplicate-key error under DoubleWriteProtection) surfaces asynchronously as an error-monitor panic on the manager goroutine rather than to the Flush() caller that produced it. The DB still enters the panicked state and later calls observe the error, so this isn't silent data loss — just a shift in which call observes the failure. Acceptable given the async design; flagging so the changed error-surfacing contract is intentional.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants