src: embed zstd dictionary for further compile cache size wins #16
Merged
anonrig merged 1 commit on Jun 12, 2026
Builds on the zstd compression in nodejs#63861 by embedding a small zstd dictionary trained on a diverse corpus of real modules, so each small/medium compile-cache entry compresses better. Per entry we keep the smaller of the plain and dictionary-assisted frames, so the dictionary only ever helps.

- Add src/compile_cache_zstd.dict (16 KiB). It is trained on V8 code caches harvested (via vm.compileFunction, the same shape the CJS loader produces) from a diverse corpus: bundled npm packages, lib/, tools/ and a few deps.
- Add tools/generate_compile_cache_dict.py and a node.gyp action that generates compile_cache_zstd_dict.h into SHARED_INTERMEDIATE_DIR at build time; no generated header is checked in. libnode include_dirs updated to pick it up.
- Prepare the CDict/DDict once per process (shared across all handlers and Workers, matching the lazy-context approach from nodejs#63861) and use them in Persist() and ReadCacheFile(). Persist() compresses the plain and dict frames into separate buffers and selects the smaller, so the written bytes and recorded size always agree. The dictionary is only tried for entries up to 256 KiB; larger blobs never benefit, so the second compression is skipped to avoid wasted work. Falls back to plain zstd if dictionary preparation fails.
- The dictionary is embedded in the binary because the compile cache must be usable early, portably, and without extra filesystem state.
- No on-disk format change: dict-assisted frames carry the dictID, plain frames carry none, and a single DDict decompresses both.
- Size, measured on data held out from training (per-entry min policy): diverse modules go from ~1.87x (plain zstd) to ~2.44x with the dictionary (~24% smaller on disk); on test/parallel, which is not in the training corpus at all, ~1.74x -> ~2.22x (~22% smaller). A real end-to-end run (npm --version, ~70 modules) is ~15% smaller. Read time is unchanged and the extra write-time work is negligible.
- Add a multi-module write/read roundtrip test and a startup benchmark (standard createBenchmark harness).
anonrig approved these changes on Jun 12, 2026
Experimental, for discussion
Gist: when you have lots of small files, a pregenerated dictionary is a compression win, at the cost of a few KB of extra payload.
Summary
This builds on the zstd compression added in nodejs#63861 by embedding a small (16
KiB) zstd dictionary trained on a diverse corpus of real modules. For each
compile-cache entry we compress with and without the dictionary and keep
whichever frame is smaller, so the dictionary only ever helps.
The dictionary mainly benefits the common "many small/medium modules" case,
where individual code caches are too short for plain zstd to find much
redundancy on its own. Large single blobs are left untouched (plain zstd
already wins there, and the dictionary path is skipped for them entirely).
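The per-entry policy described above can be sketched in a few lines. This is an illustration only, not the code in `compile_cache.cc`: it swaps in Python's standard `zlib` module with a preset dictionary (`zdict`) as a stand-in for zstd's CDict/DDict, and the names `compress_entry`, `decompress_entry` and `DICT_SIZE_LIMIT` are hypothetical.

```python
import zlib

# Assumption: mirrors the 256 KiB gate from the description; the name is made up.
DICT_SIZE_LIMIT = 256 * 1024


def compress_entry(raw: bytes, dictionary: bytes, level: int = 1) -> tuple[bytes, bool]:
    """Return (frame, used_dictionary), keeping the smaller of the two frames."""
    plain = zlib.compress(raw, level)
    if len(raw) > DICT_SIZE_LIMIT:
        # Large entries skip the second compression entirely.
        return plain, False
    comp = zlib.compressobj(level, zdict=dictionary)
    with_dict = comp.compress(raw) + comp.flush()
    if len(with_dict) < len(plain):
        return with_dict, True
    # The dictionary did not help for this entry, so fall back to the plain frame.
    return plain, False


def decompress_entry(frame: bytes, dictionary: bytes) -> bytes:
    """One decoder handles both kinds of frame: the preset dictionary is
    only consulted when the stream itself asks for it."""
    return zlib.decompressobj(zdict=dictionary).decompress(frame)
```

Because the plain frame is always produced and the dictionary frame is kept only when it is strictly smaller, the stored size can never exceed the plain-compression size, which is the "only ever helps" property.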
Size benefits
The compile cache is already compressed by nodejs#63861; the question this PR answers
is how much more the dictionary saves on top of plain zstd. All numbers below
use the shipped policy (per-entry `min(plain, dict)`, level 1) and compare
against the no-dictionary baseline. The ratios are raw V8 code cache size over
on-disk size, so a higher ratio means a smaller on-disk cache.
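Results (plain zstd -> with dictionary):
- Diverse modules (held-out file split): ~1.87x -> ~2.44x (~24% smaller on disk)
- `test/parallel` (4,119 files, not in training): ~1.74x -> ~2.22x (~22% smaller)
- `npm --version` end-to-end (~70 real modules): ~15% smaller

Reading this:
- All measurements use data the dictionary was never trained on (a held-out
  file split, and an entire corpus, `test/parallel`, absent from training), so
  they reflect generalization, not memorization.
- `npm --version` exercises the real persist path: the on-disk cache for npm's
  module graph shrinks from 138 KB to 117 KB.
- Plain zstd already reaches ~1.7–1.9×; the dictionary takes it to ~2.2–2.4×,
  recovering roughly another 15–24% of the on-disk footprint, concentrated in
  the small/medium modules that dominate real workloads.
- Large single blobs (e.g. the `typescript.js` fixture, > 256 KiB) stay on the
  plain path and are byte-for-byte unchanged.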
The cost is +16 KiB in the `node` binary (the embedded dictionary). A 32 KiB
dictionary was measured to add only ~1 percentage point and 48 KiB nothing
beyond that, so 16 KiB is the size/benefit knee.
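To relate the compression ratios to the "percent smaller on disk" figures: on-disk size is raw size divided by the ratio, so the fraction saved when moving from plain to dictionary-assisted compression is `1 - plain_ratio / dict_ratio`. The helper names below are hypothetical, and the inputs are the rounded figures quoted in this PR, so the results agree with the quoted savings only to within that rounding.

```python
def saving_from_ratios(plain_ratio: float, dict_ratio: float) -> float:
    """Fraction of on-disk bytes saved when the ratio improves from plain to dict.

    on_disk = raw / ratio, so saved = 1 - (raw / dict_ratio) / (raw / plain_ratio).
    """
    return 1.0 - plain_ratio / dict_ratio


def saving_from_sizes(before: float, after: float) -> float:
    """Fraction saved when going directly from one on-disk size to another."""
    return 1.0 - after / before


for label, value in [
    ("diverse modules", saving_from_ratios(1.87, 2.44)),
    ("test/parallel", saving_from_ratios(1.74, 2.22)),
    ("npm --version", saving_from_sizes(138, 117)),
]:
    print(f"{label}: {value:.1%} smaller on disk")
```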
Timing (does the dictionary make things slower?)
A/B against the no-dictionary baseline (this commit's parent), same tree, only
`compile_cache.cc`/`.h` differ. Trimmed median wall time per process (Apple
Silicon, both binaries measured back-to-back). The `nocache` row uses no compile
cache at all, so its delta is the run-to-run noise floor; read the other rows
against it.
Big single blob: `typescript.js` (~1.8 MB cache, 1 entry; above the 256 KiB
threshold, so the dictionary is skipped on write):
Many small modules (120 entries; all below the threshold, dictionary applied):
Takeaways: the read path, paid on every warm-cache startup, is within noise;
`decompress_usingDDict` is not measurably slower than plain decompress, and the
one-time per-process `DDict` digest of a 16 KiB dictionary is negligible. Write
overhead (only at persist, on shutdown) is sub-millisecond for many modules and
zero for the big blob (the size gate skips it). On-disk size never regresses.
Why embed the dictionary
The compile cache must be usable early during startup, portably, and without
relying on any additional filesystem state, so the dictionary is compiled into
the binary rather than loaded at runtime. Only the small binary `.dict` is
checked in; the C array is generated at build time.
How the dictionary is trained (reproducible)
The dictionary is trained on V8 code caches harvested via `vm.compileFunction`
(the same shape the CJS loader produces) from a diverse in-tree corpus: bundled
npm packages (`deps/npm/node_modules`), `lib/`, `tools/`, and a few `deps/`
libraries, ~1,200 modules in total. Those samples are fed to
`zstd --train --maxdict=16384`. The measurement corpora above are disjoint from
this training set. The harvest+train script can be committed under `tools/` so
the `.dict` is regenerable from the tree rather than an opaque drop-in.
On-disk compatibility
No format change. The dictionary is a trained zstd dictionary, so dict-assisted
frames carry its dictID and plain frames carry none; the reader decompresses
both correctly with a single `DDict`. Compile-cache directories are already
keyed by Node version, arch, and a cache-data version tag, so a future change to
the embedded dictionary is naturally isolated to a fresh cache directory.
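The reason a reader can tell the two kinds of frame apart without any extra on-disk field is that the zstd frame header itself optionally records a dictionary ID. The sketch below parses that field following the zstd frame format (RFC 8878). It is a debugging aid written for illustration, not code from this PR, and `frame_dict_id` is a hypothetical name.

```python
import struct

ZSTD_MAGIC = 0xFD2FB528  # stored little-endian as bytes 28 B5 2F FD


def frame_dict_id(frame: bytes) -> int:
    """Return the Dictionary_ID recorded in a zstd frame header, or 0 if none."""
    if len(frame) < 5 or struct.unpack_from("<I", frame, 0)[0] != ZSTD_MAGIC:
        raise ValueError("not a zstd frame")
    descriptor = frame[4]
    single_segment = (descriptor >> 5) & 1   # bit 5: Single_Segment_flag
    did_size = (0, 1, 2, 4)[descriptor & 3]  # bits 1-0: Dictionary_ID_flag
    # The one-byte Window_Descriptor is present only when Single_Segment is unset.
    offset = 5 + (0 if single_segment else 1)
    field = frame[offset:offset + did_size]
    if len(field) != did_size:
        raise ValueError("truncated frame header")
    return int.from_bytes(field, "little")
```

A plain frame yields 0 here, while a dictionary-assisted frame yields the ID of the dictionary it was compressed with, which is the information the decoder uses to decide whether the dictionary is needed.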