src: make compile cache zstd dictionary reproducible #17

Open: lemire wants to merge 1 commit into anonrig:compile-cache-perf from lemire:compile-cache-dict-repro

@lemire (Collaborator) commented Jun 14, 2026

self-explanatory

Add tools/train_compile_cache_dict.mjs, a maintainer tool that
regenerates src/compile_cache_zstd.dict end to end, and replace the
embedded dictionary with its byte-for-byte reproducible output.

The script walks a fixed in-tree corpus, harvests a V8 code cache from
each module via vm.compileFunction with produceCachedData (the same
shape the CommonJS loader produces at runtime), feeds the blobs to
`zstd --train`, and writes the 16 KiB dictionary in place. The output
is byte-for-byte stable provided three conditions hold: node runs with
--predictable (the script re-execs itself with that flag, because V8
otherwise randomizes the string hash seed, which leaks into
cachedData); the corpus and its sorted order are fixed; and the node
build is fixed.
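
For reviewers unfamiliar with the harvesting step, here is a minimal sketch of the idea. It is not the code in tools/train_compile_cache_dict.mjs. The helper names (`harvestCodeCache`, `harvestCorpus`, `zstdTrainArgs`), the inline two-module corpus, and the use of `--maxdict=16384` to express the 16 KiB size are all assumptions made for illustration. The parameter list mirrors the usual CommonJS wrapper arguments.

```javascript
'use strict';
const vm = require('node:vm');

// Parameter names of the CommonJS module wrapper function.
const CJS_PARAMS = ['exports', 'require', 'module', '__filename', '__dirname'];

// Compile one module body as a function and return the V8 code cache
// that vm.compileFunction attaches when produceCachedData is set.
function harvestCodeCache(source, filename) {
  const fn = vm.compileFunction(source, CJS_PARAMS, {
    filename,
    produceCachedData: true,
  });
  if (!Buffer.isBuffer(fn.cachedData)) {
    throw new Error(`no code cache produced for ${filename}`);
  }
  return fn.cachedData;
}

// Hypothetical helper: harvest one blob per module, visiting the corpus
// in sorted filename order so the sample order never depends on
// insertion or directory-listing order.
function harvestCorpus(modules) {
  return Object.keys(modules)
    .sort()
    .map((filename) => ({
      filename,
      blob: harvestCodeCache(modules[filename], filename),
    }));
}

// Hypothetical helper: argument list for a `zstd --train` run that caps
// the dictionary at 16 KiB. The real tool may pass different options.
function zstdTrainArgs(sampleFiles, dictPath) {
  return ['--train', ...sampleFiles, '--maxdict=16384', '-o', dictPath];
}

// Stand-in corpus; the real tool walks a fixed in-tree corpus instead.
const samples = harvestCorpus({
  'b.js': 'module.exports = (x) => x * 2;',
  'a.js': 'exports.greet = (name) => `hello ${name}`;',
});
console.log(samples.map((s) => `${s.filename}: ${s.blob.length} bytes`));
console.log(zstdTrainArgs(['a.bin', 'b.bin'], 'compile_cache_zstd.dict').join(' '));
```

The sketch only fixes the sample order. Per the description above, the blob bytes themselves are stable only under `--predictable` and with a fixed node build.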

This documents and makes reproducible exactly how the embedded
dictionary was created, so a future maintainer can regenerate it (e.g.
after a V8 or corpus change) and review the resulting diff.

Refs: nodejs#63861