What happened?
DictStrategy decides whether to apply a dictionary layout to a column by probe-compressing the column's first chunk and checking whether the cascade chose a dictionary encoding (compressed.is::<Dict>()). That probe is hardcoded to a stock BtrBlocksCompressor::default():
// vortex-layout/src/layouts/dict/writer.rs
let compressed = BtrBlocksCompressor::default().compress(&chunk, &mut exec_ctx)?;
!compressed.is::<Dict>()
So the dict-fit decision ignores the compressor the caller configured via WriteStrategyBuilder::with_btrblocks_builder / with_compressor. A caller who customizes the cascade — e.g. excluding a dictionary scheme — still gets a layout decision made by a compressor they didn't ask for.
This is observable in-tree: vortex-cuda/gpu-scan-cli writes with BtrBlocksCompressorBuilder::default().only_cuda_compatible(), which deliberately excludes StringDictScheme / BinaryDictScheme / FSSTScheme. Because the probe ignores that, a vortex.dict layout still fires for low-cardinality string columns; the GPU scan then skips Dict fields (gpu-scan-cli/src/main.rs, if field.is::<Dict>() { continue; }), so those columns silently drop off the pure-GPU path the config was built to keep them on.
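To make the failure mode concrete, here is a minimal, self-contained sketch of the pattern described above. It does not use the Vortex API: `ToyCompressor`, `Scheme`, `Encoding`, and both probe functions are hypothetical stand-ins invented for illustration, and the "low cardinality" rule is an arbitrary assumption. It only models a probe that builds its own default compressor versus one that uses the compressor the caller configured.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a compression scheme id (not a Vortex type).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Scheme {
    StringDict,
    Fsst,
}

/// Hypothetical stand-in for the encoding a compressor ends up choosing.
#[derive(Debug, PartialEq, Eq)]
enum Encoding {
    Dict,
    Fsst,
    Plain,
}

/// Toy compressor: nothing more than the set of schemes it is allowed to use.
#[derive(Clone, Debug)]
struct ToyCompressor {
    enabled: HashSet<Scheme>,
}

impl ToyCompressor {
    /// Stock configuration with every toy scheme enabled.
    fn default_schemes() -> Self {
        Self {
            enabled: HashSet::from([Scheme::StringDict, Scheme::Fsst]),
        }
    }

    /// Returns a copy of this configuration with one scheme removed.
    fn excluding(mut self, scheme: Scheme) -> Self {
        self.enabled.remove(&scheme);
        self
    }

    /// Assumed rule: pick Dict when at most half the values are distinct
    /// and the dict scheme is enabled; otherwise Fsst if enabled, else Plain.
    fn compress(&self, chunk: &[&str]) -> Encoding {
        let distinct: HashSet<&str> = chunk.iter().copied().collect();
        let low_cardinality = !chunk.is_empty() && distinct.len() * 2 <= chunk.len();
        if low_cardinality && self.enabled.contains(&Scheme::StringDict) {
            Encoding::Dict
        } else if self.enabled.contains(&Scheme::Fsst) {
            Encoding::Fsst
        } else {
            Encoding::Plain
        }
    }
}

/// Models the reported behaviour: the probe ignores the caller's compressor
/// and always probes with a stock default.
fn probe_wants_dict_hardcoded(_configured: &ToyCompressor, chunk: &[&str]) -> bool {
    ToyCompressor::default_schemes().compress(chunk) == Encoding::Dict
}

/// Models the expected behaviour: the probe uses the configured compressor.
fn probe_wants_dict_configured(configured: &ToyCompressor, chunk: &[&str]) -> bool {
    configured.compress(chunk) == Encoding::Dict
}
```

In this toy model, a compressor built with `ToyCompressor::default_schemes().excluding(Scheme::StringDict)` still gets a "use dict" answer from the hardcoded probe on a low-cardinality chunk, while the configured probe declines. This is only an illustration of the pattern, not the code in the pending fix.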
Steps to reproduce
- Build a low-cardinality string column (e.g. 32,768 rows cycling ["alpha","beta","gamma"]).
- Write it twice through SESSION.write_options().with_strategy(...):
  - A: WriteStrategyBuilder::default().build()
  - B: WriteStrategyBuilder::default().with_btrblocks_builder(BtrBlocksCompressorBuilder::default().exclude_schemes([StringDictScheme.id()])).build()
- Walk each file's layout tree (footer().layout()) for a node whose encoding_id() == "vortex.dict".
- Expected: A has a dict layout, B falls back (no dict layout, since StringDict was excluded).
- Actual: both A and B contain a vortex.dict layout, because the probe uses the hardcoded default regardless of B's configuration.
Environment
- Vortex version: develop @ 9444d20ae (source-level logic bug; not release-specific)
- Python/Java version: n/a
- OS: n/a (platform-independent)
Additional context
I have a pull request/branch with this fix on it, will post shortly.
I ran into this myself while experimenting. I reproduced it, tracked down the cause, and found the fix using coding agents.