feat(rust): add serialize/deserialize to brute force index#2231
feat(rust): add serialize/deserialize to brute force index#2231jamie8johnson wants to merge 3 commits into
Conversation
Wrap the existing C entry points cuvsBruteForceSerialize and cuvsBruteForceDeserialize in safe Rust methods on brute_force::Index, bringing the brute force binding to parity with the CAGRA binding. - Index::serialize(&self, res, filename) writes the index to disk. - Index::deserialize(res, filename) loads an index from disk, creating the Index handle first so failing FFI calls still Drop the C-side allocation (RAII-safe error path). - A path_to_cstring helper rejects non-UTF-8 paths and paths containing an interior NUL byte with Error::InvalidArgument before any FFI call, mirroring the CAGRA implementation. No bindings regeneration was needed: both symbols are already present in cuvs-sys (brute_force is pulled in via core/all.h). Tests (mirroring the CAGRA serialize tests): - test_brute_force_serialize_deserialize: round-trips the index to disk, asserts the file exists and is non-empty, reloads it, and re-verifies a self-neighbor search on the loaded index. - test_brute_force_serialize_rejects_interior_nul: confirms an interior NUL byte in the path surfaces as Error::InvalidArgument, not a panic.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
📝 WalkthroughSummary by CodeRabbit
WalkthroughThis PR adds file serialization support to the Rust brute-force KNN Index, introducing a safe path-to-CString helper, public serialize/deserialize methods that wrap cuVS C-API calls, and comprehensive tests for round-trip serialization and invalid path handling. ChangesIndex Serialization Support
🎯 2 (Simple) | ⏱️ ~12 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@rust/cuvs/src/brute_force.rs`:
- Around line 269-270: The test currently uses a fixed temp filename
("test_brute_force_index.bin") which can collide across parallel runs; replace
that with a unique temporary file using the tempfile crate (e.g., create a
tempfile::NamedTempFile and use its path for serialization), or generate a
unique filename (PID + UUID) in the temp dir; update the code that builds
filepath (currently the std::env::temp_dir().join(...) expression) to use the
NamedTempFile path (or unique name) and ensure the tempfile crate is added to
Cargo.toml if not already present so index.serialize(&res, &filepath) writes to
a non-colliding location.
In `@UPSTREAM_PR_BODY.md`:
- Around line 57-63: The fenced code block in UPSTREAM_PR_BODY.md is missing a
language identifier which triggers MD040; update the triple-backtick fence that
surrounds the cargo test output to include a language (e.g., change ``` to
```text) so the block is explicitly marked as plain text and the markdownlint
warning is resolved.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: af398801-b0ca-4312-8e90-144f5c458bfb
📒 Files selected for processing (2)
UPSTREAM_PR_BODY.mdrust/cuvs/src/brute_force.rs
Add serialize/deserialize to the Rust brute force index
What
Adds
serializeanddeserializeto the Rustbrute_force::Index, wrapping theexisting C entry points
cuvsBruteForceSerialize/cuvsBruteForceDeserialize.This brings the brute force binding to parity with the CAGRA binding, which already
exposes serialize/deserialize.
Index::serialize(&self, res, filename)writes the index to disk.Index::deserialize(res, filename) -> Result<Index>loads an index from disk.Both methods mirror the CAGRA implementation:
path_to_cstringhelper converts the filesystem path to aCString,returning
Error::InvalidArgument(instead of panicking) for paths that are notvalid UTF-8 or that contain an interior NUL byte. The path is validated before any
FFI call is made.
check_cuvs.deserializeconstructs theIndexhandle first (viaIndex::new()), so that ifthe underlying
cuvsBruteForceDeserializecall fails, the handle'sDropstillruns and releases the C-side index allocation (RAII-safe error path).
The doc comments note that the serialization format may change between cuVS versions,
matching the wording in the C header.
Notes for reviewers
cuvsBruteForceSerializeandcuvsBruteForceDeserializeare already present inrust/cuvs-sys/src/bindings.rs(brute_force is pulled in through
core/all.h), so this change is purely additiveon the safe Rust wrapper side and touches no generated code.
Indexkeeps a non-owning deviceview of its dataset (
_dataset). The serialize round-trip test deliberately keepsthe host
ndarrayarray in the same scope as the index for the duration of thetest, because the device tensor's
shapepointer borrows that array's dimensionstorage. Moving the host array while the index is alive would dangle that pointer
(this is a property of the existing
ManagedTensorview, not of these new methods).and other Rust binding PRs): if conflicts arise, resolve by merging
maininto thisbranch rather than rebasing, per the project's no-rebase contribution guideline.
Testing
Two new unit tests were added alongside the existing
test_l2, mirroring the CAGRAserialize tests:
test_brute_force_serialize_deserialize— builds an index, serializes it, assertsthe output file exists and is non-empty, deserializes it back, and re-verifies that
a self-neighbor search on the loaded index returns each query as its own nearest
neighbor.
test_brute_force_serialize_rejects_interior_nul— confirms that a path containingan interior NUL byte surfaces as
Error::InvalidArgumentrather than panicking.All brute force tests pass (run single-threaded on a single GPU):
cargo fmtandcargo clippyare clean for the changed file.On
path_to_cstring: intentionally mirrored byte-for-byte fromcagra/index.rs(a third copy arrives with the in-flight IVF-SQ PR #2229). Happy to hoist it into a shared module here or as a follow-up once the serialize PRs settle — whichever you prefer.