Skip to content

Predict IVF-PQ FP16 overflow and auto-switch to FP32#2246

Open
huuanhhuyn wants to merge 8 commits into
NVIDIA:mainfrom
huuanhhuyn:ivfpq_fp16_overflow
Open

Predict IVF-PQ FP16 overflow and auto-switch to FP32#2246
huuanhhuyn wants to merge 8 commits into
NVIDIA:mainfrom
huuanhhuyn:ivfpq_fp16_overflow

Conversation

@huuanhhuyn

@huuanhhuyn huuanhhuyn commented Jun 16, 2026

Copy link
Copy Markdown

Large-magnitude unnormalized datasets (e.g. SIFT_1M) contains vectors whose norm overflowed the FP16 internal search types of IVF-PQ.

This PR detects it by sampling a fraction from the dataset, compute its L2 squared norm and estimate the max squared distance from the samples. When the distance reaches FP16 overflow range, the internal types fall back to FP32.

Sampling: We uniformly sampled n_sample samples from the n_rows of the dataset
image

Measured runtime: several hundreds of milliseconds on 100M rows

@copy-pr-bot

copy-pr-bot Bot commented Jun 16, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a new ivf_pq_fp16_overflow.cuh header that samples dataset rows to estimate the maximum squared norm via map-reduce, derives a distance upper bound, and compares it against a defensive FP16 threshold. The CAGRA build(...) function includes this header and calls estimate_fp16_overflow to auto-upgrade distance dtypes to FP32 when overflow is detected.

Changes

FP16 Overflow Detection and CAGRA Integration

Layer / File(s) Summary
FP16 overflow detection header: sampler and threshold logic
cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh
Adds includes for RAFT CUDA utilities, MDSpan, row-sampling, RNG, and distance utilities. Implements detail::estimate_max_squared_norm that samples a saturated fraction of dataset rows into device memory, computes mapped squared norms per row, reduces to the global maximum via map-reduce, synchronizes, and returns the scalar. Implements helpers::estimate_fp16_overflow that early-exits for empty datasets and CosineExpanded metrics, defines an FP16 max-based overflow threshold with defensive margin, derives a distance upper bound (scaled by 4.0 for L2Expanded), and returns whether the bound exceeds the threshold.
CAGRA build: include and FP16 guard wiring
cpp/src/neighbors/detail/cagra/cagra_build.cuh
Adds the include for ivf_pq_fp16_overflow.cuh at the top of the file. Inserts a pre-build overflow-risk check in build(...) that, when ivf_pq_params is detected and either internal_distance_dtype or coarse_search_dtype is FP16, calls estimate_fp16_overflow and, on detected overflow risk, logs a warning and overrides both dtypes to CUDA_R_32F while leaving lut_dtype unchanged.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Predict IVF-PQ FP16 overflow and auto-switch to FP32' accurately and concisely describes the main changes: FP16 overflow prediction and automatic precision switching in IVF-PQ operations.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The pull request description clearly explains the problem (FP16 overflow in IVF-PQ for large datasets), the solution (sampling-based overflow detection), and implementation details (sampling strategy and fallback to FP32).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh`:
- Line 39: Remove the debug printf statement at line 39 in the
ivf_pq_fp16_overflow.cuh file. This printf executes within a loop that iterates
over every dimension of every sampled row, causing severe performance
degradation and output flooding. Simply delete the entire printf line as it
serves no purpose in production code.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 086831a1-8cdd-42f2-aaaf-8ac33b75c173

📥 Commits

Reviewing files that changed from the base of the PR and between 6672103 and faa1609.

📒 Files selected for processing (2)
  • cpp/src/neighbors/detail/cagra/cagra_build.cuh
  • cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh

Comment thread cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh (1)

70-88: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

HIGH: Missing validation could cause kernel launch with zero grid blocks.

If n_rows == 0 (empty dataset) or if sample_fraction is extremely small with a moderate dataset size (e.g., n_rows=5000, sample_fraction=0.0001), n_sample becomes 0, resulting in grid_size = 0. Launching a kernel with zero blocks is invalid.

Consider adding a minimum sample size guard:

Suggested fix
   int64_t n_sample = static_cast<int64_t>(sample_fraction * static_cast<double>(n_rows));
   if (n_rows <= 1000) {
     n_sample = n_rows;  // for small datasets, just use them all and skip the sampling overhead
   } else if (n_rows > 100000) {
     // cap the sample size to 100k for speed and keep memory use within the limited workspace resource
     n_sample = 100000;
   }
+  
+  // Handle empty dataset or degenerate sample size
+  if (n_sample <= 0) { return 0.0f; }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh` around lines 70 - 88, Add
a minimum sample size guard to prevent grid_size from being zero when launching
the kernel. After the existing size constraint checks for n_rows (the checks for
n_rows <= 1000 and n_rows > 100000), add a condition to ensure n_sample is at
least 1, since if n_sample is 0, the subsequent calculation of grid_size will be
0, resulting in an invalid kernel launch with zero blocks. This guards against
both empty datasets (n_rows == 0) and extremely small sample fractions.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh`:
- Around line 70-88: Add a minimum sample size guard to prevent grid_size from
being zero when launching the kernel. After the existing size constraint checks
for n_rows (the checks for n_rows <= 1000 and n_rows > 100000), add a condition
to ensure n_sample is at least 1, since if n_sample is 0, the subsequent
calculation of grid_size will be 0, resulting in an invalid kernel launch with
zero blocks. This guards against both empty datasets (n_rows == 0) and extremely
small sample fractions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a794a6f7-8a2b-4f29-ac12-49b5d25d10ab

📥 Commits

Reviewing files that changed from the base of the PR and between faa1609 and 3b5130f.

📒 Files selected for processing (1)
  • cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh

@mfoerste4 mfoerste4 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @huuanhhuyn for the PR. Please utilize raft for the norm computation if possible.

Comment thread cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh Outdated
Comment thread cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh Outdated
Comment thread cpp/src/neighbors/detail/cagra/cagra_build.cuh Outdated
@huuanhhuyn huuanhhuyn requested a review from mfoerste4 June 22, 2026 10:26

@mfoerste4 mfoerste4 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @huuanhhuyn for the PR!

RAFT_EXPECTS(kDelay >= kSaturation,
"kDelay must not be smaller than kSaturation so that n_sample is always less than "
"or equal to n_rows");
int64_t n_sample = (n_rows * kSaturation + (n_rows + kDelay - 1)) / (n_rows + kDelay);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use raft::ceildiv here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants