Fix recall computation for fewer than k groundtruth results by magdalendobson · Pull Request #1069 · microsoft/DiskANN

magdalendobson · 2026-05-14T18:13:05Z

Groundtruth for filtered datasets uses the .rangeres format, since a filtered query may have few or no results. However, top-k recall computation currently raises an error whenever a query has fewer than k groundtruth results. This PR fixes that error and computes recall for each point as follows:

If the point has 0 < k' <= k groundtruth results, recall is the number of correctly reported points divided by k'.
If the point has no groundtruth results, it is completely excluded from the recall calculation.
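
The two rules above can be sketched as a small per-query function. This is a minimal illustration, not the code in this PR: the name `query_recall`, the `u32` id type, and the use of `Option` to mark excluded queries are all assumptions made for the example. It also assumes ids within a groundtruth row are distinct.

```rust
use std::collections::HashSet;

/// Hypothetical helper: per-query recall against variable-length groundtruth.
///
/// Returns `None` when the groundtruth row is empty, so the caller can
/// exclude that query from the average.
fn query_recall(groundtruth: &[u32], results: &[u32], k: usize) -> Option<f64> {
    // Only the first k groundtruth entries count, so k' = min(row length, k).
    let gt: HashSet<u32> = groundtruth.iter().take(k).copied().collect();
    if gt.is_empty() {
        return None;
    }
    // Count distinct reported ids (among the first k) that are in the groundtruth.
    let reported: HashSet<u32> = results.iter().take(k).copied().collect();
    let hits = reported.intersection(&gt).count();
    // Divide by k' rather than k.
    Some(hits as f64 / gt.len() as f64)
}
```

With k = 5, a query with groundtruth `[1, 2, 3]` and results `[1, 2, 9, 8, 7]` gets recall 2/3 under this sketch, rather than 2/5.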

The average recall is then the sum of the per-point recalls divided by the number of points with non-empty groundtruth, so points with zero groundtruth results do not affect recall. The reasoning is that in a dataset where most filtered queries have null results, we want recall to capture our performance on the queries that actually have results, rather than being artificially inflated by always correctly reporting that the remaining points have no results.

Since recall computation will no longer raise this error even for a regular top-k search, I have moved the check for insufficient groundtruth results to the point where groundtruth in the regular bin format is actually read. This is an improvement because the failure now triggers before index build and search, which is a better experience for the user. I have left some instances of NotEnoughGroundTruth inside recall computation as an extra failsafe, but they can be removed if reviewers prefer.

Along the way I removed the minimum and maximum values from the recall stats, since after some discussion with the team it appears that no one uses them.

magdalendobson · 2026-05-14T18:18:49Z

@microsoft-github-policy-service agree company="Microsoft"

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Fixes recall computation when filtered/variable-length groundtruth contains fewer than k results by computing per-query recall against the available groundtruth and excluding zero-result queries from the average.

Changes:

Update recall computation to handle per-row k' <= k groundtruth sizes and ignore empty groundtruth rows in the average.
Move “insufficient groundtruth for top-k” validation to groundtruth loading for fixed-size .bin groundtruth.
Remove minimum / maximum from reported recall metrics.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.

File	Description
diskann-benchmark/src/utils/recall.rs	Removes min/max fields from the benchmark-facing recall metrics wrapper.
diskann-benchmark/src/utils/datafiles.rs	Adds optional `k` validation when loading groundtruth from `.bin`.
diskann-benchmark/src/backend/index/benchmarks.rs	Computes max `recall_k` across runs and passes it into groundtruth loading.
diskann-benchmark-core/src/recall.rs	Implements variable-length groundtruth recall + removes min/max metrics and updates tests accordingly.


magdalendobson · 2026-05-14T21:55:02Z

+    if let Some(expected_k) = k {
+        if groundtruth.ncols() != expected_k {
+            return Err(anyhow::anyhow!(
+                "Each row of groundtruth must have length {} (got {})",
+                expected_k,
+                groundtruth.ncols()
+            ));
+        }
+    }


@copilot apply changes based on this feedback

Fixed in f0b954d. Changed the check from ncols() != expected_k to ncols() < expected_k and updated the error message to "at least N neighbors" so .bin groundtruth files with more columns than the requested recall_k are accepted. Also updated the related test that was incorrectly expecting an error for dynamic groundtruth with fewer entries than recall_k — that's now valid behavior per the PR (filtered queries can have fewer results). Added test cases with non-zero values verifying correct recall values (full match → 1.0, partial match → 0.6) in 3de6550.
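
The relaxed check described above can be sketched as follows. This is an illustration under stated assumptions, not the code from f0b954d: the function names, the `String` error type (the reviewed diff uses `anyhow`), and the exact error wording are hypothetical. The `max_recall_k` helper mirrors the idea of validating against the largest `recall_k` across runs.

```rust
/// Hypothetical helper: the largest recall_k requested across all runs,
/// or `None` if there are no runs.
fn max_recall_k(recall_ks: &[usize]) -> Option<usize> {
    recall_ks.iter().copied().max()
}

/// Hypothetical check: fixed-width groundtruth must have at least
/// `expected_k` columns. Extra columns are accepted.
fn validate_groundtruth_width(ncols: usize, expected_k: Option<usize>) -> Result<(), String> {
    if let Some(k) = expected_k {
        // `<` rather than `!=`: a file with more neighbors than needed is fine.
        if ncols < k {
            return Err(format!(
                "Each row of groundtruth must have at least {} neighbors (got {})",
                k, ncols
            ));
        }
    }
    Ok(())
}
```

Running the check at load time means a groundtruth file that is too narrow is rejected before any index build or search starts.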

magdalendobson · 2026-05-14T21:55:31Z

    /// The average recall across all queries.
    pub average: f64,


@copilot apply changes based on this feedback

Updated in fd8f772. The doc comment for the average field now clarifies that it's "The average recall across queries with non-empty groundtruth" and that "Queries with zero groundtruth results are excluded from the average."

magdalendobson · 2026-05-14T21:57:24Z

+    // Compute the average recall
+    let total: f64 = recall_values.iter().sum();
+    let average = (total) / (num_nonzero as f64);


@copilot apply changes based on this feedback

Added in fd8f772. Two new test cases:

Mixed zero/non-zero groundtruth rows: 5 queries with groundtruth [1,2,3,4,5] (all match → recall=1.0 each) and 5 queries with empty groundtruth (excluded). Verifies average = 1.0 (denominator uses only 5 non-zero rows).

All-zero groundtruth: All 10 queries have empty groundtruth. Verifies average = 0.0 (not NaN/inf).
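
The behavior these two tests pin down can be sketched as an averaging step that guards the division. This is a minimal sketch, assuming per-query recalls are represented as `Option<f64>` with `None` for empty groundtruth; the name `average_recall` and that representation are hypothetical and are not taken from the PR.

```rust
/// Hypothetical helper: average recall over queries with non-empty groundtruth.
/// `None` entries (empty groundtruth) are excluded from both the sum and the
/// denominator.
fn average_recall(per_query: &[Option<f64>]) -> f64 {
    let mut total = 0.0;
    let mut num_nonzero = 0usize;
    // `flatten` skips the `None` entries.
    for recall in per_query.iter().flatten() {
        total += recall;
        num_nonzero += 1;
    }
    // Guard the division so an all-empty groundtruth yields 0.0, not NaN.
    if num_nonzero == 0 {
        0.0
    } else {
        total / num_nonzero as f64
    }
}
```

Five queries with recall 1.0 plus five excluded queries average to 1.0 here, and ten excluded queries average to 0.0.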


codecov-commenter · 2026-05-15T18:52:26Z

Codecov Report

❌ Patch coverage is 93.16239% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.51%. Comparing base (f4757a3) to head (bdee1db).
⚠️ Report is 3 commits behind head on main.

Files with missing lines	Patch %	Lines
diskann-benchmark/src/backend/index/benchmarks.rs	30.00%	7 Missing ⚠️
diskann-benchmark/src/utils/datafiles.rs	96.55%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1069   +/-   ##
=======================================
  Coverage   89.51%   89.51%           
=======================================
  Files         459      459           
  Lines       85646    85707   +61     
=======================================
+ Hits        76663    76723   +60     
- Misses       8983     8984    +1

Flag	Coverage Δ
miri	`89.51% <93.16%> (+<0.01%)`	⬆️
unittests	`89.14% <93.16%> (+<0.01%)`	⬆️


Files with missing lines	Coverage Δ
diskann-benchmark-core/src/recall.rs	`97.99% <100.00%> (+0.18%)`	⬆️
diskann-benchmark/src/backend/exhaustive/minmax.rs	`100.00% <ø> (ø)`
...iskann-benchmark/src/backend/exhaustive/product.rs	`100.00% <ø> (ø)`
...kann-benchmark/src/backend/exhaustive/spherical.rs	`100.00% <ø> (ø)`
diskann-benchmark/src/backend/index/spherical.rs	`100.00% <ø> (ø)`
diskann-benchmark/src/inputs/graph_index.rs	`37.33% <100.00%> (+0.31%)`	⬆️
diskann-benchmark/src/utils/recall.rs	`57.14% <ø> (-5.36%)`	⬇️
diskann-benchmark/src/utils/datafiles.rs	`87.36% <96.55%> (+3.78%)`	⬆️
diskann-benchmark/src/backend/index/benchmarks.rs	`48.16% <30.00%> (+0.27%)`	⬆️


finish up recall computation patch

97b36ef

Magdalen Manohar added 2 commits May 14, 2026 19:53

fix bug in recall computation

cc013bd

remove stray extra registered benchmarks

1b6c4aa

magdalendobson marked this pull request as ready for review May 14, 2026 20:00

magdalendobson requested review from a team and Copilot May 14, 2026 20:00

Copilot AI reviewed May 14, 2026

Copilot started reviewing on behalf of magdalendobson May 14, 2026 20:20

magdalendobson linked an issue May 14, 2026 that may be closed by this pull request

Fix recall computation when fewer than k results are present #1073

Open

Copilot started work on behalf of magdalendobson May 14, 2026 21:55

magdalendobson and others added 3 commits May 14, 2026 17:55

Potential fix for pull request finding

d81f729

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Fix groundtruth ncols check: use < instead of != to allow files with …

f0b954d

…more neighbors than needed Agent-Logs-Url: https://github.com/microsoft/DiskANN/sessions/5f7cd8dc-8e4f-4c69-aea5-ad2c61baee52 Co-authored-by: magdalendobson <58752279+magdalendobson@users.noreply.github.com>

Add meaningful recall value assertions for variable-length dynamic gr…

3de6550

…oundtruth tests Agent-Logs-Url: https://github.com/microsoft/DiskANN/sessions/5f7cd8dc-8e4f-4c69-aea5-ad2c61baee52 Co-authored-by: magdalendobson <58752279+magdalendobson@users.noreply.github.com>

Copilot finished work on behalf of magdalendobson May 14, 2026 22:03

Copilot started work on behalf of magdalendobson May 14, 2026 22:03

Copilot AI and others added 2 commits May 14, 2026 22:09

Update doc comment for average field and add tests for zero/non-zero …

fd8f772

…groundtruth mix Agent-Logs-Url: https://github.com/microsoft/DiskANN/sessions/75b8623f-538e-4a4c-9762-12aac534c708 Co-authored-by: magdalendobson <58752279+magdalendobson@users.noreply.github.com>

Remove trailing whitespace from test code

87efda0

Agent-Logs-Url: https://github.com/microsoft/DiskANN/sessions/75b8623f-538e-4a4c-9762-12aac534c708 Co-authored-by: magdalendobson <58752279+magdalendobson@users.noreply.github.com>

Copilot finished work on behalf of magdalendobson May 14, 2026 22:12

Magdalen Manohar added 4 commits May 15, 2026 14:31

Merge branch 'main' of github.com:microsoft/DiskANN into users/magdal…

62c2120

…en/patch_recall_calculation

small changes to tests

7a45865

fmt

15398fa

fix clippy, add max_k() function to topk search

852fceb

add a test for loading groundtruth with expected k

bdee1db

magdalendobson wants to merge 13 commits into main from users/magdalen/patch_recall_calculation

4 participants
