-
Notifications
You must be signed in to change notification settings - Fork 20
CNDB-16484: handle brute force edge case selecting zero vectors #2202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Avoids this error: java.lang.IllegalArgumentException: initialSize must be > 0 and < 2147483630; got: 0 at io.github.jbellis.jvector.util.AbstractLongHeap.<init>(AbstractLongHeap.java:52) at io.github.jbellis.jvector.util.BoundedLongHeap.<init>(BoundedLongHeap.java:47) at io.github.jbellis.jvector.util.BoundedLongHeap.<init>(BoundedLongHeap.java:43) at org.apache.cassandra.index.sai.disk.v2.V2VectorIndexSearcher.orderByBruteForce(V2VectorIndexSearcher.java:328) at org.apache.cassandra.index.sai.disk.v2.V2VectorIndexSearcher.orderByBruteForce(V2VectorIndexSearcher.java:283) at org.apache.cassandra.index.sai.disk.v2.V2VectorIndexSearcher.searchInternal(V2VectorIndexSearcher.java:246) at org.apache.cassandra.index.sai.disk.v2.V2VectorIndexSearcher.orderBy(V2VectorIndexSearcher.java:172) at org.apache.cassandra.index.sai.disk.v1.Segment.orderBy(Segment.java:160)
Checklist before you submit for review
|
|
❌ Build ds-cassandra-pr-gate/PR-2202 rejected by Butler3 regressions found Found 3 new test failures
No known test failures found |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All brute force code paths are now adequately protected:
✅ The fix in this commit protects the main entry point (V2VectorIndexSearcher.java:272)
✅ The two overloaded variants in V2VectorIndexSearcher are indirectly protected (only callable after the size check)
✅ Both VectorMemtableIndex brute force paths already have explicit empty checks at their call sites
There is one cqlsh test failure that seems new but it also seems unrelated. Please test it, just in case, locally and we can open a followup ticket if it is confirmed unrelated. The other PR also looks good.
The cqlsh failure is from dtest and is not related to this PR: I'm not sure how to run that dtest suite locally. I don't see how this could be related since the python test does not appear related to vector. I am not concerned by this test and feel comfortable merging without re-testing. Would you prefer that I at least re-run the tests on this PR to see if it fails again? |
|
I also don't expect it to be related but as cassandra keeps on surprising me I quickly ran it locally and it passes with your branch. Opened a ticket for flaky test - #16502 Just FYI:
There is a ReadMe - https://github.com/datastax/cassandra/blob/main/pylib/README.asc But more or less:
|
|
Thanks @ekaterinadimitrova2! Now I know for next time |



What is the issue
Fixes: https://github.com/riptano/cndb/issues/16484
What does this PR fix and why was it fixed
When a partition restriction filters down to rows without vectors, we can get a list of 0 rows, and then we attempt to create a bounded heap of size 0, which is invalid. The added test catches this error. The solution is to exit the method early in the event of a zero length
SegmentRowIdOrdinalPairsobject.