-
Notifications
You must be signed in to change notification settings - Fork 20
CNDB-16308: Add GLOBAL_HOLES_ALLOWED logic to CompactionGraph #2203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Checklist before you submit for review
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds logic to consider the GLOBAL_HOLES_ALLOWED percentage when building compaction graphs to prevent excessive memory consumption from large ONE_TO_MANY postings lists. The change evaluates graph statistics after construction to determine whether to use the ZERO_OR_ONE_TO_MANY structure (which stores data on disk) instead of ONE_TO_MANY (which uses in-memory postings). The GLOBAL_HOLES_ALLOWED threshold is made configurable via a system property.
Changes:
- Introduced a new configurable system property
cassandra.sai.vector.ordinal_hole_density_limitwith a default value of 0.01 - Added logic in
CompactionGraphto check for excessive ordinal mapping holes and switch toZERO_OR_ONE_TO_MANYstructure when necessary - Refactored hole density checking into a dedicated method
tooManyOrdinalMappingHoles()for reuse across the codebase
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/java/org/apache/cassandra/config/CassandraRelevantProperties.java | Added new system property SAI_VECTOR_ORDINAL_HOLE_DENSITY_LIMIT for configuring the hole density threshold |
| src/java/org/apache/cassandra/index/sai/disk/v5/V5VectorPostingsWriter.java | Refactored hole density check into tooManyOrdinalMappingHoles() method and updated GLOBAL_HOLES_ALLOWED to use the new configurable property |
| src/java/org/apache/cassandra/index/sai/disk/vector/CompactionGraph.java | Added hole density check during graph flush to determine appropriate postings structure |
| test/unit/org/apache/cassandra/index/sai/cql/VectorCompactionTest.java | Enhanced test to validate postings structure after compaction and properly restore GLOBAL_HOLES_ALLOWED value |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/java/org/apache/cassandra/index/sai/disk/v5/V5VectorPostingsWriter.java
Show resolved
Hide resolved
src/java/org/apache/cassandra/index/sai/disk/v5/V5VectorPostingsWriter.java
Outdated
Show resolved
Hide resolved
❌ Build ds-cassandra-pr-gate/PR-2203 rejected by Butler4 regressions found Found 4 new test failures
Found 2 known test failures |
What is the issue
Fixes: https://github.com/riptano/cndb/issues/16308
What does this PR fix and why was it fixed
When we build with the compaction graph, we have to decide the kind of postings structure. Currently, we ignore the
GLOBAL_HOLES_ALLOWEDpercentage after graph construction, which can lead to building very largeONE_TO_MANYpostings lists (which leads to increased memory consumption and graph sizes). We need to instead look at the graph stats after construction and decide if they warrant choosing theZERO_OR_ONE_TO_MANYstructure, which will put everything on disk.Because this change could push some graphs that previously built using
ONE_TO_MANYtoZERO_OR_ONE_TO_MANY, we can expect that we might see reduced hybrid search performance due to the need to go to disk more. This is a necessary change though to prevent excessive memory utilization here. Because this is a trade off without an absolute right choice, this PR makesGLOBAL_HOLES_ALLOWEDconfigurable as a-Dsystem property named:cassandra.sai.vector.ordinal_hole_density_limit.