feat: add BITSET set operations (union, intersect, diff, xor)#39
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the functionality of Judy::BITSET arrays by introducing native C implementations for common set operations: union, intersection, difference, and exclusive OR. These additions allow developers to perform complex set manipulations efficiently, returning new Judy arrays without altering the original data. The changes improve the utility of Judy arrays for managing integer sets and align with the project's roadmap for core features. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
Test Results
Benchmark Summary (Judy vs PHP Array)Ratio = Judy / Array. Bold = Judy wins (≤0.95x). Plain = Array is faster/smaller. Time (Write / Read) — Linux
Memory — Linux
Time (Write / Read) — Windows
Memory — Windows
Raw benchmark dataWrite Time — Linux
Read Time — Linux
Memory — Linux
Write Time — Windows
Read Time — Windows
Memory — Windows
|
There was a problem hiding this comment.
Code Review
This pull request introduces set operations (union, intersect, diff, xor) for BITSET Judy arrays, which is a great feature. The implementation is mostly correct and is accompanied by a good set of tests covering various cases, including type errors and ensuring the original arrays are not modified.
My review focuses on three main areas for improvement in php_judy.c:
- Robustness: The new methods lack error handling for memory allocation failures within the underlying Judy C library calls, which could lead to silent data corruption under memory pressure.
- Performance: The
intersectmethod can be optimized by always iterating over the smaller of the two sets. - Readability: The code in
intersect,diff, andxorcan be made clearer by not reusing variables for different semantic purposes.
I've provided specific suggestions for each of these points.
Add four new methods for TYPE_BITSET arrays that return new Judy objects without modifying the operands. Calling on non-BITSET arrays throws an exception. Closes #37 (set operations item). Changes: - Use separate Rc_set variable for J1S return codes (clarity) - Check J1S for JERR and throw on allocation failure (robustness) - Optimize intersect() to iterate the smaller set (performance) - Add benchmark script comparing Judy vs PHP array set operations
f59b4f0 to
b0c4b32
Compare
Summary
union(),intersect(),diff(), andxor()Judy(Judy::BITSET)without modifying either operandNew methods
$a->union($b)$a->intersect($b)$a->diff($b)$abut not in$b$a->xor($b)Example
Benchmark results
PHP 8.3.30 on Linux (aarch64), 50% overlap between sets A and B:
10,000 indices per set
100,000 indices per set
500,000 indices per set
Test plan
examples/judy-bench-set-operations.php)