@scottopell
Contributor

What does this PR do?

A brief description of the change being made with this pull request.

Motivation

What inspired you to submit this pull request?

Related issues

A list of issues either fixed, containing architectural discussions, otherwise relevant
for this Pull Request.

Additional Notes

Anything else we should know when reviewing?

Contributor Author

scottopell commented Jan 14, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite. Learn more about stacking.

scottopell and others added 2 commits January 15, 2026 18:44
Replace MapArray-based label storage with flat l_<key> columns in
Parquet output. This enables predicate pushdown for filtering by
container_id and other labels, avoiding full file scans.

Key changes:
- Dynamic schema generation based on discovered label keys
- Dictionary encoding for low-cardinality label columns
- Lazy ArrowWriter initialization (schema determined at first flush)
- Updated validation and round-trip tests for new schema
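For reviewers who want the shape of the change without reading the diff, here is a minimal std-only sketch of the flat-label idea: label keys are discovered at the first flush, each key becomes an `l_<key>` column, and rows missing a key get a null. `FlatLabelWriter`, `labels`, and the return type of `flush` are hypothetical names for illustration, not the types in this PR, and the real writer emits Arrow arrays rather than `Vec<Option<String>>`.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Labels attached to one row (hypothetical shape, for illustration only).
type LabelSet = BTreeMap<String, String>;

/// Convenience constructor for a label set from string pairs.
fn labels(pairs: &[(&str, &str)]) -> LabelSet {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}

/// Buffers rows and fixes the column list lazily, on the first flush.
struct FlatLabelWriter {
    buffered: Vec<LabelSet>,
    /// `None` until the first flush; afterwards the fixed `l_<key>` columns.
    label_columns: Option<Vec<String>>,
}

impl FlatLabelWriter {
    fn new() -> Self {
        Self { buffered: Vec::new(), label_columns: None }
    }

    fn push(&mut self, row: LabelSet) {
        self.buffered.push(row);
    }

    /// Returns the column names plus one column of optional values per name.
    /// Assumption of this sketch: keys first seen after the initial flush
    /// are dropped, because the schema is already fixed.
    fn flush(&mut self) -> (Vec<String>, Vec<Vec<Option<String>>>) {
        let rows = std::mem::take(&mut self.buffered);

        // Schema is decided once, from the keys present in the first batch.
        let columns = self
            .label_columns
            .get_or_insert_with(|| {
                let keys: BTreeSet<&str> = rows
                    .iter()
                    .flat_map(|r| r.keys().map(String::as_str))
                    .collect();
                keys.into_iter().map(|k| format!("l_{k}")).collect()
            })
            .clone();

        // One flat column per label key; a missing label becomes `None`.
        let data: Vec<Vec<Option<String>>> = columns
            .iter()
            .map(|col| {
                let key = &col["l_".len()..];
                rows.iter().map(|r| r.get(key).cloned()).collect()
            })
            .collect();

        (columns, data)
    }
}
```

Because each label lives in its own column, a reader can evaluate a filter such as `l_container_id = 'abc'` against that column alone, which is what makes predicate pushdown possible.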

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add BloomFilterConfig and BloomFilterColumn types to configure bloom
filters on label columns. Bloom filters enable efficient query-time
filtering by allowing readers to skip row groups that definitely don't
contain a target value.

New APIs:
- Format::with_bloom_filter() - create writer with bloom filter config
- format.bloom_filter_config() - getter, used when rotating files
- CaptureManager::new_parquet_with_bloom_filter()
- CaptureManager::new_multi_with_bloom_filter()

Backwards compatible - existing Format::new() and new_parquet() still
work unchanged using BloomFilterConfig::default().
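To illustrate why a bloom filter lets a reader skip row groups, here is a toy std-only sketch. It is not the Parquet implementation (Parquet uses split-block bloom filters with xxHash) and it does not use the APIs listed above; `TinyBloom`, `RowGroup`, `build_row_group`, and `find` are hypothetical names for illustration only.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy bloom filter: `hashes` seeded hash functions over a bit vector.
struct TinyBloom {
    bits: Vec<bool>,
    hashes: u64,
}

impl TinyBloom {
    fn new(num_bits: usize, hashes: u64) -> Self {
        assert!(num_bits > 0, "bloom filter needs at least one bit");
        Self { bits: vec![false; num_bits], hashes }
    }

    /// Bit position for `value` under the hash function identified by `seed`.
    fn index(&self, value: &str, seed: u64) -> usize {
        let mut h = DefaultHasher::new();
        seed.hash(&mut h);
        value.hash(&mut h);
        (h.finish() % self.bits.len() as u64) as usize
    }

    fn insert(&mut self, value: &str) {
        for seed in 0..self.hashes {
            let i = self.index(value, seed);
            self.bits[i] = true;
        }
    }

    /// `false` means the value is definitely absent; `true` means "maybe".
    fn might_contain(&self, value: &str) -> bool {
        (0..self.hashes).all(|seed| self.bits[self.index(value, seed)])
    }
}

/// Stand-in for a row group: one label column plus its bloom filter.
struct RowGroup {
    values: Vec<String>,
    bloom: TinyBloom,
}

fn build_row_group(values: &[&str]) -> RowGroup {
    let mut bloom = TinyBloom::new(1024, 3);
    for v in values {
        bloom.insert(v);
    }
    RowGroup { values: values.iter().map(|v| v.to_string()).collect(), bloom }
}

/// Returns the indices of row groups containing `target`, and how many
/// row groups actually had to be scanned after consulting the filters.
fn find(groups: &[RowGroup], target: &str) -> (Vec<usize>, usize) {
    let mut hits = Vec::new();
    let mut scanned = 0;
    for (i, group) in groups.iter().enumerate() {
        if !group.bloom.might_contain(target) {
            continue; // definitely absent: skip without reading the values
        }
        scanned += 1;
        if group.values.iter().any(|v| v == target) {
            hits.push(i);
        }
    }
    (hits, scanned)
}
```

The filter can return false positives, so a "maybe" still requires a scan, but it never returns false negatives, which is why skipping on a negative answer is safe.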

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@scottopell force-pushed the sopell/add-parquet-rotation branch from 83e2256 to 8bfd82e on January 15, 2026 18:44
@scottopell force-pushed the sopell/parquet-flat-label-opt branch from 15e7333 to 410a2dc on January 15, 2026 18:44