feat(index): add sorted parquet indexes with CREATE/DROP/SHOW INDEX support #117
Draft — shefeek-jinnah wants to merge 2 commits into main
Status: DRAFT — Parked for review
Summary
Adds sorted parquet projection indexes to the caching layer. When a user creates an index on a cached table, the engine writes a second parquet file with rows physically sorted by the indexed column(s). At
query time, the planner selects the best available index based on WHERE clause filters, enabling row-group pruning on sorted statistics and sort-elimination for ORDER BY queries.
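The "selects the best available index" step can be sketched as a prefix match: sorted statistics only help for a leading prefix of an index's sort key, so the planner scores each index by how many of its leading columns appear in the WHERE filters. This is an illustrative sketch with hypothetical types, not the actual `lazy_table_provider.rs` logic:

```rust
// Sketch of index selection: pick the index whose leading sort columns
// cover the most WHERE-clause filter columns. All names are hypothetical.
fn select_index<'a>(
    indexes: &'a [(&'a str, Vec<&'a str>)], // (index_name, sort columns)
    filter_cols: &[&str],
) -> Option<&'a str> {
    indexes
        .iter()
        .map(|(name, cols)| {
            // Count the matching prefix: row-group pruning on sorted data
            // only works for a prefix of the index's sort key.
            let score = cols.iter().take_while(|c| filter_cols.contains(*c)).count();
            (*name, score)
        })
        .filter(|(_, score)| *score > 0)
        .max_by_key(|(_, score)| *score)
        .map(|(name, _)| name)
}

fn main() {
    let indexes = vec![
        ("idx_shipdate", vec!["l_shipdate"]),
        ("idx_part_supp", vec!["l_partkey", "l_suppkey"]),
    ];
    let chosen = select_index(&indexes, &["l_partkey", "l_suppkey"]);
    println!("{:?}", chosen); // prints Some("idx_part_supp")
}
```

A real planner would also weigh selectivity and index freshness; the prefix score is the minimal version of the idea.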
What's Done
Core indexing engine
- `CREATE INDEX idx ON catalog.schema.table (col1, col2, ...)` — sorts cached parquet data by the specified columns and writes a separate index parquet file
- `DROP INDEX idx ON catalog.schema.table` — removes the index metadata and parquet file
- `SHOW INDEXES ON catalog.schema.table` — lists all indexes for a table
- Parsed in `engine.rs` before DataFusion, since these aren't standard DataFusion DDL

Sorted parquet writer (`src/datafetch/sorted_parquet.rs`)

In-memory batch collector (`src/datafetch/collecting_writer.rs`)
- `CollectingBatchWriter` accumulates batches in memory for multi-pass processing (e.g., creating multiple index presets from the same base data)

Index-aware query planning (`src/datafusion/lazy_table_provider.rs`)

Index preset registry (`src/datafetch/index_presets.rs`)
- enabled via `with_tpch_index_presets()`
- `tpch_optimized` preset defining 6 indexes across lineitem, orders, customer, partsupp

Catalog persistence (`migrations/sqlite/v5.sql`, `migrations/postgres/v5.sql`)
- `indexes` table with a unique constraint on `(connection_id, schema_name, table_name, index_name)`
- index on `(connection_id, schema_name, table_name)` for efficient lookups

Configurable parquet settings (`ParquetConfig`)
- `max_row_group_size` and `bloom_filter_enabled` are now configurable via the engine builder

DataFusion upgrade
- `liquid-cache-client` temporarily disabled (waiting for DF 52 compatibility)

Benchmark harness (`tests/tpch_benchmark_tests.rs`)
- `test_tpch_benchmark`, `test_tpch_benchmark_presets`, `test_index_performance`, `test_old_vs_current_config`, `test_rowgroup_size_comparison`

Files Changed (22 files, +4870 / -191)
- `engine.rs` (+952)
- `sorted_parquet.rs`, `collecting_writer.rs`, `index_presets.rs`
- `lazy_table_provider.rs` (+434), `parquet_exec.rs`
- `native/parquet_writer.rs`, `orchestrator.rs` (+298)
- `manager.rs`, `sqlite_manager.rs`, `postgres_manager.rs`, `caching_manager.rs`, `mock_catalog.rs`
- `sqlite/v5.sql`, `postgres/v5.sql`
- `tpch_benchmark_tests.rs` (+2210), `result_persistence_tests.rs`
- `Cargo.toml` (DF 51→52, liquid-cache disabled)

Configuration: Old vs New
The writer config on main used fixed parquet writer defaults; this PR exposes `max_row_group_size` and `bloom_filter_enabled` through `ParquetConfig` on the engine builder.
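For illustration, a minimal sketch of what the `ParquetConfig` shape and its use might look like — the two field names come from this PR's summary, but the struct layout, defaults, and values here are assumptions:

```rust
// Hypothetical sketch of the configurable parquet settings described above.
// Only max_row_group_size and bloom_filter_enabled are named in the PR;
// everything else (defaults, values) is illustrative.
#[derive(Debug, Clone)]
pub struct ParquetConfig {
    pub max_row_group_size: usize,
    pub bloom_filter_enabled: bool,
}

impl Default for ParquetConfig {
    fn default() -> Self {
        // Placeholder defaults, not the project's actual values.
        Self { max_row_group_size: 1_048_576, bloom_filter_enabled: false }
    }
}

fn main() {
    // Smaller row groups give the pruner finer granularity at the cost of
    // more metadata; bloom filters help point lookups on high-cardinality keys.
    let cfg = ParquetConfig { max_row_group_size: 100_000, bloom_filter_enabled: true };
    println!("{} {}", cfg.max_row_group_size, cfg.bloom_filter_enabled);
}
```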
Why DataFusion 52?
The sorted index feature depends on sort pushdown (apache/datafusion#10433), which landed in DataFusion 52 via apache/datafusion#19064.
It adds a PushdownSort optimizer rule that detects SortExec nodes in physical plans and pushes sorting requirements down to data sources via a new try_pushdown_sort() trait method.
The key concept is Inexact ordering. When DataFusion sees ORDER BY o_orderdate DESC LIMIT 5 on a parquet file, the PushdownSort rule asks the data source whether it can satisfy the ordering. The parquet source answers with an Inexact guarantee: it reorders row groups by their o_orderdate statistics so the groups with the highest values are scanned first, while the Sort operator stays in the plan above it.
The critical detail: it does NOT reverse rows within row groups. It only reorders which row groups are read first. That's why it's "Inexact" — row-group-level ordering, not row-level. The Sort operator above
it handles the final correctness.
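The row-group-level behavior above can be sketched with plain min/max statistics (the types and names here are hypothetical, not DataFusion's API):

```rust
// Sketch of "Inexact" ordering for ORDER BY col DESC: reorder row groups by
// their max statistic so the groups most likely to hold the top rows are
// read first. Rows inside each group keep file order, which is why a Sort
// operator above the scan must still finalize row-level ordering.
#[derive(Debug, Clone, Copy)]
struct RowGroupStats {
    id: usize,
    min: i64,
    max: i64,
}

fn scan_order_desc(mut groups: Vec<RowGroupStats>) -> Vec<usize> {
    // Visit row groups in descending order of their max statistic.
    groups.sort_by(|a, b| b.max.cmp(&a.max));
    groups.iter().map(|g| g.id).collect()
}

fn main() {
    // An ASC-sorted file: later groups hold later dates.
    let groups = vec![
        RowGroupStats { id: 0, min: 10, max: 20 },
        RowGroupStats { id: 1, min: 21, max: 30 },
        RowGroupStats { id: 2, min: 31, max: 40 },
    ];
    // For a DESC query the scan visits group 2 first.
    println!("{:?}", scan_order_desc(groups)); // prints [2, 1, 0]
}
```

With a LIMIT above the Sort, the scan can often stop after the first group or two, which is where the speedup comes from.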
Before this PR (DataFusion 51), there was no mechanism for the planner to tell a data source "I need this data sorted" — so sorted parquet files gave zero benefit. The sort always happened in memory after a
full scan.
Why is reverse-order scanning not possible within a row group?
Rows within a row group are not individually addressable. They're stored as columnar chunks where each column is compressed and encoded (dictionary encoding, RLE, delta encoding, etc.). To read row N of a
column chunk, you have to decompress and decode all pages from the start of that chunk up to row N. There's no "seek to row 99,995" — the encoding is sequential.
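The sequential nature of these encodings is easy to see with run-length encoding, one of the encodings mentioned above. A toy sketch (not parquet's actual decoder): to read the value at row n you must expand every run before it.

```rust
// Toy RLE decode: runs of (value, run_length). There is no "seek to row n";
// reaching row n means walking every run from the start of the chunk.
fn rle_value_at(runs: &[(i64, usize)], n: usize) -> Option<i64> {
    let mut seen = 0;
    for &(value, len) in runs {
        // Expand runs sequentially until the one covering row n is found.
        seen += len;
        if n < seen {
            return Some(value);
        }
    }
    None // fewer than n + 1 rows in the chunk
}

fn main() {
    // Three runs: 5 copies of 7, then 3 copies of 9, then 4 copies of 2.
    let runs = [(7, 5), (9, 3), (2, 4)];
    println!("{:?}", rle_value_at(&runs, 6)); // prints Some(9)
}
```

Dictionary and delta encodings behave the same way: each value's position is only known after decoding everything before it.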
So when DataFusion applies reverse_row_groups=true to an ASC-sorted index for a DESC query, it visits the last row group first, but it still has to decompress and decode that entire row group from the start just to reach the final handful of rows.
That's why a DESC index solves this — the 5 highest dates are the first 5 rows in the first row group. Decompress one page, read 5 rows, done.