@xbattlax

Summary

Add comprehensive user documentation for the DataFusion integration, addressing issue #2027.

Changes

  • Add website/src/datafusion.md documentation page covering:
    • Catalog-based access with IcebergCatalogProvider
    • SQL operations: CREATE TABLE, INSERT INTO, SELECT
    • Metadata tables ($snapshots, $manifests)
    • External tables via IcebergTableProviderFactory
    • Partitioned tables and write modes (fanout vs clustered)
    • Query optimization (projection, filter, and LIMIT pushdown)
    • Configuration options and limitations
  • Add crates/examples/src/datafusion_integration.rs, a working example whose annotated code sections are pulled into the documentation via includes
  • Update website/src/SUMMARY.md to include the new documentation page
  • Add the required dependencies to crates/examples/Cargo.toml

Notes

This addresses the documentation request from #2027, which noted substantial progress in the DataFusion integration (per epic #1382). The documentation covers the major merged features listed under Changes above.

Closes #2027

Add comprehensive user documentation for the DataFusion integration
that covers SQL-based table operations, catalog integration, and
query optimization features.

Changes:
- Add datafusion.md documentation page with setup, SQL operations,
  metadata tables, partitioned tables, and configuration options
- Add datafusion_integration.rs example with annotated code sections
- Update SUMMARY.md to include new documentation page
- Add required dependencies to examples crate

Closes apache#2027

@xbattlax force-pushed the docs_datafusion_integration branch from 853219a to 4830a74 on January 15, 2026 08:31

@liurenjie1024 liurenjie1024 left a comment

Thanks @xbattlax for this PR, I just finished the first round of review. cc @CTTY to also take a look.

- Change catalog name from "iceberg" to "my_catalog" to clarify it has no special meaning
- Move external table code to separate example file
- Remove sections not suited for end-users: External Tables, Table Provider Types, Creating Partitioned Tables (Rust API), Query Optimization
- Add clarification that table properties must be set via Iceberg catalog API

xbattlax commented Jan 16, 2026

> Thanks @xbattlax for this pr, just finished first round of review. cc @CTTY to also take a look.

Thank you two for the review! First round done :)

@xbattlax force-pushed the docs_datafusion_integration branch 3 times, most recently from 68f9f95 to 6cb8fbd on January 16, 2026 10:25
@xbattlax force-pushed the docs_datafusion_integration branch from 6cb8fbd to 7f6bfda on January 16, 2026 14:59

Development

Successfully merging this pull request may close these issues.

Update user doc for datafusion integration.

3 participants