Description
Background
DataFusion already supports CREATE EXTERNAL TABLE ... STORED AS ICEBERG. Today, iceberg-rust integrates via IcebergTableProviderFactory, but the factory primarily supports registering a static table (e.g., created from a metadata JSON path). That works for:
-- Static table (existing, backward compatible)
CREATE EXTERNAL TABLE my_table
STORED AS ICEBERG
LOCATION '/path/to/metadata.json';
However, we also want CREATE EXTERNAL TABLE to create a normal IcebergTableProvider backed by a Catalog, so users can define the catalog via SQL OPTIONS (and then resolve tables by identifier through that catalog).
Dumping my thoughts here; feedback is welcome!
Option A: Build Catalog inside the ProviderFactory using OPTIONS
IcebergTableProviderFactory parses OPTIONS and uses a CatalogBuilder to construct the Catalog internally, then creates a normal IcebergTableProvider.
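To make the fallback rule concrete, here is a minimal sketch of how the factory might decide between the two paths. It uses only the standard library; `TableSource` and `resolve_source` are hypothetical names for illustration, and the assumption that options arrive as a flat string map with the `datafusion.iceberg.catalog.` prefix is taken from the example keys in this issue, not from an existing API.

```rust
use std::collections::HashMap;

/// Prefix taken from the example OPTIONS keys in this issue (an assumption).
const CATALOG_PREFIX: &str = "datafusion.iceberg.catalog.";

/// Hypothetical result of inspecting a CREATE EXTERNAL TABLE statement.
#[derive(Debug, PartialEq)]
enum TableSource {
    /// No catalog type configured: keep today's behavior and read LOCATION.
    Static { metadata_location: String },
    /// Catalog type configured: LOCATION is ignored and the remaining
    /// properties would be handed to a catalog builder.
    Catalog {
        catalog_type: String,
        props: HashMap<String, String>,
    },
}

fn resolve_source(location: &str, options: &HashMap<String, String>) -> TableSource {
    // Keep only catalog-scoped options, with the prefix stripped.
    let mut props: HashMap<String, String> = options
        .iter()
        .filter_map(|(key, value)| {
            key.strip_prefix(CATALOG_PREFIX)
                .map(|short| (short.to_string(), value.clone()))
        })
        .collect();

    // The presence of `type` selects the catalog-backed path.
    match props.remove("type") {
        Some(catalog_type) => TableSource::Catalog { catalog_type, props },
        None => TableSource::Static {
            metadata_location: location.to_string(),
        },
    }
}
```

Keying the decision on a single option keeps existing static-table statements working unchanged. The SQL for this option would look like: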
CREATE EXTERNAL TABLE my_table
STORED AS ICEBERG
LOCATION 'ignored_or_optional' -- ignored if a catalog is configured
OPTIONS (
'datafusion.iceberg.catalog.type' = 'rest', -- if the catalog type is not configured, fall back to creating a static table
'datafusion.iceberg.catalog.uri' = 'http://localhost:8181',
'datafusion.iceberg.catalog.warehouse' = 's3://bucket/warehouse'
);
Option B: Allow injecting a pre-built Catalog into the factory
Essentially, we would have:
pub struct IcebergTableProviderFactory {
    catalog: Option<Arc<dyn Catalog>>, // when it is None, fall back to a static table
}
...
IcebergTableProviderFactory::new_with_catalog(Arc<dyn Catalog>)
I prefer this as it is much more straightforward, but one drawback I can think of is that users cannot easily use multiple catalogs at the same time. A workaround would look like this:
state
    .table_factories_mut()
    .insert("ICEBERG_REST_A".to_string(), Arc::new(IcebergTableProviderFactory::new_with_catalog(rest_catalog_a)));
state
    .table_factories_mut()
    .insert("ICEBERG_REST_B".to_string(), Arc::new(IcebergTableProviderFactory::new_with_catalog(rest_catalog_b)));
and then, when creating the table using SQL:
CREATE EXTERNAL TABLE my_table
STORED AS ICEBERG_REST_A
...
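For illustration, the Option B shape and the multi-catalog workaround could be sketched as below. This is a standalone approximation, not the real integration: the `Catalog` trait here is a local stand-in for the iceberg one, `RestCatalog`, `describe`, and `register_factories` are hypothetical, and a plain `HashMap` plays the role of the session's table-factory registry.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Local stand-in for the iceberg `Catalog` trait so the sketch compiles alone.
trait Catalog: Send + Sync {
    fn name(&self) -> &str;
}

/// Hypothetical catalog implementation used only for this example.
struct RestCatalog {
    name: String,
}

impl Catalog for RestCatalog {
    fn name(&self) -> &str {
        &self.name
    }
}

struct IcebergTableProviderFactory {
    /// When `None`, the factory falls back to the static-table path.
    catalog: Option<Arc<dyn Catalog>>,
}

impl IcebergTableProviderFactory {
    fn new() -> Self {
        Self { catalog: None }
    }

    fn new_with_catalog(catalog: Arc<dyn Catalog>) -> Self {
        Self { catalog: Some(catalog) }
    }

    /// Reports which path a CREATE EXTERNAL TABLE would take (illustrative only).
    fn describe(&self) -> String {
        match &self.catalog {
            Some(catalog) => format!("catalog:{}", catalog.name()),
            None => "static".to_string(),
        }
    }
}

/// Registers one factory per catalog, keyed by the STORED AS name.
fn register_factories() -> HashMap<String, Arc<IcebergTableProviderFactory>> {
    let rest_a: Arc<dyn Catalog> = Arc::new(RestCatalog { name: "rest_a".into() });
    let rest_b: Arc<dyn Catalog> = Arc::new(RestCatalog { name: "rest_b".into() });

    let mut factories = HashMap::new();
    factories.insert("ICEBERG".to_string(), Arc::new(IcebergTableProviderFactory::new()));
    factories.insert(
        "ICEBERG_REST_A".to_string(),
        Arc::new(IcebergTableProviderFactory::new_with_catalog(rest_a)),
    );
    factories.insert(
        "ICEBERG_REST_B".to_string(),
        Arc::new(IcebergTableProviderFactory::new_with_catalog(rest_b)),
    );
    factories
}
```

Each STORED AS name maps to exactly one catalog in this arrangement, so the cost of the workaround is one registration per catalog rather than any change to the SQL surface.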
Willingness to contribute
I can contribute to this feature independently