Description
DVC currently works with Azure Blob Storage but does not work with OneLake as remote storage.
It seems that DVC should be able to support OneLake using the existing Azure Blob Storage setup and the underlying adlfs package; see fsspec/adlfs#416 and fsspec/adlfs#486. I think adding account_host to the Azure schema in dvc/config_schema.py would allow passing the onelake.blob.fabric.microsoft.com endpoint when the AzureBlobFileSystem class is created in https://github.com/treeverse/dvc-azure/blob/main/dvc_azure/__init__.py#L208. OneLake could then be configured like so:
dvc remote add myremote azure://<workspace>/<lakehouse>.Lakehouse
dvc remote modify myremote account_name 'onelake'
dvc remote modify myremote account_host 'onelake.blob.fabric.microsoft.com'
I'm not sure how fs_args is constructed for the AzureFileSystem class in https://github.com/treeverse/dvc-azure/blob/main/dvc_azure/__init__.py#L40C7-L40C22, so I may be missing some steps (hence an issue rather than a PR).
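To make the proposal concrete, here is a rough sketch of the pass-through I have in mind. It is not the actual dvc-azure code: `build_fs_kwargs` and `PASSTHROUGH_KEYS` are hypothetical names, and I am assuming the remote config reaches the plugin as a plain mapping whose set options can be forwarded as keyword arguments to adlfs's AzureBlobFileSystem.

```python
from typing import Any, Dict, Mapping

# Hypothetical subset of Azure remote options to forward to the filesystem.
# "account_host" is the option this issue proposes adding to the schema.
PASSTHROUGH_KEYS = ("account_name", "account_host")


def build_fs_kwargs(remote_config: Mapping[str, Any]) -> Dict[str, Any]:
    """Copy the supported options that are actually set into filesystem kwargs."""
    return {
        key: remote_config[key]
        for key in PASSTHROUGH_KEYS
        if remote_config.get(key) is not None
    }


# Remote config as it would look after the `dvc remote` commands above.
onelake_config = {
    "url": "azure://<workspace>/<lakehouse>.Lakehouse",
    "account_name": "onelake",
    "account_host": "onelake.blob.fabric.microsoft.com",
}

fs_kwargs = build_fs_kwargs(onelake_config)

# Assumption: with adlfs installed, these kwargs (plus credentials) would be
# passed as AzureBlobFileSystem(**fs_kwargs). That call is not executed here.
```

Existing remotes that never set account_host would produce the same kwargs as before, so the change should be backward compatible under these assumptions.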