Support for OneLake as remote storage #10972

@marberts

dvc currently works with Azure Blob Storage as remote storage, but not with OneLake.

It seems that dvc should be able to support OneLake through its existing Azure Blob Storage setup, which is built on the adlfs package; see fsspec/adlfs#416 and fsspec/adlfs#486. I think adding account_host to the Azure schema in dvc/config_schema.py would allow passing the onelake.blob.fabric.microsoft.com endpoint when creating the AzureBlobFileSystem class in https://github.com/treeverse/dvc-azure/blob/main/dvc_azure/__init__.py#L208. OneLake could then be used like so:

dvc remote add myremote azure://<workspace>/<lakehouse>.Lakehouse
dvc remote modify myremote account_name 'onelake'
dvc remote modify myremote account_host 'onelake.blob.fabric.microsoft.com'

I'm not sure how fs_args is constructed for the AzureFileSystem class in https://github.com/treeverse/dvc-azure/blob/main/dvc_azure/__init__.py#L40C7-L40C22, so I may be missing some steps (which is why this is an issue rather than a PR).
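
To make the idea concrete, here is a rough sketch of how a remote's options might be forwarded as filesystem keyword arguments once account_host is allowed by the schema. This is not dvc-azure's actual code: build_fs_kwargs and FORWARDED_OPTIONS are hypothetical names, and the option list is only illustrative.

```python
from typing import Any, Mapping

# Hypothetical allow-list of remote options forwarded to the filesystem.
# "account_host" is the proposed addition; the other names are illustrative.
FORWARDED_OPTIONS = ("account_name", "account_host", "connection_string", "sas_token")


def build_fs_kwargs(remote_config: Mapping[str, Any]) -> dict[str, Any]:
    """Keep only the forwarded options that are actually set on the remote."""
    return {
        key: remote_config[key]
        for key in FORWARDED_OPTIONS
        if remote_config.get(key) is not None
    }


# Remote config equivalent to the `dvc remote add/modify` commands above.
onelake_remote = {
    "url": "azure://<workspace>/<lakehouse>.Lakehouse",
    "account_name": "onelake",
    "account_host": "onelake.blob.fabric.microsoft.com",
}

fs_kwargs = build_fs_kwargs(onelake_remote)
print(fs_kwargs)
```

Assuming the installed adlfs version accepts account_host (as the linked adlfs issues suggest), the resulting dict could then be passed along with a credential to adlfs.AzureBlobFileSystem(**fs_kwargs). I have not verified this against a real OneLake workspace.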

Labels: discussion (requires active participation to reach a conclusion), p3-nice-to-have (It should be done this or next sprint)