DVC+GTO integration #337

@aguschin

Just had a discussion with @dberenbaum yesterday about this. Overall, it feels like too many issues are blocked by the missing GTO/DVC integration. At the same time, I feel like GTO is not useful without DVC and is more like a plugin to DVC.

So, two options are possible: build an integration in GTO, or just merge GTO into DVC.

There are not many decent ways to actually store binaries in a repo, which makes it hard to imagine using GTO without DVC in real-world scenarios. I can recall these options:

  1. Commit binaries to the repo - not a real option though
  2. Use git LFS
  3. Use DVC
  4. Upload them manually somewhere (could be S3, Artifactory or something)

Supporting option 4 in GTO, to have upload/download functionality as in #307, will require an integration with each storage location (for S3 it would probably be fsspec, for Artifactory some Python client of theirs, etc). It makes me think these integrations could also be part of DVC (dvc import-url, etc).
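
To make the cost of option 4 concrete, here is a minimal sketch of what per-backend dispatch could look like. It is not GTO's or DVC's actual code: `UPLOADERS`, `upload_artifact` and `upload_local` are hypothetical names, and only a local `file://` backend is implemented, as a stand-in for a real remote.

```python
import shutil
from pathlib import Path
from urllib.parse import urlparse


def upload_local(src: str, dest_url: str) -> str:
    """Copy src to a file:// destination (stand-in for a real remote)."""
    dest = Path(urlparse(dest_url).path)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dest)
    return str(dest)


# Hypothetical registry: one entry per storage backend that would need support.
UPLOADERS = {
    "file": upload_local,
    # "s3": ...           # assumption: would be built on something like fsspec
    # "artifactory": ...  # assumption: would need a vendor-specific client
}


def upload_artifact(src: str, dest_url: str) -> str:
    """Pick an uploader by URL scheme; fail loudly if no integration exists."""
    scheme = urlparse(dest_url).scheme
    try:
        uploader = UPLOADERS[scheme]
    except KeyError:
        raise NotImplementedError(f"no integration for {scheme!r} storage") from None
    return uploader(src, dest_url)
```

Every new storage location means another entry in that registry plus its own client code and tests, which is the kind of machinery DVC already maintains.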

Now if we can't imagine a good use-case for GTO without DVC, or all those use-cases would require some machinery that could also be part of DVC, why not just merge GTO into DVC?

Asking for your opinion @francesco086 @shortcipher3 @bgalvao since you're the most active GTO users I know about :)

(This is not something we need to decide right away, since we can just build this integration inside GTO and merge later, but I think merging could make it more straightforward. I'm just collecting opinions for now to get a more detailed picture.)

Labels

design: Design questions, that affects the product significantly
discussion: Discussion is needed to reach conclusion
