During our recent exploration of ML collaboration tool suites, we came across dvc, a well-established open source solution developed, among others, by the folks at Open Data Science, a community beloved by Russian-speaking data scientists.
We'd like to give it a try since it fits really well with our values at source{d} and solves the core part of our problems when we experiment:
- it's open source;
- it relies heavily on git and git-like mechanisms;
- it doesn't try to solve everything in one huge single-entry-point solution but rather tackles the core problems and leaves us free for the rest.
To try dvc, the first step is to use it individually in one or two projects:
- @m09 will set up `dvc` for the CodRep task in https://github.com/src-d/formatml/;
- @m09 or @r0mainK will set up `dvc` for the topic modeling experiments (if @m09's first experiments are promising) in https://github.com/src-d/tm-experiments.
The second step is to have the ability to share the large data files and results for good teamwork and collaboration. To test this, two things are needed:
- Set up a `dvc` remote on our ML Cluster;
- Use it in a test project to see if it enhances teamwork, probably https://github.com/src-d/tm-experiments since we need to collaborate on it for dev2dev similarity.