
mlflow_logger_v1 - Adding option to log last model#13

Open
adin786 wants to merge 2 commits into explosion:main from adin786:mlflow_enhancement

Conversation


@adin786 adin786 commented Nov 5, 2022

I noticed the mlflow logger only seems to log the "best" model. This is a bit confusing, since the metrics summary that mlflow displays gives the results for the "last" model.

Anyway, the wandb logger and the clearml logger both look like they include some functionality to choose which model folder to track (or both). So I added functionality to log the "last" model via a boolean option in the [training.logger] config. I thought I'd stick to a similar pattern to the one implemented in the wandb logger (which handles this aspect neatly).

Note: I ended up using mlflow.log_artifact() rather than mlflow.spacy.log_model(), because when I use the latter outside of the log_step() function, spacy seems to abort the process abruptly without letting the mlflow.spacy call finish. It works if I put it in the log_step() function, but I wanted to trigger a single upload at the end; otherwise we'd seriously slow down training by forcing a model upload on every training step. I could use some feedback on this part.

Separately, I also added the "step" value to each mlflow.log_metrics() call, as this makes the mlflow plot visualisations a bit nicer if you want to view them by step rather than by time on the x-axis.
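For illustration, the step-aware metrics logging described here could be sketched roughly as below. This is a hedged sketch, not the PR's actual code: the `info` dict keys ("step", "score", "losses") mirror what spaCy passes to a logger's log_step callback, and `log_metrics` stands in for `mlflow.log_metrics(metrics, step=...)` so the sketch runs without mlflow installed.

```python
# Hedged sketch: logging metrics with an explicit step value.
# `log_metrics` is a stand-in for mlflow.log_metrics, whose real
# signature accepts a `step` keyword argument.

def log_step_metrics(info, log_metrics):
    step = info["step"]
    metrics = {"score": info["score"]}
    # Flatten per-component losses into individually named metrics.
    metrics.update({f"loss_{name}": value for name, value in info["losses"].items()})
    # Passing step= lets mlflow plot metrics by training step on the x-axis.
    log_metrics(metrics, step=step)

# Usage with a recording stub in place of mlflow.log_metrics:
calls = []
log_step_metrics(
    {"step": 200, "score": 0.85, "losses": {"ner": 3.2}},
    lambda metrics, step: calls.append((metrics, step)),
)
assert calls == [({"score": 0.85, "loss_ner": 3.2}, 200)]
```

In the real logger the lambda would simply be `mlflow.log_metrics`.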

@adin786 adin786 (Author) commented Nov 5, 2022

Just to add: making a class just to persist the model path for later uploading was probably overkill/hacky. It would be cleaner to use output_path as a nonlocal so it can be accessed later in the finalize() method.

I'll wait for feedback before messing with this. The point is that, just like the wandb logger, I want to pass the model path forward so I can call mlflow.log_artifact() etc. once at the end of the training run. I think it's fine for the "best" model to get logged on each (improved) step, because that's at least not every step.
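The nonlocal approach mentioned here could look something like the following. A minimal sketch under assumptions: `setup_logger`, `log_step`, and `finalize` only mirror the shape of a spaCy logger factory, and the actual `mlflow.log_artifacts` upload is stubbed out so the sketch is self-contained.

```python
# Hedged sketch: persist the latest model path with `nonlocal` instead of
# a helper class. `log_artifacts` is a stand-in for mlflow.log_artifacts.

def setup_logger(log_artifacts):
    latest_model_dir = None  # replaces the ModelDir wrapper class

    def log_step(output_path):
        nonlocal latest_model_dir
        if output_path is not None:
            latest_model_dir = output_path  # remember it; upload once at the end

    def finalize():
        if latest_model_dir is not None:
            # In the real logger this would be
            # mlflow.log_artifacts(latest_model_dir, "model_last")
            log_artifacts(latest_model_dir, "model_last")

    return log_step, finalize

# Usage with a recording stub in place of mlflow.log_artifacts:
uploads = []
log_step, finalize = setup_logger(lambda path, name: uploads.append((path, name)))
log_step("training/model-last")
finalize()
assert uploads == [("training/model-last", "model_last")]
```

The upload happens exactly once, in finalize(), rather than on every training step.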

@svlandeg
Contributor

Hi, thanks for your contribution! We'll review the code and get back to you :-)

@rmitsch rmitsch self-requested a review November 23, 2022 09:22
@rmitsch rmitsch added the enhancement New feature or request label Nov 23, 2022

@rmitsch rmitsch left a comment


Thanks for your contribution, @adin786! Sorry it took me so long to review this.

Only some minor comments - and I'll look into the issue with not being able to log artifacts with mlflow.spacy.log_model() outside of log_step().

Comment on lines +14 to +20
class ModelDir:
    def __init__(self) -> None:
        self.path = None

    def update(self, path: str) -> None:
        self.path = path


I think we can ditch ModelDir - let's use a simple string for the latest model path (latest_model_dir or similar).

Comment on lines +73 to +77
def log_model(path, name):
    mlflow.log_artifacts(
        path,
        name
    )

Similar: since this is just a wrapper without added functionality, let's use mlflow.log_artifacts() directly.

Comment on lines 104 to +105
mlflow.spacy.log_model(nlp, "best")
log_model(output_path, 'model_best')

It irks me that we now have three artifacts: best, model_best and model_last. Ideally I'd like to have just best and last, both with the complete set of associated data (like best has, see screenshot), for the sake of consistency. If that doesn't work out due to the issues you described, I guess this is fine. I'll look into this.
[screenshot: mlflow artifact listing]


> Note: I ended up using mlflow.log_artifact() rather than mlflow.spacy.log_model() because when I use the latter method outside of the log_step() function it looks like spacy just aborts the process abruptly, without letting the mlflow.spacy function do its thing.

Hm, I ran this with

        def finalize() -> None:
            if log_latest_dir:
                mlflow.spacy.log_model(nlp, "model_last")

now and this seemed to work (see image). What's your outcome?
[screenshot: model_last logged successfully]

Comment on lines +109 to +113

    if log_latest_dir:
        log_model(latest_model.path, 'model_last')

    print('End run')

Suggested change
-    if log_latest_dir:
-        log_model(latest_model.path, 'model_last')
-
-    print('End run')
+    if log_latest_dir:
+        log_model(latest_model.path, 'model_last')


Labels

enhancement New feature or request

3 participants