bug(medcat): CU-869bj8g9k Fix hardcoded requirement for spacy model download #273
Until now, if there was no Internet access, a failure to download the fallback spacy model would raise an exception and stall the entire process. Most models should ship with their own spacy model anyway (if they use one), so the fallback model shouldn't be needed most of the time.
So this PR allows the subprocess for the spacy model download to fail if there's a network issue. This should make it easier to use the library in scenarios where this method is called. That normally happens if/when a model is created from scratch and no on-disk model is provided, but it can also affect converting models from v1 to v2 format.
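A minimal sketch of the "allowed to fail" download, assuming the download is done through the standard spaCy CLI (`python -m spacy download <name>`). The helper name `try_download_spacy_model` and its optional `cmd` override are hypothetical and are not MedCAT's actual code; the point is that a failed subprocess is logged and reported rather than raised.

```python
import logging
import subprocess
import sys
from typing import Optional, Sequence

logger = logging.getLogger(__name__)


def try_download_spacy_model(model_name: str,
                             cmd: Optional[Sequence[str]] = None) -> bool:
    """Hypothetical helper: try to download a spaCy model.

    Returns True on success and False on failure (e.g. no network),
    instead of raising and stalling the caller.
    """
    if cmd is None:
        # Standard spaCy CLI invocation: python -m spacy download <name>
        cmd = [sys.executable, "-m", "spacy", "download", model_name]
    try:
        subprocess.run(list(cmd), check=True, capture_output=True)
    except (subprocess.CalledProcessError, OSError) as err:
        # Non-zero exit (e.g. network error) or missing executable:
        # log a warning and let the caller carry on.
        logger.warning("Could not download spaCy model %r: %s", model_name, err)
        return False
    return True
```

A caller can then decide what to do when the fallback model is unavailable, e.g. `if not try_download_spacy_model("en_core_web_md"): ...`, rather than having the whole process stop.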
The other thing this PR does is fix the renaming of 'odd' spacy models (e.g. `spacy_model` and unsupported ones like `en_core_sci_*`). This was the main reason the `en_core_web_md` model was being downloaded at conversion time. Now we use the path to the previously saved model instead and change the name of the model later down the road.

There was another nuance I had to address. Some older CDBs included a config. If that config is converted along with the CDB, it normally makes more sense to fix the spacy model name right then and there. Only in a full-model conversion does it make sense to delay that (there's no way to guarantee the absolute path of the spacy model when converting a CDB on its own). This omission originally caused downstream workflows (medcat-service) to fail.
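The path-over-name idea can be sketched roughly as follows, relying on the fact that `spacy.load` accepts either an installed package name or a path to a model directory. The helper `spacy_model_source` and its parameters are hypothetical, not MedCAT's implementation: in a full-model conversion the saved spaCy model directory is known, so its absolute path is used; in a CDB-only conversion there is no guaranteed on-disk path, so the configured name is kept.

```python
import os
from typing import Optional


def spacy_model_source(configured_name: str,
                       saved_model_dir: Optional[str]) -> str:
    """Hypothetical helper: choose what to pass to spacy.load().

    Prefers the spaCy model saved alongside the model pack (so nothing
    needs downloading), otherwise falls back to the configured name.
    """
    # Full-model conversion: the previously saved spaCy model is on disk
    if saved_model_dir is not None and os.path.isdir(saved_model_dir):
        return os.path.abspath(saved_model_dir)
    # CDB-only conversion: no guaranteed on-disk path, keep the name
    return configured_name
```

Renaming the model in the config can then happen later, once the final location of the saved model is known.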
So, TL;DR:
- A spacy model download may have been initiated that could fail