Added ability to pass additional parameters to the simpletransformers NER model in the RestorePuncts class. #5
Thanks for the great library! I had problems running it without a GPU, and I think there is a simple fix. The simpletransformers NER model enables CUDA by default. This PR lets the user pass a dictionary of arguments (`ner_args`) that is forwarded to the simpletransformers NER model. You can now run the code on a CPU by passing `use_cuda=False` through `ner_args` when initializing rpunct.
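As a rough illustration of the forwarding pattern described above, here is a minimal, self-contained sketch. `StubNERModel` and `RestorePunctsSketch` are hypothetical stand-ins, not the real simpletransformers or rpunct classes; they only mirror the `ner_args` handling and the `use_cuda` check visible in the traceback further down. The `wrds_per_pred` default and the empty `labels` list are placeholders chosen for the example.

```python
# Hypothetical stand-ins: neither class is the real rpunct or simpletransformers code.
CUDA_AVAILABLE = False  # pretend we are on a CPU-only machine


class StubNERModel:
    """Mimics only the use_cuda check shown in the traceback."""

    def __init__(self, model_type, model_name, labels=None, args=None,
                 use_cuda=True, **kwargs):
        if use_cuda and not CUDA_AVAILABLE:
            raise ValueError(
                "'use_cuda' set to True when cuda is unavailable."
                "Make sure CUDA is available or set use_cuda=False."
            )
        self.device = "cuda" if use_cuda else "cpu"


class RestorePunctsSketch:
    """Forwards ner_args to the model constructor as keyword arguments."""

    def __init__(self, wrds_per_pred=250, ner_args=None):  # default is a placeholder
        if ner_args is None:
            ner_args = {}
        self.model = StubNERModel(
            "bert",
            "felflare/bert-restore-punctuation",
            labels=[],  # placeholder for the real label list
            args={"silent": True, "max_seq_length": 512},
            **ner_args,  # e.g. use_cuda=False ends up here
        )


# On a CPU-only machine, disable CUDA through ner_args.
cpu_punct = RestorePunctsSketch(ner_args={"use_cuda": False})
print(cpu_punct.model.device)
```

Judging from the `__init__(self, wrds_per_pred, ner_args)` signature in the traceback, the equivalent call against the library itself would presumably be `RestorePuncts(ner_args={"use_cuda": False})`.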
Before this change, when running rpunct examples on the CPU the following error occurs:
ValueError Traceback (most recent call last)
/var/folders/hx/dhzhl_x51118fm5cd13vzh2h0000gn/T/ipykernel_10548/194907560.py in
1 from rpunct import RestorePuncts
2 # The default language is 'english'
----> 3 rpunct = RestorePuncts()
4 rpunct.punctuate("""in 2018 cornell researchers built a high-powered detector that in combination with an algorithm-driven process called ptychography set a world record
5 by tripling the resolution of a state-of-the-art electron microscope as successful as it was that approach had a weakness it only worked with ultrathin samples that were
~/repos/rpunct/rpunct/punctuate.py in __init__(self, wrds_per_pred, ner_args)
19 if ner_args is None:
20 ner_args = {}
---> 21 self.model = NERModel("bert", "felflare/bert-restore-punctuation", labels=self.valid_labels,
22 args={"silent": True, "max_seq_length": 512}, **ner_args)
23
~/repos/transformers/transformer-env/lib/python3.8/site-packages/simpletransformers/ner/ner_model.py in __init__(self, model_type, model_name, labels, args, use_cuda, cuda_device, onnx_execution_provider, **kwargs)
209 self.device = torch.device(f"cuda:{cuda_device}")
210 else:
--> 211 raise ValueError(
212 "'use_cuda' set to True when cuda is unavailable."
213 "Make sure CUDA is available or set use_cuda=False."
ValueError: 'use_cuda' set to True when cuda is unavailable.Make sure CUDA is available or set use_cuda=False.