Description
Hi, I am having a problem replicating the pretraining process of the model. I am running on an Ubuntu 18.04.5 LTS (GNU/Linux 5.9.11-3-MANJARO x86_64) machine with one GeForce RTX3090 GPU (NVIDIA-SMI 455.45.01, Driver Version: 455.45.01, CUDA Version: 11.1).
I first ran ./scripts/pretrain/preprocess-pretrain-all.sh to process the data provided in your repo under data-src/pretrain_all, and then ran ./scripts/pretrain/pretrain-all.sh. The second script failed with the error "UnboundLocalError: local variable 'num_updates' referenced before assignment". The error appeared after several "overflow detected, setting loss scale to: XX" messages. The full log is in the txt file below.
Does this mean I need to tweak the parameters in scripts/pretrain/pretrain-all.sh to get it running? Or do I need to use data other than what is provided in data-src/pretrain_all to run the model?
I am a novice at all of this, so I apologize in advance if this is not a good question. Thank you!
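
In case it helps with diagnosis, here is a minimal Python sketch of how this kind of error can arise in general. It is not the repository's actual training loop: it assumes, without my having confirmed it in the code, that num_updates is only assigned after a step succeeds, so that it stays unbound when every step is skipped because of an overflow. The names train_epoch and reproduce, and the list of booleans standing in for batches, are hypothetical.

```python
def train_epoch(batches):
    """Hypothetical loop: num_updates is assigned only on a successful step."""
    for grad_overflowed in batches:
        if grad_overflowed:
            # Stands in for "overflow detected": the update is skipped.
            continue
        # Reached only when a step succeeds.
        num_updates = 1
    # If every batch overflowed, num_updates was never assigned and this
    # line raises UnboundLocalError.
    return num_updates


def reproduce():
    """Run the loop with every batch overflowing and report the error type."""
    try:
        train_epoch([True, True, True])
    except UnboundLocalError as exc:
        return type(exc).__name__
    return "no error"


print(reproduce())
```

If that assumption holds, the UnboundLocalError would be a side effect of the repeated overflows rather than a separate problem, but I cannot tell that from the log alone.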