fixed crash on loading http://nlp.stanford.edu/data/wordvecs/glove.84…#61
ok-ok-ok-ok wants to merge 1 commit into maciejkula:master
|
May I suggest you post the error message you are getting, and also add a test for this? |
|
Closed and re-opened to trigger Circle build. |
|
I suspect the issue here is some Python unicode badness: the pretrained GloVe vectors contain two lines that Python treats as holding vectors for the same word. As we read in the second of those lines, we successfully append a new entry to the "vectors" array, but when we add it to the "dct" dict, the second entry overwrites the first. "dct" therefore ends up with one entry fewer than the number of vectors read, and we try to reshape to a matrix of the wrong size (note that 2196016 * 300 = 658804800 is 300 less than 658805100: our missing entry in "dct" is causing the error). The change in this PR solves this for me. It just throws away the second vector, though, so maybe a solution that distinguishes between the two words would be better? That said, I don't think that distinguishing between those two bits of garbled nonsense is particularly important, or even meaningful. |
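To make the failure mode described above concrete, here is a minimal sketch in plain Python. It is not the code from this PR or from the library: `load_vectors`, its `skip_duplicates` flag, and the toy input lines are all hypothetical, and the explicit size check merely stands in for the matrix reshape. It only illustrates how a repeated word leaves "dct" one entry short of the vectors read, and how skipping the second occurrence keeps the two in sync.

```python
def load_vectors(lines, skip_duplicates=True):
    """Hypothetical loader: parse 'word v1 v2 ...' lines into (dct, rows)."""
    dct = {}          # word -> row index
    vectors = []      # flat list of every component read so far
    no_components = 0

    for line in lines:
        tokens = line.rstrip().split(" ")
        word, entries = tokens[0], tokens[1:]

        if skip_duplicates and word in dct:
            # Drop the second vector for a word we have already seen,
            # so that len(dct) keeps matching the number of vectors kept.
            continue

        dct[word] = len(dct)  # a repeated word overwrites its earlier entry
        vectors.extend(float(x) for x in entries)
        no_components = len(entries)

    no_vectors = len(dct)

    # Stand-in for reshaping the flat array to (no_vectors, no_components):
    # the flat length must equal the product of the two dimensions.
    if len(vectors) != no_vectors * no_components:
        raise ValueError(
            "cannot reshape %d values into shape (%d, %d)"
            % (len(vectors), no_vectors, no_components)
        )

    rows = [
        vectors[i * no_components:(i + 1) * no_components]
        for i in range(no_vectors)
    ]
    return dct, rows


if __name__ == "__main__":
    # Toy data: the word "a" appears twice, as in the problematic file.
    toy_lines = ["a 1.0 2.0", "b 3.0 4.0", "a 5.0 6.0"]

    dct, rows = load_vectors(toy_lines)
    print(dct, rows)

    try:
        load_vectors(toy_lines, skip_duplicates=False)
    except ValueError as exc:
        print("without skipping duplicates:", exc)
```

With `skip_duplicates=False` the toy input yields six values but only two dictionary entries, which mirrors the mismatch in the comment above on a small scale. The sketch keeps the first vector for a repeated word; whether that is preferable to keeping both under distinct keys is the open question raised in the thread.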
fixed crash on loading http://nlp.stanford.edu/data/wordvecs/glove.840B.300d.zip pretrained model