@LeonGuertler this is a beautiful example of how a PR should be --> list of changes
  logits = logits.view(-1, logits.size(-1))
  y = y.view(-1)
- loss = torch.nn.functional.cross_entropy(logits, y, reduction="none")
+ loss = torch.nn.functional.cross_entropy(logits, y, reduction="none", ignore_index=-1)
Are we 100% sure this has no impact elsewhere? Should be okay, but...
Yes. ignore_index defaults to -100, so changing it from -100 to -1 only affects labels (y) that are set to -1 or -100.
- I searched our code base for assignments of -100; there were none.
- I searched our code base for assignments of -1. Apart from the new MLMDataLoader, which uses label[~mask]=-1, every other occurrence was an argument such as 'dim=-1' or a plain index value.
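To make the argument above concrete, here is a minimal plain-Python sketch of how an ignore index behaves with per-token losses (the reduction="none" case). It illustrates the semantics discussed in this thread and is not PyTorch's implementation; `per_token_cross_entropy` and the toy logits are hypothetical.

```python
import math


def per_token_cross_entropy(logits, targets, ignore_index=-100):
    """Hypothetical stand-in for a per-token cross entropy (reduction="none")."""
    losses = []
    for row, target in zip(logits, targets):
        if target == ignore_index:
            # Ignored positions contribute zero loss.
            losses.append(0.0)
            continue
        if not 0 <= target < len(row):
            # Assumption of this sketch: any other out-of-range label is an error.
            raise ValueError(f"target {target} is out of range")
        # Numerically stable negative log-softmax of the target class.
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        losses.append(log_sum_exp - row[target])
    return losses


logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [0.0, 0.0, 0.0]]
labels = [0, -1, 2]  # -1 marks a position to skip, as in label[~mask] = -1

losses = per_token_cross_entropy(logits, labels, ignore_index=-1)
```

With ignore_index=-1 the second position yields a loss of 0.0. With the default of -100, the same -1 label would no longer be skipped and this sketch rejects it as out of range, which is why only labels equal to -1 or -100 are affected by the change.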
embedder:
tokenizer_type: gpt2
embedding_model_type: generic
dataset_name: simple_en_wiki
change this to stlm
sorry, can I clarify what should be changed to stlm?
the dataset_name, but it's okay, this will all be reworked later
I propose closing this, as we decided not to do MLM for now... feel free to rework this if you like, should be pretty simple to update, and happy to merge in since it doesn't add much code

- MLMDataloader - that masks inputs (80% masked, 10% randomised, 10% untouched)
- MLMDataloader in the dataloader dictionary
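The 80/10/10 masking scheme can be sketched roughly as follows. This is not the PR's MLMDataloader. It is a standalone illustration that assumes the percentages apply to the positions selected for prediction, and that unselected positions get the label -1 so the loss ignores them. `mask_tokens`, `mask_prob=0.15`, `mask_token_id`, and `vocab_size` are all hypothetical.

```python
import random


def mask_tokens(token_ids, mask_token_id, vocab_size, mask_prob=0.15, rng=None):
    """Return (inputs, labels) for masked language modelling (illustrative only)."""
    if rng is None:
        rng = random.Random()
    inputs = list(token_ids)
    labels = [-1] * len(token_ids)  # -1 = position ignored by the loss
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue  # position not selected; input unchanged, label stays -1
        labels[i] = tok  # the model has to predict the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_token_id  # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: replace with a random token
        # remaining 10%: leave the input token untouched
    return inputs, labels


tokens = list(range(100, 120))  # placeholder token ids
inputs, labels = mask_tokens(
    tokens, mask_token_id=0, vocab_size=1000, rng=random.Random(0)
)
```

Passing a seeded `random.Random` keeps the masking reproducible, which makes it easier to check that every unselected position keeps its original token and carries the label -1.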