Hello, thank you for open-sourcing this project. I am trying to understand your code, but the data preprocessing in data.py confuses me.
In building the vocabulary:
print("Load corpus with train size %d, valid size %d, "
      "test size %d raw vocab size %d vocab size %d at cut_off %d OOV rate %f"
      % (len(self.train_corpus), len(self.valid_corpus), len(self.test_corpus),
         raw_vocab_size, len(vocab_count), vocab_count[-1][1], float(discard_wc) / len(all_words)))
What do the train size, valid size, and test size mean?
All three values print as 2, since each corpus is a tuple of length 2.
Is the vocabulary meant to be built from the training, validation, and test data together?
In the code, however, only the training data is used to build the vocabulary.
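To make the question concrete, here is a minimal sketch of how I understand the vocabulary step. It is not the repository's code: the function name build_vocab, the parameter max_vocab_size, and the (dialogues, meta) tuple layout are my own assumptions, used only to reproduce the quantities in the print statement.

```python
from collections import Counter


def build_vocab(train_utterances, max_vocab_size):
    """Hypothetical helper: count words in the training utterances only."""
    # Flatten the tokenized training utterances into one list of words.
    all_words = [w for utt in train_utterances for w in utt]
    # (word, count) pairs sorted by descending count.
    vocab_count = Counter(all_words).most_common()
    raw_vocab_size = len(vocab_count)
    # Occurrences of words that fall outside the kept vocabulary.
    discard_wc = sum(c for _, c in vocab_count[max_vocab_size:])
    vocab_count = vocab_count[:max_vocab_size]
    # Count of the rarest kept word, which the print reports as cut_off.
    cut_off = vocab_count[-1][1]
    oov_rate = float(discard_wc) / len(all_words)
    return vocab_count, raw_vocab_size, cut_off, oov_rate


# Assumed layout: a corpus stored as a 2-tuple, e.g. (dialogues, meta).
# len() of such a tuple is 2 no matter how many dialogues it holds.
train_corpus = ([["a", "b", "a"], ["a", "c", "b"]], [None, None])
tuple_len = len(train_corpus)
num_dialogues = len(train_corpus[0])
```

If this reading is right, the printed sizes describe the tuple rather than the data, and len(self.train_corpus[0]) (or whichever element holds the dialogues) would give the actual number of training dialogues.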
In formatting the dialogue:
Is it essential to add [<s>,<d>,</s>] at the start of each dialogue?
Can I leave it out?
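My guess is that this marker acts as a placeholder first turn, so that the first real utterance still has a non-empty context. The sketch below only illustrates that guess. The names BOD_UTT, format_dialog, and context_response_pairs are hypothetical and not taken from data.py.

```python
# Assumed placeholder utterance marking the beginning of a dialogue.
BOD_UTT = ["<s>", "<d>", "</s>"]


def format_dialog(utterances, add_marker=True):
    """Hypothetical helper: wrap each utterance in <s> ... </s>."""
    dialog = [BOD_UTT] if add_marker else []
    for utt in utterances:
        dialog.append(["<s>"] + utt.split() + ["</s>"])
    return dialog


def context_response_pairs(dialog):
    """Pair each turn (except the first) with all turns before it."""
    return [(dialog[:i], dialog[i]) for i in range(1, len(dialog))]


with_marker = context_response_pairs(format_dialog(["hi there", "hello"]))
without_marker = context_response_pairs(
    format_dialog(["hi there", "hello"], add_marker=False)
)
```

Under this assumption, with_marker contains a pair whose response is the first real utterance, while without_marker never uses the first utterance as a response. Is that the reason for the marker?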
Thank you.