Hyperparameters for pre-training

Hi, this is nice work!

Could you give some more details about the hyperparameters used in pre-training?

ZEN (P) is trained on top of Google's BERT. How many epochs were used in the additional pre-training?

Thanks!