Hi~
1. Is ZEN trained starting from an existing base BERT (e.g. Google's release), or is it trained from scratch? If from scratch, I guess the n-gram embeddings are randomly initialized. If from a base BERT, are the n-gram embeddings perhaps initialized as the average of the embeddings of the characters they contain?
2. The paper says "We use the same parameter setting for the n-gram encoder as in BERT". Does this mean the parameters of the n-gram encoder are shared with the BERT tower (maybe the bottom six layers?), or are they initialized and trained independently?
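
To make question 1 concrete, here is a minimal sketch of the "average of characters" initialization I have in mind. It is only my guess and is not taken from the ZEN code; the function name `average_init` and the toy embedding values are made up for illustration.

```python
# Hypothetical sketch of the guess in question 1; not from the ZEN code base.
def average_init(ngram, char_embeddings):
    """Return the element-wise mean of the embeddings of the characters in `ngram`."""
    vectors = [char_embeddings[ch] for ch in ngram]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# Toy 2-dimensional character embeddings (made-up values).
char_embeddings = {"北": [1.0, 3.0], "京": [3.0, 5.0]}

# The n-gram "北京" would start from the mean of its two character vectors.
ngram_vector = average_init("北京", char_embeddings)
```

If ZEN does something different (e.g. random initialization even when starting from a base BERT), please correct me.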
Thank you~