The paper discusses selecting self-attention layers 4-14 for the copying, but I can't find where in the code this is actually done. It appears to me that all 16 layers are being copied. Is that right?
Also, would it be possible to share the code for the ablation in Figure 10 of the paper?
Many thanks.