Hi, I’m reproducing the experiments from the paper using Llama-3.1-8B-Instruct for both Dataless L&S and L&S. According to the paper, the two methods should achieve similar scores:
However, in my reproduction, L&S performs significantly worse than Dataless L&S:
I used the evaluation script provided in the repository. The L&S merging command is as follows:
python ./merging/main.py --algo LocalizeAndStitch \
--base-model /share/home/wenqingchen/zmj/keyan_merge/models/Meta-Llama-3.1-8B-Instruct \
--lr 1e8 \
--sparsity 0.1 \
--n_epochs 1
When checking the model outputs on IFEval, I found that the outputs of L&S often contain repeated text segments:
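To make the repetition concrete, here is a small sketch (my own helper, not part of the repository) that measures the fraction of repeated word n-grams in a generated output; degenerate, looping generations score close to 1.0 while normal text scores near 0:

```python
from collections import Counter

def repeated_ngram_ratio(text: str, n: int = 4) -> float:
    """Fraction of word n-grams occurring more than once in `text`.

    A high ratio indicates degenerate, repetitive generation.
    """
    words = text.split()
    if len(words) < n:
        return 0.0
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)

# A looping output (like the L&S IFEval generations) scores near 1.0.
looping = "follow the instructions carefully and " * 10
print(repeated_ngram_ratio(looping))
```

Running this over the merged model's IFEval outputs is how I spotted that the repetition affects many samples, not just a few.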
I’m not sure what went wrong. Could you please help me identify the issue?