Hi
Thanks for your great work and repo.
I am trying to reproduce your Table 2 result on LLaMA 3.2 1B with COCONUT, following the config in your appendix.
However, my accuracy with COCONUT is only 28%. Could you help confirm whether the config below is correct?
With 8 GPUs:

```yaml
c_thought: 2
epochs_per_stage: 3
max_latent_stage: 5
pad_latent_to_max: True
seed: 0
resume: 3
bf16: True
batch_size_training: 32
gradient_accumulation_steps: 1
num_epochs: 25
lr: !!float "1e-4"
weight_decay: 0.01
```
I also saw in #7 that `max_latent_stage` was set to only 1. Doesn't that mean there will only ever be 2 latent tokens?
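To make my question concrete, here is a small sketch of how I understand the curriculum (this is my own reading, not the repo's actual code; the function name `num_latent_tokens` is hypothetical): at stage `k`, the replaced reasoning steps are capped at `max_latent_stage`, and each replaced step uses `c_thought` latent tokens.

```python
def num_latent_tokens(stage: int, c_thought: int, max_latent_stage: int) -> int:
    """Latent tokens used at a given curriculum stage (my understanding):
    c_thought latents per replaced step, capped at max_latent_stage steps."""
    return c_thought * min(stage, max_latent_stage)

# With max_latent_stage=1 and c_thought=2, every stage >= 1 would use
# only 2 latent tokens, no matter how far training has progressed:
print(num_latent_tokens(3, c_thought=2, max_latent_stage=1))  # -> 2

# Whereas with max_latent_stage=5 (as in the appendix config), stage 3
# would use 6:
print(num_latent_tokens(3, c_thought=2, max_latent_stage=5))  # -> 6
```

Is this the intended behavior, or am I misreading the stage schedule?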
Thanks again for your time and the amazing work.
Best
Tianyi