Hi
Thanks for your great work and repo.
I am trying to reproduce your Table 2 result on LLaMA 3.2 1B with COCONUT, following the config in your appendix.
However, my accuracy with COCONUT is only 28%. Could you help confirm whether the config below is correct?
With 8 GPUs:

```yaml
c_thought: 2
epochs_per_stage: 3
max_latent_stage: 5
pad_latent_to_max: True
seed: 0
resume: 3
bf16: True
batch_size_training: 32
gradient_accumulation_steps: 1
num_epochs: 25
lr: !!float "1e-4"
weight_decay: 0.01
```
I also saw in #7 that `max_latent_stage` was set to only 1. Doesn't that mean there will only ever be 2 latent tokens?
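To make my question concrete, here is a small sketch of how I understand the curriculum (this is my own reading, not the repo's actual code; the function name `num_latent_tokens` is hypothetical): at stage `k`, the replaced reasoning steps are capped at `max_latent_stage`, and each replaced step uses `c_thought` latent tokens.

```python
def num_latent_tokens(stage: int, c_thought: int, max_latent_stage: int) -> int:
    """Latent tokens used at a given curriculum stage (my understanding):
    c_thought latents per replaced step, capped at max_latent_stage steps."""
    return c_thought * min(stage, max_latent_stage)

# With max_latent_stage=1 and c_thought=2, every stage >= 1 would use
# only 2 latent tokens, no matter how far training has progressed:
print(num_latent_tokens(3, c_thought=2, max_latent_stage=1))  # -> 2

# Whereas with max_latent_stage=5 (as in the appendix config), stage 3
# would use 6:
print(num_latent_tokens(3, c_thought=2, max_latent_stage=5))  # -> 6
```

Is this the intended behavior, or am I misreading the stage schedule?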
Thanks again for your time and the amazing work.
Best
Tianyi