
Reproducing Llama3.2 1B results #11

@KevinZhoutianyi

Hi,

Thanks for your great work and the repo.

I am trying to reproduce your Table 2 results for Llama 3.2 1B with Coconut.

[Screenshot of Table 2 from the paper]

I am following the config from your appendix:

[Screenshot of the Coconut config from the appendix]

However, my Coconut accuracy is only 28%. Could you confirm whether the following config is correct?
With 8 GPUs:

```yaml
c_thought: 2
epochs_per_stage: 3
max_latent_stage: 5
pad_latent_to_max: True
seed: 0
resume: 3
bf16: True
batch_size_training: 32
gradient_accumulation_steps: 1
num_epochs: 25
lr: !!float "1e-4"
weight_decay: 0.01
```
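
As a sanity check on my setup, this is the effective batch size I believe the config implies (a quick sketch; I am assuming `batch_size_training` is per GPU, which I have not verified against the repo):

```python
# Quick sanity check on the global batch size implied by my config.
# Assumption (not verified against the repo): batch_size_training is per GPU.
n_gpus = 8
batch_size_training = 32
gradient_accumulation_steps = 1

effective_batch_size = n_gpus * batch_size_training * gradient_accumulation_steps
print(effective_batch_size)  # 256
```

If `batch_size_training` is meant to be global instead, my effective batch size would be off by 8x, which could account for part of the gap.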

I also saw in #7 that `max_latent_stage` should be only 1. With `c_thought: 2`, doesn't that mean there will only ever be 2 latent tokens?
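
For reference, here is a minimal sketch of how I understand the latent-token scheduling from the paper (a hypothetical helper for illustration, not the repo's actual code):

```python
def num_latent_tokens(stage: int, c_thought: int, max_latent_stage: int) -> int:
    """Number of continuous-thought tokens at a given curriculum stage.

    My understanding from the paper: stage k replaces the first k reasoning
    steps with c_thought latent tokens each, and the number of replaced
    steps is capped at max_latent_stage (with pad_latent_to_max padding
    shorter examples up to that cap).
    """
    return c_thought * min(stage, max_latent_stage)

# With max_latent_stage=1 and c_thought=2 (the #7 setting, as I read it),
# every stage >= 1 uses the same 2 latent tokens:
assert num_latent_tokens(stage=4, c_thought=2, max_latent_stage=1) == 2

# With my config (max_latent_stage=5, c_thought=2), stage 5 uses 10:
assert num_latent_tokens(stage=5, c_thought=2, max_latent_stage=5) == 10
```

Please correct me if this reading of the schedule is wrong.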

Thanks again for your time and the amazing work.

Best,
Tianyi
