
Conversation

@DonaldLucy commented Feb 4, 2026

Train (example)

python train.py \
  --init_from scratch --dataset minipile --out_dir out/repro \
  --device cuda --dtype float16 \
  --n_layer 4 --n_head 4 --n_embd 256 \
  --block_size 256 --batch_size 32 --gradient_accumulation_steps 1 \
  --max_iters 30000 --eval_interval 200 --eval_iters 50 --log_interval 20 \
  --learning_rate 3e-4 \
  --loss_fn bit_balanced_cross_entropy --bit_loss_weight 2e-1 --bit_loss_normalize \
  --linear_variant_attn adaptive_bit_linear --linear_variant_mlp adaptive_bit_linear \
  --adaptive_linear_init_bits 6 --adaptive_linear_min_bits 2 --adaptive_linear_max_bits 8 \
  --adaptive_linear_activation_bits 8 --adaptive_linear_quantize_input \
  --tensorboard_log --tensorboard_log_dir runs \
  --never_save_checkpoint
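
For context, here is a minimal sketch of what an adaptive-bit linear layer consistent with the flags above could look like: a learnable bit-width clamped to [min_bits, max_bits], fake-quantized weights (and optionally inputs), plus a penalty term that a bit-balanced loss could scale by --bit_loss_weight. This is an illustration assuming a straight-through-estimator design in PyTorch, not the implementation in this PR; the class and method names are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveBitLinear(nn.Module):
    """Linear layer whose weight bit-width is a learnable, clamped scalar."""

    def __init__(self, in_features, out_features,
                 init_bits=6.0, min_bits=2.0, max_bits=8.0,
                 activation_bits=8, quantize_input=True):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.normal_(self.weight, std=0.02)
        # Learnable effective bit-width, clamped to [min_bits, max_bits].
        self.bits = nn.Parameter(torch.tensor(float(init_bits)))
        self.min_bits, self.max_bits = float(min_bits), float(max_bits)
        self.activation_bits = float(activation_bits)
        self.quantize_input = quantize_input

    @staticmethod
    def _fake_quant(x, bits):
        # Symmetric fake quantization; the straight-through estimator is
        # applied to the rounding only, so gradients can still reach `bits`
        # through the quantization scale.
        levels = 2.0 ** bits - 1.0
        scale = x.detach().abs().max().clamp(min=1e-8) / levels
        x_scaled = x / scale
        x_rounded = x_scaled + (torch.round(x_scaled) - x_scaled).detach()
        return x_rounded * scale

    def forward(self, x):
        bits = self.bits.clamp(self.min_bits, self.max_bits)
        if self.quantize_input:
            x = self._fake_quant(x, self.activation_bits)
        return F.linear(x, self._fake_quant(self.weight, bits))

    def bit_penalty(self):
        # Auxiliary term a loss such as bit_balanced_cross_entropy could add
        # (scaled by --bit_loss_weight) to pressure bit-widths downward.
        return self.bits.clamp(self.min_bits, self.max_bits) / self.max_bits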

TensorBoard:

tensorboard --logdir runs
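
As a rough illustration of how per-layer bit-widths could be mirrored to TensorBoard alongside the CSV logs, a sketch is below. The tag names and the AdaptiveBitLinear attributes it inspects come from the hypothetical layer sketched above, not from the PR's actual logging code.

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/repro")

def log_bit_allocation(model, writer, step):
    # Record the current (clamped) bit-width of every adaptive layer.
    for name, module in model.named_modules():
        if hasattr(module, "bits"):
            bits = module.bits.clamp(module.min_bits, module.max_bits).item()
            writer.add_scalar(f"bit_alloc/{name}", bits, step)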

Outputs

  • out/<run>/: training outputs.
  • Bit allocation logs (when utils/bit_allocation_logger.py is used): out/<run>/bit_alloc/bit_*.csv

Plotting

Bit allocation curves (layer-wise / type-wise):

python scripts/plot_bit_alloc.py --run_dir out/<run>
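
For quick ad-hoc analysis outside scripts/plot_bit_alloc.py, the CSVs can also be read directly. A minimal sketch, assuming one file per layer with "iter" and "bits" columns (the column names are a guess, not confirmed by this PR):

import glob
import os

import matplotlib.pyplot as plt
import pandas as pd

run_dir = "out/repro"  # replace with your out/<run>
for path in sorted(glob.glob(os.path.join(run_dir, "bit_alloc", "bit_*.csv"))):
    df = pd.read_csv(path)
    label = os.path.splitext(os.path.basename(path))[0]
    plt.plot(df["iter"], df["bits"], label=label)

plt.xlabel("iteration")
plt.ylabel("learned bit-width")
plt.legend(fontsize=6)
plt.tight_layout()
plt.savefig("bit_alloc_curves.png", dpi=150)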

…ation tools, analysis scripts with tensorboard integration
@gkielian (Collaborator) left a comment


Generally looks great; I have a few questions and minor edits -- I was also hoping we could remove the LPE from the train_args.py section.

@gkielian commented Feb 6, 2026 via email

@gkielian commented Feb 9, 2026

@DonaldLucy It seems there might be a directory name change, or an additional file that needs to be added to the PR, for the scripts to run (sharing a screenshot):
[screenshot]

