
Conversation

@DonaldLucy commented Feb 4, 2026

Train (example)

python train.py \
  --init_from scratch --dataset minipile --out_dir out/repro \
  --device cuda --dtype float16 \
  --n_layer 4 --n_head 4 --n_embd 256 \
  --block_size 256 --batch_size 32 --gradient_accumulation_steps 1 \
  --max_iters 30000 --eval_interval 200 --eval_iters 50 --log_interval 20 \
  --learning_rate 3e-4 \
  --loss_fn bit_balanced_cross_entropy --bit_loss_weight 2e-1 --bit_loss_normalize \
  --linear_variant_attn adaptive_bit_linear --linear_variant_mlp adaptive_bit_linear \
  --adaptive_linear_init_bits 6 --adaptive_linear_min_bits 2 --adaptive_linear_max_bits 8 \
  --adaptive_linear_activation_bits 8 --adaptive_linear_quantize_input \
  --tensorboard_log --tensorboard_log_dir runs \
  --never_save_checkpoint
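
For context, here is a minimal sketch of what an adaptive-bit linear layer consistent with the flags above could look like: a learnable bit-width clamped to [min_bits, max_bits], fake-quantized weights (and optionally inputs), plus a penalty term that a bit-balanced loss could scale by --bit_loss_weight. This is an illustration assuming a straight-through-estimator design in PyTorch, not the implementation in this PR; the class and method names are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveBitLinear(nn.Module):
    """Linear layer whose weight bit-width is a learnable, clamped scalar."""

    def __init__(self, in_features, out_features,
                 init_bits=6.0, min_bits=2.0, max_bits=8.0,
                 activation_bits=8, quantize_input=True):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.normal_(self.weight, std=0.02)
        # Learnable effective bit-width, clamped to [min_bits, max_bits].
        self.bits = nn.Parameter(torch.tensor(float(init_bits)))
        self.min_bits, self.max_bits = float(min_bits), float(max_bits)
        self.activation_bits = float(activation_bits)
        self.quantize_input = quantize_input

    @staticmethod
    def _fake_quant(x, bits):
        # Symmetric fake quantization; the straight-through estimator is
        # applied to the rounding only, so gradients can still reach `bits`
        # through the quantization scale.
        levels = 2.0 ** bits - 1.0
        scale = x.detach().abs().max().clamp(min=1e-8) / levels
        x_scaled = x / scale
        x_rounded = x_scaled + (torch.round(x_scaled) - x_scaled).detach()
        return x_rounded * scale

    def forward(self, x):
        bits = self.bits.clamp(self.min_bits, self.max_bits)
        if self.quantize_input:
            x = self._fake_quant(x, self.activation_bits)
        return F.linear(x, self._fake_quant(self.weight, bits))

    def bit_penalty(self):
        # Auxiliary term a loss such as bit_balanced_cross_entropy could add
        # (scaled by --bit_loss_weight) to pressure bit-widths downward.
        return self.bits.clamp(self.min_bits, self.max_bits) / self.max_bits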

TensorBoard:

tensorboard --logdir runs
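
As a rough illustration of how per-layer bit-widths could be mirrored to TensorBoard alongside the CSV logs, a sketch is below. The tag names and the AdaptiveBitLinear attributes it inspects come from the hypothetical layer sketched above, not from the PR's actual logging code.

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/repro")

def log_bit_allocation(model, writer, step):
    # Record the current (clamped) bit-width of every adaptive layer.
    for name, module in model.named_modules():
        if hasattr(module, "bits"):
            bits = module.bits.clamp(module.min_bits, module.max_bits).item()
            writer.add_scalar(f"bit_alloc/{name}", bits, step)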

Outputs

  • out/<run>/: training outputs.
  • Bit allocation logs (when utils/bit_allocation_logger.py is used): out/<run>/bit_alloc/bit_*.csv

Plotting

Bit allocation curves (layer-wise / type-wise):

python scripts/plot_bit_alloc.py --run_dir out/<run>
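
For quick ad-hoc analysis outside scripts/plot_bit_alloc.py, the CSVs can also be read directly. A minimal sketch, assuming one file per layer with "iter" and "bits" columns (the column names are a guess, not confirmed by this PR):

import glob
import os

import matplotlib.pyplot as plt
import pandas as pd

run_dir = "out/repro"  # replace with your out/<run>
for path in sorted(glob.glob(os.path.join(run_dir, "bit_alloc", "bit_*.csv"))):
    df = pd.read_csv(path)
    label = os.path.splitext(os.path.basename(path))[0]
    plt.plot(df["iter"], df["bits"], label=label)

plt.xlabel("iteration")
plt.ylabel("learned bit-width")
plt.legend(fontsize=6)
plt.tight_layout()
plt.savefig("bit_alloc_curves.png", dpi=150)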

…ation tools, analysis scripts with tensorboard integration
@gkielian (Collaborator) left a comment


Generally looks great; I have a few questions and minor edits -- I was also hoping we could remove the LPE from the train_args.py section.

@gkielian commented Feb 6, 2026 via email

@gkielian commented Feb 9, 2026

@DonaldLucy It seems there might be a directory name change, or an additional file that needs to be added to the PR, for the scripts to run (sharing a screenshot):
[screenshot]

