@xinyixuu
Collaborator

Add demo scripts and corresponding YAML files for multitokenization training, and fix an issue in run_experiments.py when running multidataset training.

@xinyixuu requested a review from gkielian on September 17, 2025 00:21
gkielian and others added 27 commits on September 19, 2025 08:56
Merge branch 'multidataset' of https://github.com/xinyixuu/nanoGPT into multidataset
Updated dataset sampling probabilities and added new named groups for various datasets. Adjusted configurations for batch iterations and tensorboard run names.

2 participants