Train_with_pruning.py tinystories/prepare_char.py tinystories/prepare_tiktoken.py #2
Conversation
number = 0
# download the tiny shakespeare dataset
input_file_path = os.path.join(os.path.dirname(__file__), 'input.txt')
if not os.path.exists(input_file_path):
It is a great idea to allow for flexibility of the dataset from the Python level.
Check out the means for loading datasets with the Hugging Face 'datasets' module:
https://huggingface.co/docs/datasets/upload_dataset
For now, hoping to keep our changes isolated from shakespeare/prepare.py.
Let's create a new folder for now, and shelve the discussion of directory names.
Proposing we cp -r the shakespeare_char directory to a new folder called "experiments", and modify the prepare.py there with an argparse parameter for the model name (a sketch follows below).
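A minimal sketch of what that argparse parameter could look like in the copied experiments/prepare.py; the flag name --model_name, its default, and the output layout are assumptions for illustration, not code from this PR:

```python
import argparse
import os

parser = argparse.ArgumentParser(description="Prepare a dataset for a given model")
parser.add_argument("--model_name", type=str, default="shakespeare_char",
                    help="which model/dataset configuration to prepare data for")
args = parser.parse_args()

# e.g. write the prepared data next to this script, keyed by model name
out_dir = os.path.join(os.path.dirname(__file__), args.model_name)
os.makedirs(out_dir, exist_ok=True)
```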
Sharing an example of how to download a dataset with the datasets library:
from datasets import load_dataset
from pathlib import Path

dataset = load_dataset("msaligane/tinystories_phonology", split="train")

data_dir = Path("data")
data_dir.mkdir(exist_ok=True)

full_text = ""
for i, example in enumerate(dataset):
    filename = f"tinystoryP{i:02d}.txt"
    filepath = data_dir / filename
    with open(filepath, "w") as f:
        f.write(example["text"])
    full_text += example["text"] + "\n"

with open(data_dir / "full.txt", "w") as f:
    f.write(full_text)
train.py
Outdated
$ torchrun --nproc_per_node=8 --nnodes=2 --node_rank=1 --master_addr=123.456.123.456 --master_port=1234 train.py
(If your cluster does not have Infiniband interconnect prepend NCCL_IB_DISABLE=1)

This version has a pruning feature on lines 290-302. You can change pruning_p to change the pruning rate.
Let's move the pruning code into its own file called pruning.py, living in the same directory as train.py, and make pruning an argparse parameter to train.py (a sketch follows below).
Generally, let's make it so that modifications to train.py are selectable via argparse flags or configuration files.
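A minimal sketch of what such a pruning.py could look like, assuming simple global magnitude pruning; the function name, the prune fraction, and the choice to skip 1-D parameters are illustrative, not the PR's actual code:

```python
# pruning.py -- sketch only; lives next to train.py
import torch


def prune_model(model, amount=0.1):
    """Zero out the smallest-magnitude `amount` fraction of each weight matrix."""
    with torch.no_grad():
        for param in model.parameters():
            if param.dim() < 2:   # skip biases / LayerNorm parameters
                continue
            k = int(param.numel() * amount)
            if k == 0:
                continue
            threshold = param.abs().flatten().kthvalue(k).values
            param.mul_(param.abs() > threshold)
```

train.py would then import prune_model and call it inside the training loop only when the corresponding argparse flag is set (see the flag sketch further down).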
train.py
Outdated
# Create a mask where we keep weights with gradients larger than the threshold
#mask = torch.abs(model.fc.weight.grad) > threshold
#model.fc.weight.data.mul_(mask)
For these commented lines, feel free to move them into the pruning.py script as well, perhaps as pydoc comments on top of the method, as demonstrated via the following page:
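A minimal illustration of that suggestion (the method name here is hypothetical); the commented-out gradient-threshold lines become part of the method's docstring:

```python
def prune_by_gradient(model, threshold):
    """Prune weights whose gradient magnitude is below `threshold`.

    Equivalent in spirit to the lines previously commented out in train.py:
        mask = torch.abs(model.fc.weight.grad) > threshold
        model.fc.weight.data.mul_(mask)
    """
```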
train.py
Outdated
min_lr = 6e-5 # minimum learning rate, should be ~= learning_rate/10 per Chinchilla
# pruning
prune = 0
pruning_p = 500 # pruning once after 500 iterations
For the interim, also turn these into argparse params, with the default being no pruning for now.
In this way, we can still include this in the search space later.
Again, pruning itself should be an argparse flag, for now set to store_true, so that if the flag is passed then pruning will occur with this method (a sketch follows below).
See https://docs.python.org/3/library/argparse.html for more details on store_true.
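A sketch of the proposed flags, with pruning off by default; the flag names mirror the variables in the diff, and the loop condition is illustrative:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--prune", action="store_true",
                    help="enable pruning during training (off by default)")
parser.add_argument("--pruning_p", type=int, default=500,
                    help="prune once every pruning_p iterations")
args = parser.parse_args()

# inside the training loop (sketch):
# if args.prune and iter_num > 0 and iter_num % args.pruning_p == 0:
#     prune_model(model)
```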
Has changed.

Changes have been made.
After running the code, I came across an error related to the configurator.py file; it seems we'll have to remove the dependency on configurator.py before we add additional arguments. Created an issue for this, and will attempt to finish a pull request tomorrow with the changes (the sooner the better). Afterwards, adding options for pruning, tensorboard options, etc. will be unblocked.
Update: as mentioned in the last comment, the configurator.py dependency has been removed. Now it is much easier to add features. Feel free to add the pruning features now within train.py.
Adding a GitHub Actions test just for basic preparing, training, and inference. This template incorporates:
1. installation of dependencies
2. caching for pip dependencies
3. preparation of data into test and validation sets
4. training (with CPU)
5. inference (with CPU)
Lastly, this test runs whenever there is a pull request or a push onto any branch of the repo.
Add top level gitignore
Logs will have labels for timestamps, project name, and run_name. Set these with train.py argparse inputs.
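A sketch (all flag names assumed, not taken from the repo) of how such timestamped, labeled log directories could be assembled from train.py argparse inputs:

```python
import argparse
import time

parser = argparse.ArgumentParser()
parser.add_argument("--project", type=str, default="nanogpt")
parser.add_argument("--run_name", type=str, default="baseline")
args = parser.parse_args()

timestamp = time.strftime("%Y%m%d_%H%M%S")
log_dir = f"logs/{timestamp}_{args.project}_{args.run_name}"
```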
This has a constant denominator for preventing overflow.
Add GitHub actions
Add constantmax option in model.py
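A minimal sketch of what a "constantmax" normalization with a constant denominator (as described above) might look like; the constant value and the use of exp are illustrative assumptions, and the actual model.py option may differ:

```python
import torch

def constantmax(x: torch.Tensor, denom: float = 1000.0) -> torch.Tensor:
    # replace softmax's data-dependent sum-of-exponentials denominator
    # with a fixed constant, avoiding the running-sum computation
    return torch.exp(x) / denom
```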
Add tensorboard labels to the log file
Add csv preparation scripts
Add a parameterized estimation of softmax which should be easier to implement in hardware. This has three sections:
1. a flat section (y = 0)
2. a linear section (y = m*x + b)
3. a polynomial section (y = x**p + b), where 'p' is the polynomial power (e.g. 2)
To prevent overflow, we divide, as done in the 'constantmax' variation, by a number, in this case by default 1000.0. Preliminary experiments on smaller networks show this converges for power 2 and a divisor of 1000.0, without having to implement an equivalent for x_max.
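A sketch of the three-section estimate described in this commit; the breakpoints, slope, and offset are illustrative placeholders chosen only to keep the pieces continuous, and the real model.py implementation may differ:

```python
import torch

def poly_softmax_estimate(x, power=2.0, divisor=1000.0,
                          x_flat=-10.0, x_poly=0.0, slope=0.1):
    y = torch.zeros_like(x)                         # 1. flat section: y = 0
    lin = (x >= x_flat) & (x < x_poly)
    y[lin] = slope * (x[lin] - x_flat)              # 2. linear section: y = m*x + b
    offset = slope * (x_poly - x_flat)              # keep the linear/polynomial pieces continuous
    poly = x >= x_poly
    y[poly] = (x[poly] - x_poly) ** power + offset  # 3. polynomial section: y = x**p + b
    return y / divisor                              # constant divisor, as in 'constantmax'
```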
This is softermax with a parameter that allows increasing the base to higher numbers.
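A sketch of a base-parameterized softermax as described: softmax computed with an arbitrary base (e.g. 2) instead of e; the keyword name `base` is an assumption:

```python
import torch

def softermax(x: torch.Tensor, base: float = 2.0, dim: int = -1) -> torch.Tensor:
    x = x - x.max(dim=dim, keepdim=True).values   # subtract the row max for stability
    num = base ** x                               # base**x in place of exp(x)
    return num / num.sum(dim=dim, keepdim=True)
```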
Before, to add a new flag as a model variation, one had to:
1. ensure GPTConfig has the option and its type is set (use_feature: bool = True)
2. add the feature to the argparse in train.py
3. add the feature manually to the model_config dictionary in train.py
Now these steps are handled by the organization of argparse: if it is a model configuration, add it to the model_group. One still needs to make sure that the option exists in GPTConfig, but that should be fine (one place to expose the variable, one way to modify it). (Before, one wouldn't get any message if one hadn't manually added it to the model_config dictionary, which also felt very manual.) As a bonus, this also gives us an easy way to print each of the model settings, training settings, and logging settings before training! : D
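A sketch of the argparse-group pattern described above; the names are illustrative, and `_group_actions` is a private argparse attribute used here only to show how the model group's flags can be collected into model_config automatically:

```python
import argparse

parser = argparse.ArgumentParser()
model_group = parser.add_argument_group("model")
training_group = parser.add_argument_group("training")

model_group.add_argument("--n_layer", type=int, default=6)
model_group.add_argument("--n_head", type=int, default=6)
training_group.add_argument("--max_iters", type=int, default=5000)

args = parser.parse_args()

# collect only the model group's options into the dict handed to GPTConfig
model_config = {a.dest: getattr(args, a.dest) for a in model_group._group_actions}
print("model settings:", model_config)
```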
Change the means of taking in arguments over to argparse. Thought about setting dropout via argparse, but decided simply to set dropout=0.0 on 'resume' from sample.py as the default behavior.
Utilized new syntax for booleans, which allows setting positive or negative values on the CLI via `no` prefixing: `--compile` compiles (True), `--no-compile` does not compile (False). This also allows setting the default value independently from the presence of the boolean flag.
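The `--compile` / `--no-compile` behavior described here matches argparse.BooleanOptionalAction (Python 3.9+); a sketch under that assumption:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--compile", default=True, action=argparse.BooleanOptionalAction)
args = parser.parse_args()

# python train.py --no-compile  -> args.compile == False
# python train.py --compile     -> args.compile == True
# python train.py               -> args.compile == True (default set independently)
```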
Modify sample.py to utilize argparse for handling input options
@777yifei Thanks for your patience! : D The repo is finally ready for additional train.py parameters to be added in. Let's work together to bring these changes into the repo; I'll probably give it a shot tonight, and will share questions tomorrow.
Add new dependencies to requirements_cpu.txt
Merge master to add_snac_tokens
Train_with_pruning.py: train.py with pruning; change pruning_p to change the pruning rate.
tinystories/prepare_char.py: prepares the TinyStories data at the character level. Change number to change data_url.
tinystories/prepare_tiktoken.py: prepares TinyStories with tiktoken (from OpenAI). Change number to change data_url.