
Conversation


@777yifei 777yifei commented Oct 5, 2023

Train_with_pruning.py: train.py with pruning. Change pruning_p to change the pruning rate.
tinystories/prepare_char.py: prepares the TinyStories data at the character level. Change `number` to change `data_url`.
tinystories/prepare_tiktoken.py: prepares TinyStories with tiktoken (from OpenAI). Change `number` to change `data_url`.
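
For context, a tiktoken-based prepare step typically looks something like the following sketch (the file names, the 90/10 split, and the uint16 dtype are assumptions about the general nanoGPT-style pattern, not the exact contents of prepare_tiktoken.py in this PR):

import numpy as np
import tiktoken

enc = tiktoken.get_encoding("gpt2")

with open("input.txt", "r", encoding="utf-8") as f:
    data = f.read()

# simple 90/10 train/val split of the raw text
n = len(data)
train_ids = enc.encode_ordinary(data[: int(n * 0.9)])
val_ids = enc.encode_ordinary(data[int(n * 0.9):])

# GPT-2 token ids fit in uint16 (50257 < 65536)
np.array(train_ids, dtype=np.uint16).tofile("train.bin")
np.array(val_ids, dtype=np.uint16).tofile("val.bin")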

number = 0
# download the tiny shakespeare dataset
input_file_path = os.path.join(os.path.dirname(__file__), 'input.txt')
if not os.path.exists(input_file_path):
Collaborator

It is a great idea to allow for flexibility of the dataset from the Python level.

Check out the means for loading datasets using the Hugging Face 'datasets' module:

https://huggingface.co/docs/datasets/upload_dataset

For now, hoping to isolate our changes from shakespeare/prepare.py.

Let's create a new folder for now, and shelve the discussion of directory names.

Proposing we cp -r the shakespeare_char directory to a new folder called "experiments", and modify the prepare.py there with an argparse parameter for the model name.
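
For example, a minimal sketch of how the copied prepare.py could take such an argparse parameter and map it to a data_url (the flag name, the dictionary, and the tinystories placeholder URL are assumptions, not code from this PR):

import argparse
import os

import requests

# hypothetical mapping from a name to its data_url; the tinystories entry is a placeholder
DATA_URLS = {
    "shakespeare_char": "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt",
    "tinystories": "<tinystories_data_url>",
}

parser = argparse.ArgumentParser(description="Prepare a dataset at the character level")
parser.add_argument("--dataset", choices=sorted(DATA_URLS), default="shakespeare_char",
                    help="which dataset's data_url to download")
args = parser.parse_args()

# download the selected dataset if it is not already present
input_file_path = os.path.join(os.path.dirname(__file__), "input.txt")
if not os.path.exists(input_file_path):
    data_url = DATA_URLS[args.dataset]
    with open(input_file_path, "w", encoding="utf-8") as f:
        f.write(requests.get(data_url).text)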

Collaborator

Sharing an example of how to download a dataset with the datasets library:

from datasets import load_dataset
from pathlib import Path

# load the train split from the Hugging Face Hub
dataset = load_dataset("msaligane/tinystories_phonology", split="train")

data_dir = Path("data")
data_dir.mkdir(exist_ok=True)

full_text = ""
for i, example in enumerate(dataset):
    # write each story to its own numbered file
    filename = f"tinystoryP{i:02d}.txt"
    filepath = data_dir / filename

    with open(filepath, "w") as f:
        f.write(example["text"])

    # also accumulate everything into a single combined file
    full_text += example["text"] + "\n"

with open(data_dir / "full.txt", "w") as f:
    f.write(full_text)

train.py Outdated
$ torchrun --nproc_per_node=8 --nnodes=2 --node_rank=1 --master_addr=123.456.123.456 --master_port=1234 train.py
(If your cluster does not have Infiniband interconnect prepend NCCL_IB_DISABLE=1)

This version has a pruning feature on lines 290-302. You can change pruning_p to change the pruning rate.
Collaborator

Let's move the pruning code into its own file called pruning.py living in the same directory as train.py, and have pruning be an argparse parameter to train.py.

Generally, let's make it so that modifications to train.py are selectable via argparse flags or configuration files.

train.py Outdated
# Create a mask where we keep weights with gradients larger than the threshold
#mask = torch.abs(model.fc.weight.grad) > threshold
#model.fc.weight.data.mul_(mask)

Collaborator

For these commented lines, feel free to move them into the pruning.py script as well, perhaps as pydoc (docstring) comments on top of the method, as demonstrated on the following page:

https://www.datacamp.com/tutorial/docstrings-python
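
For illustration, a minimal sketch of what such a pruning.py could look like, with the commented lines above folded into a documented method (the function name and threshold default are assumptions, and it generalizes the single fc layer to all parameters):

import torch


def prune_by_gradient(model, threshold=1e-3):
    """Zero out weights whose gradient magnitude is below `threshold`.

    Creates a mask keeping only the weights with gradients larger than the
    threshold, then multiplies the weights by that mask in place. Intended
    to be called after loss.backward(), once gradients are populated.
    """
    with torch.no_grad():
        for param in model.parameters():
            if param.grad is None:
                continue
            mask = torch.abs(param.grad) > threshold
            param.mul_(mask.to(param.dtype))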

train.py Outdated
min_lr = 6e-5 # minimum learning rate, should be ~= learning_rate/10 per Chinchilla
# pruning
prune = 0
pruning_p = 500 # prune once every 500 iterations
Collaborator

For the interim, also turn these into argparse params, with the default being no pruning for now.

In this way, we can still include this in the search space later.

Again, pruning itself should be an argparse flag set to store_true for now, so that if the flag is passed then pruning will occur with this method.

See https://docs.python.org/3/library/argparse.html for more details on store_true.
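
A minimal sketch of those interim argparse parameters (the flag names, defaults, and the iter_num / pruning.prune_by_gradient names are assumptions):

import argparse

parser = argparse.ArgumentParser()
# pruning stays off unless the flag is passed on the command line
parser.add_argument("--prune", action="store_true",
                    help="enable gradient-threshold pruning during training")
parser.add_argument("--pruning_p", type=int, default=500,
                    help="prune once every pruning_p iterations")
args = parser.parse_args()

# sketch of use inside the training loop:
#   if args.prune and iter_num % args.pruning_p == 0:
#       pruning.prune_by_gradient(model)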

10.12
@DingHengyu

Has been changed

@777yifei
Author

Changes have been made.

@gkielian
Collaborator

After running the code, I came across an error related to the configurator.py file; it seems we'll have to remove the dependency on configurator.py before we add additional arguments.

Created an issue for this, and will attempt to finish a pull request with the changes tomorrow (the sooner the better):
#20

Afterwards, adding options for pruning, tensorboard, etc. will be unblocked.

Collaborator

gkielian commented Oct 27, 2023

Update: as mentioned in the last comment, the configurator.py file doesn't get along with argparse at all, so I'm creating #21, which refactors the original procedural train.py into a Trainer class and handles CLI configuration settings via argparse.

Now it is much, much easier to add features.

Feel free to add the pruning features now within the parse_args function and the corresponding sections of our newly parameterized train.py.

gkielian and others added 20 commits November 3, 2023 16:41
Adding a GitHub Actions test just for basic preparation, training, and inference.

This template incorporates:

1. installation of dependencies
2. caching for pip dependencies
3. preparation of data into test and validation sets
4. training (with cpu)
5. inference (with cpu)

Lastly, this test runs whenever there is a pull request or a push onto any branch of the repo.
Logs will have labels for timestamps, project name, and run_name.

Set these with train.py argparse inputs.
This has a constant denominator for preventing overflow.
Add a parameterized estimation of softmax which should be easier to
implement in hardware.

This has three sections:

1. flat section (y=0)
2. linear section  (y=mx+b)
3. polynomial section (y=x**p + b), where 'p' is the polynomial power (e.g. 2)

To prevent overflow, we divide by a number as done in the 'constantmax' variation, in this case by default 1000.0.

Preliminary experiments on smaller networks show this converges for power 2 and a divisor of 1000.0, without having to implement an equivalent for x_max.
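
To make the three sections described above concrete, here is a minimal sketch of such a piecewise estimation (the function name, the breakpoints at x_intercept and 0, and the default values are assumptions; the committed implementation may differ):

import torch


def piecewise_softmax_estimate(x, x_intercept=-10.0, y_intercept=1.0, power=2.0, divisor=1000.0):
    # 1. flat section: y = 0 for x below x_intercept
    flat = torch.zeros_like(x)

    # 2. linear section: y = m*x + b, joining (x_intercept, 0) to (0, y_intercept)
    slope = -y_intercept / x_intercept
    linear = slope * x + y_intercept

    # 3. polynomial section: y = x**p + b for x >= 0 (clamp keeps the unused branch finite)
    poly = torch.clamp(x, min=0.0) ** power + y_intercept

    y = torch.where(x < x_intercept, flat, torch.where(x < 0.0, linear, poly))

    # divide by a constant denominator (as in 'constantmax') to prevent overflow
    return y / divisor
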
This is softermax with a parameter to increase the base to higher numbers.
Before, to add a new flag for a model variation, one had to:
1. ensure GPTConfig has the option and its type is set (use_feature: bool = True)
2. add the feature to the argparse in train.py
3. add the feature manually to the model_config dictionary in train.py

Now these steps are handled by the organization of argparse.

If it is a model configuration, add it to the model_group.
One still needs to make sure that the option exists in the GPTConfig, but that should be fine (one place to expose the variable, one way to modify it).

(Before, one wouldn't get any message if one hadn't manually added it to the model_config dictionary, which also felt very manual.)

As a bonus, this also gives us an easy way to print each of the model settings, training settings, and logging settings before training! : D
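
As an illustration of the new flow (the flag name use_new_feature is hypothetical, and gathering the model arguments from the group is one possible sketch, not necessarily what #21 does):

import argparse
from dataclasses import dataclass


@dataclass
class GPTConfig:
    # the option still has to be exposed here with its type and default
    use_new_feature: bool = False


parser = argparse.ArgumentParser()
model_group = parser.add_argument_group("model")
model_group.add_argument("--use_new_feature", action="store_true")

args = parser.parse_args()

# model arguments are collected from the group and passed straight into GPTConfig,
# so there is no hand-maintained model_config dictionary to forget about
# (note: _group_actions is a private argparse attribute, used here only for the sketch)
model_args = {a.dest: getattr(args, a.dest) for a in model_group._group_actions}
gptconf = GPTConfig(**model_args)
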
Change the means of taking in arguments to argparse.

Thought about setting dropout via argparse, but decided simply to set
dropout=0.0 on 'resume' from sample.py as a default behavior.
Utilized new syntax for booleans, which allows for setting positive or negative values on the CLI via `no` prefixing:

`--compile` - compiles (True)
`--no-compile` - does not compile (False)

This also allows for setting the default value independently of whether the boolean flag is present.
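
One standard way to get this `--compile` / `--no-compile` behavior is argparse.BooleanOptionalAction (Python 3.9+); whether the repo uses exactly this mechanism is an assumption:

import argparse

parser = argparse.ArgumentParser()
# generates both --compile and --no-compile; the default is set independently of the flags
parser.add_argument("--compile", default=True, action=argparse.BooleanOptionalAction)

print(parser.parse_args([]).compile)                 # True (the default)
print(parser.parse_args(["--no-compile"]).compile)   # False
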
Modify sample.py to utilize argparse for handling input options
Collaborator

gkielian commented Nov 9, 2023

@777yifei Thanks for your patience! : D

The repo is finally ready for additional train.py parameters to be added in.

Let's work together to bring these changes into the repo; I'll probably give it a shot tonight and will share questions tomorrow.

@777yifei 777yifei closed this Jan 23, 2024
gkielian pushed a commit that referenced this pull request May 11, 2024
Add new dependencies to requirements_cpu.txt
gkielian pushed a commit that referenced this pull request Aug 6, 2024
Merge master to add_snac_tokens