
Conversation


@777yifei 777yifei commented Oct 5, 2023

Train_with_pruning.py: train.py with pruning. Change pruning_p to change the pruning rate.
tinystories/prepare_char.py: prepares the TinyStories data at the character level. Change `number` to change `data_url`.
tinystories/prepare_tiktoken.py: prepares TinyStories with tiktoken (from OpenAI). Change `number` to change `data_url`.
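
For context, a tiktoken-based prepare step typically looks something like the following sketch (the file names, the 90/10 split, and the uint16 dtype are assumptions about the general nanoGPT-style pattern, not the exact contents of prepare_tiktoken.py in this PR):

import numpy as np
import tiktoken

enc = tiktoken.get_encoding("gpt2")

with open("input.txt", "r", encoding="utf-8") as f:
    data = f.read()

# simple 90/10 train/val split of the raw text
n = len(data)
train_ids = enc.encode_ordinary(data[: int(n * 0.9)])
val_ids = enc.encode_ordinary(data[int(n * 0.9):])

# GPT-2 token ids fit in uint16 (50257 < 65536)
np.array(train_ids, dtype=np.uint16).tofile("train.bin")
np.array(val_ids, dtype=np.uint16).tofile("val.bin")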

number = 0
# download the tiny shakespeare dataset
input_file_path = os.path.join(os.path.dirname(__file__), 'input.txt')
if not os.path.exists(input_file_path):
Collaborator

It is a great idea to allow for flexibility of the dataset from the Python level.

Check out the means for loading datasets using the Hugging Face 'datasets' module:

https://huggingface.co/docs/datasets/upload_dataset

For now, hoping to isolate our changes from shakespeare/prepare.py.

Let's create a new folder for now, and shelve the discussion of directory names.

Proposing we cp -r the shakespeare_char directory to a new folder called "experiments", and modify the prepare.py there with an argparse parameter for the model name.
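
For example, a minimal sketch of how the copied prepare.py could take such an argparse parameter and map it to a data_url (the flag name, the dictionary, and the tinystories placeholder URL are assumptions, not code from this PR):

import argparse
import os

import requests

# hypothetical mapping from a name to its data_url; the tinystories entry is a placeholder
DATA_URLS = {
    "shakespeare_char": "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt",
    "tinystories": "<tinystories_data_url>",
}

parser = argparse.ArgumentParser(description="Prepare a dataset at the character level")
parser.add_argument("--dataset", choices=sorted(DATA_URLS), default="shakespeare_char",
                    help="which dataset's data_url to download")
args = parser.parse_args()

# download the selected dataset if it is not already present
input_file_path = os.path.join(os.path.dirname(__file__), "input.txt")
if not os.path.exists(input_file_path):
    data_url = DATA_URLS[args.dataset]
    with open(input_file_path, "w", encoding="utf-8") as f:
        f.write(requests.get(data_url).text)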

Collaborator

Sharing an example of how to download a dataset with the datasets library:

from datasets import load_dataset
from pathlib import Path

# load the train split from the Hugging Face Hub
dataset = load_dataset("msaligane/tinystories_phonology", split="train")

data_dir = Path("data")
data_dir.mkdir(exist_ok=True)

full_text = ""
for i, example in enumerate(dataset):
    # write each story to its own numbered file
    filename = f"tinystoryP{i:02d}.txt"
    filepath = data_dir / filename

    with open(filepath, "w") as f:
        f.write(example["text"])

    # also accumulate everything into a single combined file
    full_text += example["text"] + "\n"

with open(data_dir / "full.txt", "w") as f:
    f.write(full_text)

train.py Outdated
$ torchrun --nproc_per_node=8 --nnodes=2 --node_rank=1 --master_addr=123.456.123.456 --master_port=1234 train.py
(If your cluster does not have Infiniband interconnect prepend NCCL_IB_DISABLE=1)

This version has a pruning feature on lines 290-302. You can change pruning_p to change the pruning rate.
Collaborator

Let's move the pruning code into its own file called pruning.py living in the same directory as train.py, and have pruning be an argparse parameter to train.py.

Generally, let's make it so that modifications to train.py are selectable via argparse flags or configuration files.

train.py Outdated
# Create a mask where we keep weights with gradients larger than the threshold
#mask = torch.abs(model.fc.weight.grad) > threshold
#model.fc.weight.data.mul_(mask)

Collaborator

For these commented lines, feel free to move them into the pruning.py script as well, perhaps as pydoc (docstring) comments on top of the method, as demonstrated on the following page:

https://www.datacamp.com/tutorial/docstrings-python
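
For illustration, a minimal sketch of what such a pruning.py could look like, with the commented lines above folded into a documented method (the function name and threshold default are assumptions, and it generalizes the single fc layer to all parameters):

import torch


def prune_by_gradient(model, threshold=1e-3):
    """Zero out weights whose gradient magnitude is below `threshold`.

    Creates a mask keeping only the weights with gradients larger than the
    threshold, then multiplies the weights by that mask in place. Intended
    to be called after loss.backward(), once gradients are populated.
    """
    with torch.no_grad():
        for param in model.parameters():
            if param.grad is None:
                continue
            mask = torch.abs(param.grad) > threshold
            param.mul_(mask.to(param.dtype))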

train.py Outdated
min_lr = 6e-5 # minimum learning rate, should be ~= learning_rate/10 per Chinchilla
# pruning
prune = 0
pruning_p = 500 # prune once every 500 iterations
Collaborator

For the interim, also turn these into argparse params, with the default being no pruning for now.

In this way, we can still include this in the search space later.

Again, pruning itself should be an argparse flag set to store_true for now, so that if the flag is passed then pruning will occur with this method.

See https://docs.python.org/3/library/argparse.html for more details on store_true.
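
A minimal sketch of those interim argparse parameters (the flag names, defaults, and the iter_num / pruning.prune_by_gradient names are assumptions):

import argparse

parser = argparse.ArgumentParser()
# pruning stays off unless the flag is passed on the command line
parser.add_argument("--prune", action="store_true",
                    help="enable gradient-threshold pruning during training")
parser.add_argument("--pruning_p", type=int, default=500,
                    help="prune once every pruning_p iterations")
args = parser.parse_args()

# sketch of use inside the training loop:
#   if args.prune and iter_num % args.pruning_p == 0:
#       pruning.prune_by_gradient(model)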

10.12
@DingHengyu

Has been changed

@777yifei
Author

Changes have been made.

@gkielian
Collaborator

After running the code, I came across an error related to the configurator.py file; it seems we'll have to remove the dependency on configurator.py before we add additional arguments.

Created an issue for this, and will attempt to finish a pull request with the changes tomorrow (the sooner the better):
#20

Afterwards, adding options for pruning, tensorboard, etc. will be unblocked.

Collaborator

gkielian commented Oct 27, 2023

Update: as mentioned in the last comment, the configurator.py file doesn't get along with argparse at all, so I'm creating #21, which refactors the original procedural train.py into a Trainer class and handles CLI configuration settings via argparse.

Now it is much, much easier to add features.

Feel free to add the pruning features now within the parse_args function and the corresponding sections of our newly parameterized train.py.

gkielian and others added 20 commits November 3, 2023 16:41
Adding a GitHub Actions test just for basic preparation, training, and inference.

This template incorporates:

1. installation of dependencies
2. caching for pip dependencies
3. preparation of data into test and validation sets
4. training (with cpu)
5. inference (with cpu)

Lastly, this test runs whenever there is a pull request or a push onto any branch of the repo.
Logs will have labels for timestamps, project name, and run_name.

Set these with train.py argparse inputs.
This has a constant denominator for preventing overflow.
Add a parameterized estimation of softmax which should be easier to
implement in hardware.

This has three sections:

1. flat section (y=0)
2. linear section  (y=mx+b)
3. polynomial section (y=x**p + b), where 'p' is the polynomial power (e.g. 2)

To prevent overflow, we divide by a number as done in the 'constantmax' variation, in this case by default 1000.0.

Preliminary experiments on smaller networks show this converges for power 2 and a divisor of 1000.0, without having to implement an equivalent for x_max.
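
To make the three sections described above concrete, here is a minimal sketch of such a piecewise estimation (the function name, the breakpoints at x_intercept and 0, and the default values are assumptions; the committed implementation may differ):

import torch


def piecewise_softmax_estimate(x, x_intercept=-10.0, y_intercept=1.0, power=2.0, divisor=1000.0):
    # 1. flat section: y = 0 for x below x_intercept
    flat = torch.zeros_like(x)

    # 2. linear section: y = m*x + b, joining (x_intercept, 0) to (0, y_intercept)
    slope = -y_intercept / x_intercept
    linear = slope * x + y_intercept

    # 3. polynomial section: y = x**p + b for x >= 0 (clamp keeps the unused branch finite)
    poly = torch.clamp(x, min=0.0) ** power + y_intercept

    y = torch.where(x < x_intercept, flat, torch.where(x < 0.0, linear, poly))

    # divide by a constant denominator (as in 'constantmax') to prevent overflow
    return y / divisor
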
This is softermax with a parameter to increase the base to higher numbers.
Before, to add a new flag for a model variation, one had to:
1. ensure GPTConfig has the option and its type is set (use_feature: bool = True)
2. add the feature to the argparse in train.py
3. add the feature manually to the model_config dictionary in train.py

Now these steps are handled by the organization of argparse.

If it is a model configuration, add it to the model_group.
One still needs to make sure that the option exists in the GPTConfig, but that should be fine (one place to expose the variable, one way to modify it).

(Before, one wouldn't get any message if one hadn't manually added it to the model_config dictionary, which also felt very manual.)

As a bonus, this also gives us an easy way to print each of the model settings, training settings, and logging settings before training! : D
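
As an illustration of the new flow (the flag name use_new_feature is hypothetical, and gathering the model arguments from the group is one possible sketch, not necessarily what #21 does):

import argparse
from dataclasses import dataclass


@dataclass
class GPTConfig:
    # the option still has to be exposed here with its type and default
    use_new_feature: bool = False


parser = argparse.ArgumentParser()
model_group = parser.add_argument_group("model")
model_group.add_argument("--use_new_feature", action="store_true")

args = parser.parse_args()

# model arguments are collected from the group and passed straight into GPTConfig,
# so there is no hand-maintained model_config dictionary to forget about
# (note: _group_actions is a private argparse attribute, used here only for the sketch)
model_args = {a.dest: getattr(args, a.dest) for a in model_group._group_actions}
gptconf = GPTConfig(**model_args)
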
Change the means of taking in arguments to argparse.

Thought about setting dropout via argparse, but decided simply to set
dropout=0.0 on 'resume' from sample.py as a default behavior.
Utilized new syntax for booleans, which allows for setting positive or negative values on the CLI via `no` prefixing:

`--compile` - compiles (True)
`--no-compile` - does not compile (False)

This also allows for setting the default value independently of whether the boolean flag is present.
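
One standard way to get this `--compile` / `--no-compile` behavior is argparse.BooleanOptionalAction (Python 3.9+); whether the repo uses exactly this mechanism is an assumption:

import argparse

parser = argparse.ArgumentParser()
# generates both --compile and --no-compile; the default is set independently of the flags
parser.add_argument("--compile", default=True, action=argparse.BooleanOptionalAction)

print(parser.parse_args([]).compile)                 # True (the default)
print(parser.parse_args(["--no-compile"]).compile)   # False
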
Modify sample.py to utilize argparse for handling input options
Collaborator

gkielian commented Nov 9, 2023

@777yifei Thanks for your patience! : D

The repo is finally ready for additional train.py parameters to be added in.

Let's work together to bring these changes into the repo; I'll probably give it a shot tonight and will share questions tomorrow.

@777yifei 777yifei closed this Jan 23, 2024
gkielian pushed a commit that referenced this pull request May 11, 2024
Add new dependencies to requirements_cpu.txt
gkielian pushed a commit that referenced this pull request Aug 6, 2024
Merge master to add_snac_tokens