Lab-LVM/PEFT

Reference

Most of the code is borrowed from the PEFT docs; the bitsandbytes docs cover the basic background as well. I only changed hyperparameters such as the batch size.

Tutorial

  1. Log in to the Hugging Face CLI:
pip install -U "huggingface_hub[cli]"
huggingface-cli login  # you need to generate a token (just follow the CLI prompts)
  2. Install the required libraries:
pip install -U bitsandbytes accelerate transformers peft trl

These are the library versions I used (a minimal usage sketch follows below):

accelerate-1.6.0 
datasets-3.5.0 
peft-0.15.2 
pyarrow-19.0.1 
requests-2.32.3 
tokenizers-0.21.1 
transformers-4.51.3 
trl-0.17.0

If you use conda, you can instead run conda env create -f environment.yml -n peft.
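To sanity-check the installed stack, here is a minimal QLoRA sketch along the lines of what the scripts set up: a 4-bit quantized backbone kept frozen while a LoRA adapter is trained on top. The model name, dataset, and hyperparameters below are placeholders, not this repository's actual settings; the scripts in script/ and train.py remain the source of truth.

import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

model_id = "huggyllama/llama-7b"  # placeholder backbone, not necessarily the one used here

# 4-bit NF4 quantization for the frozen backbone (QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# LoRA adapter trained on top of the frozen quantized weights.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

dataset = load_dataset("imdb", split="train[:1%]")  # placeholder dataset

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(output_dir="outputs", per_device_train_batch_size=1, max_steps=10),
)
trainer.train()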

  3. Run the scripts. Change the CUDA device IDs and the model parameter size as needed.
for param in 7 13; do bash script/single.sh 0, $param; done
for param in 7 13; do bash script/ddp_qlora.sh 0,1 $param; done
for param in 7 13 30 65; do bash script/fsdp_qlora.sh 0,1 $param; done
  4. Summarize the training latency like the examples below (a timing sketch follows this list).
  • 7b: 10 sec
  • 13b: 20 sec
  • 33b: 30 sec
  • 65b: 60 sec
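One simple way to collect those numbers, reusing the trainer from the earlier sketch (the train_runtime metric comes from transformers' Trainer, not from this repo's code):

result = trainer.train()
print(f"train latency: {result.metrics['train_runtime']:.0f} sec")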

FSDP+DDP

  1. Uncomment L#171 of train.py.
  2. Run ./ddp_fsdp_qlora.sh 0,1,2,3 7

Note:

  • It will run two main processes, one on GPU group 1 (GPUs 0,1) and one on GPU group 2 (GPUs 2,3).
  • It trains a LoRA adapter on top of the frozen quantized model. See the script/ folder to easily switch the backbone from quantized to FP16.
  • The current DDP+FSDP implementation is not perfect: logging and checkpoint saving are performed multiple times (see the rank-guard sketch below).
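One common mitigation for the duplicated logging and checkpointing (a sketch, not this repository's code) is to guard those calls behind a rank check. Note that with two independently launched groups, each group still has its own rank 0, so you would additionally need to distinguish the groups; the GROUP_ID environment variable below is a hypothetical example.

import os
import torch.distributed as dist

def is_main_process() -> bool:
    # Rank 0 within the current process group; with two separate launches
    # (GPU group 1 and GPU group 2) each launch has its own rank 0.
    if dist.is_available() and dist.is_initialized():
        return dist.get_rank() == 0
    return int(os.environ.get("RANK", "0")) == 0

# Hypothetical GROUP_ID env var to distinguish the two GPU groups.
if is_main_process() and os.environ.get("GROUP_ID", "0") == "0":
    print("log metrics / save checkpoint here")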

About

A repository for GPU benchmarking with the PEFT library.
