
Conversation

@lu-wang-dl (Collaborator) commented Oct 13, 2023

@lu-wang-dl requested a review from es94129 on October 13, 2023 05:40
@es94129 (Contributor) left a comment

Thanks for adding this, looks very cool!
Wondering why deepspeed is required; is it for memory optimization?

# MAGIC
# MAGIC # Fine tune llama-2-70b with deepspeed
# MAGIC
# MAGIC [Llama 2](https://huggingface.co/meta-llama) is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. It is trained on 2T tokens and supports a context window of up to 4K tokens. [Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf) is the 7B pretrained model, converted for the Hugging Face Transformers format.
Contributor

nit

Suggested change
# MAGIC [Llama 2](https://huggingface.co/meta-llama) is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. It is trained on 2T tokens and supports a context window of up to 4K tokens. [Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf) is the 7B pretrained model, converted for the Hugging Face Transformers format.
# MAGIC [Llama 2](https://huggingface.co/meta-llama) is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. It is trained on 2T tokens and supports a context window of up to 4K tokens. [Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf) is the 70B pretrained model, converted for the Hugging Face Transformers format.

Collaborator Author

DeepSpeed is used for multi-GPU training with LoRA.
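
For reference, a minimal sketch of how LoRA can be combined with a DeepSpeed ZeRO config through the Hugging Face Trainer. The model path and config path mirror the notebook's constants; the LoRA hyperparameters below are illustrative assumptions, not necessarily the values used in scripts/fine_tune_lora.py.

import torch
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

MODEL_PATH = "meta-llama/Llama-2-70b-hf"
CONFIG_PATH = "../../config/a10_config_zero2.json"

model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16)

# Wrap the base model with LoRA adapters; r/alpha/target_modules are example values.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Passing the ZeRO config here is what lets the Trainer shard optimizer and
# gradient state across the GPUs spawned by the deepspeed launcher.
training_args = TrainingArguments(
    output_dir="/local_disk0/output",
    per_device_train_batch_size=1,
    bf16=True,
    deepspeed=CONFIG_PATH,
)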

Contributor

Since 07 is already used for AI gateway, maybe use other indices.

Collaborator Author

Sure. Let's design a proper ordering afterwards.


# MAGIC %sh
# MAGIC deepspeed \
# MAGIC --num_gpus 2 \
Contributor

--num_gpus is probably not needed because deepspeed can use all the GPUs on the machine

Collaborator Author

Good point. Let me remove it.

MODEL_PATH = 'meta-llama/Llama-2-70b-hf'
TOKENIZER_PATH = 'meta-llama/Llama-2-70b-hf'
DEFAULT_TRAINING_DATASET = "mosaicml/dolly_hhrlhf"
CONFIG_PATH = "../../config/a10_config_zero2.json"
Contributor

Maybe rename the file to a100_...?
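
For context, a rough sketch of what a ZeRO stage-2 config such as a10_config_zero2.json typically contains; these are standard DeepSpeed keys with illustrative values, not the exact contents of the file in this PR.

import json

# Illustrative ZeRO stage-2 settings; "auto" lets the Hugging Face Trainer
# fill in values from TrainingArguments at launch time.
zero2_config = {
    "bf16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "train_batch_size": "auto",
}

with open("a10_config_zero2.json", "w") as f:
    json.dump(zero2_config, f, indent=2)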

# MAGIC deepspeed \
# MAGIC --num_gpus 2 \
# MAGIC scripts/fine_tune_lora.py \
# MAGIC --output_dir="/local_disk0/output"
Contributor

Q: What is the difference between --output_dir and /local_disk0/final_model, is the latter just the LoRA weights?

# COMMAND ----------

# MAGIC %sh
# MAGIC ls /local_disk0/final_model
Contributor

Could you also add instructions or code for how to load this for inference?
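
A minimal sketch of how the saved adapter could be loaded for inference with PEFT, assuming the training script wrote the LoRA adapter to /local_disk0/final_model via save_pretrained; the prompt and generation settings are only examples.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "meta-llama/Llama-2-70b-hf"
ADAPTER_PATH = "/local_disk0/final_model"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.bfloat16, device_map="auto"
)

# Attach the fine-tuned LoRA adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(base, ADAPTER_PATH)
model.eval()

inputs = tokenizer("What is Apache Spark?", return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))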

lu-wang-dl and others added 2 commits October 19, 2023 18:24