diff --git a/docs/user-guide/index.rst b/docs/user-guide/index.rst
index bf80fb618..681659ddb 100644
--- a/docs/user-guide/index.rst
+++ b/docs/user-guide/index.rst
@@ -5,6 +5,7 @@
 .. toctree::
    :maxdepth: 2
 
+   reinforce.rst
    sft.rst
    knowledge-distillation.rst
    dpo.rst
@@ -19,6 +20,9 @@
 :ref:`Prerequisite Obtaining a Pre-Trained Model `
    This section provides instructions on how to download pre-trained LLMs in .nemo format. The following section will use these base LLMs for further fine-tuning and alignment.
 
+:ref:`Model Alignment by REINFORCE `
+   In this tutorial, we will guide you through the process of aligning a NeMo Framework model using REINFORCE. This method can be applied to various models, including LLaMa2 and Mistral, with our scripts functioning consistently across different models.
+
 :ref:`Model Alignment by Supervised Fine-Tuning (SFT) `
    In this section, we walk you through the most straightforward alignment method. We use a supervised dataset in the prompt-response pairs format to fine-tune the base model according to the desired behavior.
 
@@ -59,6 +63,14 @@
      - Mistral
      - Nemotron-4
      - Mixtral
+   * - :ref:`REINFORCE `
+     - Yes
+     - Yes
+     - Yes
+     - Yes (✓)
+     - Yes
+     - Yes
+     -
    * - :ref:`SFT `
      -
      - Yes (✓)
diff --git a/docs/user-guide/reinforce.rst b/docs/user-guide/reinforce.rst
index 1e1651668..6d7897281 100644
--- a/docs/user-guide/reinforce.rst
+++ b/docs/user-guide/reinforce.rst
@@ -1,6 +1,6 @@
 .. include:: /content/nemo.rsts
 
-.. _model-aligner-reinforce:
+.. _nemo-aligner-reinforce:
 
 Model Alignment by REINFORCE
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@