A curated collection of training recipes for RLite, designed to provide comprehensive guidance on implementing reinforcement learning (RL) techniques for large language models (LLMs) and vision-language models (VLMs). This repository offers modular configurations, training workflows, and best practices to help users reproduce state-of-the-art RL-driven results in alignment, fine-tuning, and optimization tasks.
While not exhaustive, the current recipes focus on foundational and emerging methodologies, serving as a starting point for adapting RLite to custom projects. Contributions that expand coverage of existing research and benchmarks are welcome.
Under active development: we aim to expand support collaboratively while maintaining reproducibility and clarity. New recipes will be added progressively as the ecosystem evolves!
"Cooking up RL innovations, one recipe at a time!"
Developer's guide.
We use pre-commit and git cz to sanitize commits. Run pre-commit before git cz so that a failing hook does not force you to re-enter the commit message.
pip install pre-commit
# Install pre-commit hooks
pre-commit install
pre-commit install --hook-type commit-msg
# Install this emoji-style tool
sudo npm install -g git-cz --no-audit --verbose --registry=https://registry.npmmirror.com
# Install rlite and development dependencies
pip install -e ".[dev]"
- Keep each line of code within 99 characters, and each line of comments and documentation within 79 characters.
- Write unit tests for atomic capabilities and make sure pytest runs without errors.
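The two guidelines above can be illustrated together with a minimal sketch: a small, atomic helper plus a pytest-style test for it. The helper `find_long_lines` and the constants `CODE_LIMIT` and `COMMENT_LIMIT` are hypothetical names used only for illustration and are not part of RLite. The helper treats only lines starting with `#` as comments and does not detect docstrings, so it is a simplification of what the configured linters may actually check.

```python
from __future__ import annotations

# Hypothetical limits mirroring the guideline above (not RLite constants).
CODE_LIMIT = 99
COMMENT_LIMIT = 79


def find_long_lines(source: str) -> list[int]:
    """Return the 1-based numbers of lines that exceed the limits.

    Lines whose first non-blank character is '#' are treated as comments;
    docstrings are not detected in this simplified sketch.
    """
    too_long = []
    for number, line in enumerate(source.splitlines(), start=1):
        is_comment = line.lstrip().startswith("#")
        limit = COMMENT_LIMIT if is_comment else CODE_LIMIT
        if len(line) > limit:
            too_long.append(number)
    return too_long


def test_find_long_lines():
    source = "\n".join(
        [
            "x = 1",                    # short code line: fine
            "# " + "c" * 78,            # 80-character comment: too long
            "y = '" + "a" * 94 + "'",   # 100-character code line: too long
            "z = '" + "a" * 93 + "'",   # 99-character code line: fine
        ]
    )
    assert find_long_lines(source) == [2, 3]
```

Saved in a file such as `tests/test_line_length.py` (a hypothetical path), the test is picked up automatically when `pytest` is run from the repository root.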
Run pre-commit to automatically lint the code:
pre-commit run --all-files