Mikesterner87

Popular repositories

Nano-R1 Public

This project demonstrates fine-tuning the Qwen2.5-3B-Instruct model with GRPO (Group Relative Policy Optimization) on the GSM8K dataset.

Jupyter Notebook 2
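
For readers unfamiliar with GRPO, the following is a minimal sketch of its core idea, not code taken from the Nano-R1 notebook: several completions are sampled for the same prompt, each is scored, and each score is normalized against its own group to obtain an advantage. The function names, the exact-match reward, and the use of the population standard deviation are all assumptions made for illustration. The only dataset detail relied on is that GSM8K reference solutions end with `#### <answer>`.

```python
import re
from statistics import mean, pstdev


def gsm8k_gold_answer(solution: str) -> str:
    """Return the final answer from a GSM8K reference solution ('... #### 72')."""
    return solution.split("####")[-1].strip().replace(",", "")


def correctness_reward(completion: str, gold: str) -> float:
    """Hypothetical reward: 1.0 if the last number in the completion equals gold."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    return 1.0 if numbers and numbers[-1] == gold else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group.

    Assumption: population std plus a small eps to avoid division by zero
    when every completion in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, a group of four sampled completions (made-up strings).
gold = gsm8k_gold_answer("48 + 24 = 72 clips in total. #### 72")
completions = ["The answer is 72", "I think 70", "So 72.", "unknown"]
rewards = [correctness_reward(c, gold) for c in completions]
advantages = group_relative_advantages(rewards)
```

Completions that beat their group's average get a positive advantage and the rest get a negative one, so no separate value model is needed to provide a baseline.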