RL101: Reinforcement Learning 101

A pragmatic, hands-on course on the transition of Large Language Models from passive sequence generators to autonomous decision-making agents. Based on "The Landscape of Agentic Reinforcement Learning for LLMs: A Survey" (arXiv:2509.02547), this course bridges theory and implementation with runnable code (for Google Colab or Jupyter notebooks), practical examples, and industry-standard security practices.

Demo video: Agentic-RL.mp4

Key Takeaways

  • Paradigm Shift: From single-step preference-based reinforcement fine-tuning (PBRFT) to multi-step agent training (Agentic RL)
  • Technical Foundation: The partially observable Markov decision process (POMDP) formalism enables planning, tool use, memory, and self-improvement (a minimal sketch follows this list)
  • Practical Focus: Every concept includes runnable implementation and evaluation benchmarks
  • Security-First: Secure-by-design patterns throughout
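
To make that takeaway concrete, here is a minimal, illustrative sketch of the POMDP view of an episode; the class and field names below are teaching assumptions, not an API defined by the survey.

# Minimal sketch (assumed names): an agentic episode viewed as a POMDP
from dataclasses import dataclass, field
from typing import List

@dataclass
class AgenticStep:
    observation: str   # partial view of the environment state (o_t)
    action: str        # generated text and/or tool call (a_t)
    reward: float      # scalar feedback for this step (r_t)

@dataclass
class AgenticEpisode:
    """Multi-step trajectory, in contrast to single-step PBRFT (prompt -> response -> score)."""
    steps: List[AgenticStep] = field(default_factory=list)
    gamma: float = 0.99  # discount factor

    def discounted_return(self) -> float:
        # The objective an agentic policy is trained to maximize over the whole episode
        return sum(self.gamma ** t * s.reward for t, s in enumerate(self.steps))

episode = AgenticEpisode(steps=[
    AgenticStep("User asks: What's 2+2?", "call: calculator(2+2)", 0.0),
    AgenticStep("Calculator returned: 4", "answer: 4", 1.0),
])
print(episode.discounted_return())  # 0.99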

Quick Start

Prerequisites Check

# Check Python environment
python --version  # Requires Python 3.8+
pip --version     # Package management

# Install core dependencies
pip install torch transformers gymnasium numpy matplotlib

# Verify GPU availability (optional but recommended)
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"

# Quick environment test
python -c "import gymnasium; print('Environment setup complete!')"

5-Minute Demo: Your First Agentic RL Agent

import gymnasium as gym
import torch
import numpy as np
from transformers import AutoTokenizer, AutoModel

class SimpleAgenticAgent:
    """Minimal agentic RL agent demonstrating core concepts"""
    def __init__(self, model_name="distilbert-base-uncased"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModel.from_pretrained(model_name)
        self.memory = []  # Simple episodic memory
        
    def act(self, observation, tools_available=None):
        """Core agentic decision: text generation + tool selection"""
        # Simple planning: consider observation + memory
        context = f"Observation: {observation}\nMemory: {self.memory[-3:]}"
        
        # Input validation (security): cap the raw context length before tokenizing
        if len(context) > 512:
            context = context[-512:]  # keep only the most recent characters

        inputs = self.tokenizer(context, return_tensors="pt", truncation=True)

        # Encode the context (a full agent would generate a response from this)
        with torch.no_grad():
            outputs = self.model(**inputs)

        # Action selection: [text_response, tool_call, confidence]
        # (placeholder policy; a trained agent would derive this from `outputs`)
        action = {
            'text': "Based on observation, I should...",
            'tool': tools_available[0] if tools_available else None,
            'confidence': 0.8
        }
        
        # Update memory (learning)
        self.memory.append({'obs': observation, 'action': action})
        return action

# Demo usage
agent = SimpleAgenticAgent()
result = agent.act("User asks: What's 2+2?", tools_available=['calculator'])
print(f"Agent decision: {result}")
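
A single act() call is only one turn; the agentic setting is about multi-step behaviour, so it helps to drive the same agent through a short episode. The observations below are invented purely for illustration:

# Multi-step episode: each turn conditions on the growing episodic memory
observations = [
    "User asks: What's 2+2?",
    "Calculator returned: 4",
    "User asks: Now multiply that by 10",
]
for obs in observations:
    decision = agent.act(obs, tools_available=['calculator'])
    print(obs, "->", decision['tool'])

print(f"Memory length after episode: {len(agent.memory)}")  # 4, including the first demo call above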

Learning Architecture

Foundation (Weeks 1-4)          Implementation (Weeks 5-8)
┌─────────────────────┐         ┌─────────────────────┐
│  MDP/POMDP Theory   │────────►│  RAG Systems        │
│  Context Assembly   │         │  Memory Agents      │
│  Reward Design      │         │  Tool Integration   │
│  Algorithm Basics   │         │  Multi-Agent        │
└─────────────────────┘         └─────────────────────┘
           │                               │
           ▼                               ▼
┌─────────────────────┐         ┌─────────────────────┐
│ Capability Training │         │ Frontier Research   │
│ Planning, Memory    │◄────────┤ Scaling Challenges  │
│ Tool Use, Reasoning │         │ Safety & Trust      │
│ Self-Improvement    │         │ Future Directions   │
└─────────────────────┘         └─────────────────────┘
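
As a preview of the Reward Design topic in the Foundation column, the sketch below shows one common pattern: scoring whole trajectories rather than single responses. The constants and dictionary keys are illustrative assumptions, not values taken from the survey.

# Illustrative trajectory-level reward: step costs plus a sparse terminal bonus
from typing import Dict, List

STEP_PENALTY = -0.01        # discourage needlessly long trajectories
TOOL_ERROR_PENALTY = -0.1   # penalize failed tool calls
SUCCESS_REWARD = 1.0        # sparse reward for actually solving the task

def trajectory_reward(steps: List[Dict], task_success: bool) -> float:
    """Score a multi-step agent trajectory instead of a single response."""
    reward = STEP_PENALTY * len(steps)
    reward += TOOL_ERROR_PENALTY * sum(1 for s in steps if s.get("tool_error"))
    if task_success:
        reward += SUCCESS_REWARD
    return reward

# Example: a 3-step episode with one failed tool call that still succeeds
print(trajectory_reward([{}, {"tool_error": True}, {}], task_success=True))  # approximately 0.87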

Course Modules

Part I: Mathematical Foundations (Weeks 1-4)

  • Paradigm shift from LLM-RL to Agentic RL
  • Survey overview and research landscape

Part II: Agentic Capabilities (Weeks 5-6)

Part III: Task Applications (Weeks 7-8)

Part IV: Systems & Future (Weeks 9-12)

  • Synthesis and next steps

Learning Objectives

By completion, you will:

  • Formalize agentic RL using MDP/POMDP mathematics
  • Implement core capabilities: planning, memory, tool use, reasoning
  • Build practical agents for code, math, GUI, and search tasks
  • Evaluate using industry-standard benchmarks and environments
  • Deploy secure, scalable agentic systems in production

Resources

Primary References

  • "The Landscape of Agentic Reinforcement Learning for LLMs: A Survey" (arXiv:2509.02547): https://arxiv.org/abs/2509.02547

Development Tools

  • Core Libraries: torch, transformers, gymnasium, numpy
  • Evaluation: Standard benchmarks (SWE-Bench, GAIA, WebArena)
  • Security: Input validation, sandbox execution, permission systems
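
As a taste of those security patterns, here is a minimal sketch that combines input validation, a tool allow-list (permission system), and subprocess execution with a hard timeout as a lightweight sandbox. All names are illustrative assumptions; a production deployment would add real OS-level isolation.

# Minimal sketch (assumed names): validate, check permissions, then sandbox the call
import re
import subprocess
import sys

ALLOWED_TOOLS = {"calculator"}                   # permission system: explicit allow-list
SAFE_EXPR = re.compile(r"[0-9+\-*/(). ]{1,64}")  # input validation: charset + length cap

def run_calculator(expr: str, timeout: float = 2.0) -> str:
    """Evaluate a vetted arithmetic expression in a subprocess with a hard timeout."""
    if not SAFE_EXPR.fullmatch(expr):
        raise ValueError("Rejected expression")
    proc = subprocess.run(
        [sys.executable, "-c", f"print({expr})"],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return proc.stdout.strip()

def call_tool(tool: str, arg: str) -> str:
    """Permission gate in front of every tool invocation."""
    if tool not in ALLOWED_TOOLS:
        raise PermissionError(f"Tool not permitted: {tool}")
    return run_calculator(arg)

print(call_tool("calculator", "2+2"))  # -> 4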

Contributing

See CONTRIBUTING.md for development guidelines, security requirements, and submission processes.

License

MIT License - Open source, industry-standard.


This course distills 500+ research papers into practical, secure, production-ready implementations. Start with the Quick Start demo above, then proceed to Module 1.
