Interpretable Reinforcement Learning for Insulin Dosage

This repository contains the code and experiments for our project, Explainable Recurrent Proximal Policy Optimization (RPPO), which determines insulin dosage for Type 1 Diabetes patients in the SimGlucose environment.

We explore how interpretability and temporal modeling can enhance clinical reliability in reinforcement learning (RL) systems. Specifically, we compare:

  • A feedforward PPO baseline
  • A Recurrent PPO with LSTM
  • An Attention-augmented Recurrent PPO model that provides feature-level interpretability

🚀 Project Highlights

  • Domain: Medical decision-making (Type 1 Diabetes)
  • Environment: SimGlucose (based on the FDA-approved UVA/Padova simulator); a setup sketch follows this list
  • Goal: Maintain blood glucose in the target range of 70–180 mg/dL through autonomous insulin dosing
  • Models: PPO, Recurrent PPO, Attention Recurrent PPO
  • Interpretability: Feature-level attention to highlight influence of CGM, meal intake, insulin history, and time
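For reference, SimGlucose can be instantiated as a standard Gym environment. The snippet below is a minimal sketch following the registration pattern documented in the simglucose package; the patient name, environment id, and random-action loop are illustrative and not taken from this repository's training scripts.

```python
# Minimal SimGlucose setup sketch (illustrative patient/env id, not from this repo).
import gym
from gym.envs.registration import register

register(
    id='simglucose-adolescent2-v0',
    entry_point='simglucose.envs:T1DSimEnv',
    kwargs={'patient_name': 'adolescent#002'},
)

env = gym.make('simglucose-adolescent2-v0')
obs = env.reset()
for t in range(100):
    action = env.action_space.sample()  # a trained policy would choose the insulin dose here
    obs, reward, done, info = env.step(action)
    if done:
        break
env.close()
```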

πŸ—οΈ Repository Structure

  Interpretable_RL
  ├── code                    # Training scripts for all 3 models and the evaluation script
  │   ├── Explainable_RPPO_models.py
  │   └── Final_Test_and_Demonstration.py
  ├── model_weights/          # Pretrained model weights for PPO, Recurrent PPO, Attention PPO
  ├── Explainable_RPPO.pdf    # Final report with architecture, methodology, and results
  └── README.md

📊 Key Results

  • Recurrent models (with or without attention) significantly outperformed the PPO baseline in glucose control and reduced dangerous episodes.
  • Feature-level attention aligned with clinical expectations (e.g., higher weight on meals post-intake).
  • Quantitative results averaged over 10 simulated patients:
  Model            Glucose in Range   Dangerous Episodes
  PPO (Baseline)   25.7% ± 5.9%       6
  Recurrent PPO    63.8% ± 9.7%       2
  Attention RPPO   61.4% ± 9.4%       3

🧠 Architecture Overview

  • PPO Baseline: 3-layer feedforward actor-critic
  • Recurrent PPO: Two stacked LSTM layers followed by dense layers
  • Attention RPPO: MLP-based feature-level attention → LSTM → Policy/Value heads
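A minimal PyTorch sketch of the Attention RPPO forward pass is shown below; the layer sizes, the four-feature input (CGM, meal intake, insulin history, time), and the single-action output are assumptions for illustration, and the actual architecture lives in code/Explainable_RPPO_models.py.

```python
# Sketch of the Attention RPPO forward pass: feature-level attention -> LSTM -> heads.
# Layer sizes and dimensions are illustrative assumptions, not the repo's exact values.
import torch
import torch.nn as nn

class AttentionRPPO(nn.Module):
    def __init__(self, n_features=4, hidden=64, n_actions=1):
        super().__init__()
        # MLP that scores each input feature (CGM, meal intake, insulin history, time)
        self.attn = nn.Sequential(
            nn.Linear(n_features, hidden), nn.Tanh(), nn.Linear(hidden, n_features)
        )
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.policy_head = nn.Linear(hidden, n_actions)  # e.g. mean of the insulin-dose distribution
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, x, hidden_state=None):
        # x: (batch, time, n_features)
        weights = torch.softmax(self.attn(x), dim=-1)   # feature-level attention weights
        h, hidden_state = self.lstm(weights * x, hidden_state)
        last = h[:, -1]                                  # summary of the last time step
        return self.policy_head(last), self.value_head(last), weights, hidden_state
```

Returning the attention weights alongside the policy and value outputs is what enables the feature-level interpretability analysis, e.g. checking that the weight on meal intake rises after a meal.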

Each model is trained with a reward function based on the risk index from Kovatchev et al. (2006), encouraging stable and safe glucose levels.
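As a point of reference, the Kovatchev risk function maps blood glucose to a symmetric risk score that is low in the safe range and grows toward hypo- and hyperglycemia. One common way to turn it into a reward is to negate it, shown below as a sketch; the exact reward shaping used in this project is described in the report and may differ.

```python
# Risk-index-based reward sketch (constants follow the Kovatchev et al. risk function;
# negating the risk is an illustrative shaping choice, not necessarily this repo's exact reward).
import math

def bg_risk(bg_mg_dl: float) -> float:
    """Blood-glucose risk: near 0 at normoglycemia, rising toward hypo- and hyperglycemia."""
    f = 1.509 * (math.log(bg_mg_dl) ** 1.084 - 5.381)
    return 10.0 * f ** 2

def reward(bg_mg_dl: float) -> float:
    # Lower risk -> higher reward, encouraging stable and safe glucose levels.
    return -bg_risk(bg_mg_dl)
```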
