Interpretable Reinforcement Learning for Insulin Dosage

This repository contains the code and experiments for our project, Explainable Recurrent Proximal Policy Optimization (RPPO), which determines insulin dosage for Type 1 Diabetes patients in the SimGlucose environment.

We explore how interpretability and temporal modeling can enhance clinical reliability in reinforcement learning (RL) systems. Specifically, we compare:

  • A feedforward PPO baseline
  • A Recurrent PPO with LSTM
  • An Attention-augmented Recurrent PPO model that provides feature-level interpretability

🚀 Project Highlights

  • Domain: Medical decision-making (Type 1 Diabetes)
  • Environment: SimGlucose (based on the FDA-approved UVA/Padova simulator); a setup sketch follows this list
  • Goal: Maintain blood glucose in the target range of 70–180 mg/dL through autonomous insulin dosing
  • Models: PPO, Recurrent PPO, Attention Recurrent PPO
  • Interpretability: Feature-level attention to highlight influence of CGM, meal intake, insulin history, and time
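For reference, SimGlucose can be instantiated as a standard Gym environment. The snippet below is a minimal sketch following the registration pattern documented in the simglucose package; the patient name, environment id, and random-action loop are illustrative and not taken from this repository's training scripts.

```python
# Minimal SimGlucose setup sketch (illustrative patient/env id, not from this repo).
import gym
from gym.envs.registration import register

register(
    id='simglucose-adolescent2-v0',
    entry_point='simglucose.envs:T1DSimEnv',
    kwargs={'patient_name': 'adolescent#002'},
)

env = gym.make('simglucose-adolescent2-v0')
obs = env.reset()
for t in range(100):
    action = env.action_space.sample()  # a trained policy would choose the insulin dose here
    obs, reward, done, info = env.step(action)
    if done:
        break
env.close()
```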

πŸ—οΈ Repository Structure

  Interpretable_RL
  ├── code                    # Training scripts for all 3 models and the evaluation script
  │   ├── Explainable_RPPO_models.py
  │   └── Final_Test_and_Demonstration.py
  ├── model_weights/          # Pretrained model weights for PPO, Recurrent PPO, Attention PPO
  ├── Explainable_RPPO.pdf    # Final report with architecture, methodology, and results
  └── README.md

📊 Key Results

  • Recurrent models (with or without attention) significantly outperformed the PPO baseline in glucose control and reduced dangerous episodes.
  • Feature-level attention aligned with clinical expectations (e.g., higher weight on meals post-intake).
  • Quantitative results averaged over 10 simulated patients:
  Model            Glucose in Range   Dangerous Episodes
  PPO (Baseline)   25.7% ± 5.9%       6
  Recurrent PPO    63.8% ± 9.7%       2
  Attention RPPO   61.4% ± 9.4%       3

🧠 Architecture Overview

  • PPO Baseline: 3-layer feedforward actor-critic
  • Recurrent PPO: Two stacked LSTM layers followed by dense layers
  • Attention RPPO: MLP-based feature-level attention → LSTM → Policy/Value heads
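A minimal PyTorch sketch of the Attention RPPO forward pass is shown below; the layer sizes, the four-feature input (CGM, meal intake, insulin history, time), and the single-action output are assumptions for illustration, and the actual architecture lives in code/Explainable_RPPO_models.py.

```python
# Sketch of the Attention RPPO forward pass: feature-level attention -> LSTM -> heads.
# Layer sizes and dimensions are illustrative assumptions, not the repo's exact values.
import torch
import torch.nn as nn

class AttentionRPPO(nn.Module):
    def __init__(self, n_features=4, hidden=64, n_actions=1):
        super().__init__()
        # MLP that scores each input feature (CGM, meal intake, insulin history, time)
        self.attn = nn.Sequential(
            nn.Linear(n_features, hidden), nn.Tanh(), nn.Linear(hidden, n_features)
        )
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.policy_head = nn.Linear(hidden, n_actions)  # e.g. mean of the insulin-dose distribution
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, x, hidden_state=None):
        # x: (batch, time, n_features)
        weights = torch.softmax(self.attn(x), dim=-1)   # feature-level attention weights
        h, hidden_state = self.lstm(weights * x, hidden_state)
        last = h[:, -1]                                  # summary of the last time step
        return self.policy_head(last), self.value_head(last), weights, hidden_state
```

Returning the attention weights alongside the policy and value outputs is what enables the feature-level interpretability analysis, e.g. checking that the weight on meal intake rises after a meal.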

Each model is trained with a reward function based on the risk index from Kovatchev et al. (2006), encouraging stable and safe glucose levels.
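As a point of reference, the Kovatchev risk function maps blood glucose to a symmetric risk score that is low in the safe range and grows toward hypo- and hyperglycemia. One common way to turn it into a reward is to negate it, shown below as a sketch; the exact reward shaping used in this project is described in the report and may differ.

```python
# Risk-index-based reward sketch (constants follow the Kovatchev et al. risk function;
# negating the risk is an illustrative shaping choice, not necessarily this repo's exact reward).
import math

def bg_risk(bg_mg_dl: float) -> float:
    """Blood-glucose risk: near 0 at normoglycemia, rising toward hypo- and hyperglycemia."""
    f = 1.509 * (math.log(bg_mg_dl) ** 1.084 - 5.381)
    return 10.0 * f ** 2

def reward(bg_mg_dl: float) -> float:
    # Lower risk -> higher reward, encouraging stable and safe glucose levels.
    return -bg_risk(bg_mg_dl)
```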
