Privacy-Preserving Deep Reinforcement Learning using Fully Homomorphic Encryption

Description

This repository contains the source code for the paper "Empowering AI with Privacy: Homomorphic Encryption for Secure Deep Reinforcement Learning". It demonstrates how Fully Homomorphic Encryption (FHE) can be integrated with Deep Reinforcement Learning (DRL) to ensure privacy-preserving computations.

Installation

1. Clone/Download the Repository

git clone https://github.com/hieunch/PPRL.git
cd PPRL/{environment}  # Replace {environment} with the desired environment, e.g., Pendulum-v1

2. Set Up a Virtual Environment

python -m venv my-venv
source my-venv/bin/activate  # On Windows: my-venv\Scripts\activate

3. Install Dependencies

pip install -r requirements.txt

4. Enable Homomorphic Encryption (Optional)

A system with at least 64GB of RAM is recommended for HE computations.

Install OpenFHE and OpenFHE Python by following the installation guide in the OpenFHE repository.
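
To confirm that the OpenFHE Python bindings work, you can run a quick CKKS encrypt-compute-decrypt round trip. The snippet below is a minimal sketch based on the openfhe-python examples, not code from this repository:

# Minimal CKKS round trip to verify the OpenFHE Python bindings (sketch only,
# based on the openfhe-python examples; not part of this repository).
from openfhe import CCParamsCKKSRNS, GenCryptoContext, PKESchemeFeature

params = CCParamsCKKSRNS()
params.SetMultiplicativeDepth(1)   # enough for one homomorphic multiplication
params.SetScalingModSize(50)       # CKKS scaling factor (bits)
params.SetBatchSize(8)             # number of packed slots

cc = GenCryptoContext(params)
cc.Enable(PKESchemeFeature.PKE)
cc.Enable(PKESchemeFeature.KEYSWITCH)
cc.Enable(PKESchemeFeature.LEVELEDSHE)

keys = cc.KeyGen()
cc.EvalMultKeyGen(keys.secretKey)

x = [0.1, 0.2, 0.3, 0.4]
ct = cc.Encrypt(keys.publicKey, cc.MakeCKKSPackedPlaintext(x))

ct_sq = cc.EvalMult(ct, ct)        # elementwise square on ciphertexts
res = cc.Decrypt(ct_sq, keys.secretKey)
res.SetLength(len(x))
print(res)                         # approximately [0.01, 0.04, 0.09, 0.16]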

Installation Time

On a standard desktop computer, installation typically takes 10-15 minutes, depending on system specifications and internet speed.

Usage

1. Running the DRL Algorithm

The main_v3.py script is used to run the DRL algorithm. Below are its available arguments:

  -h, --help            show this help message and exit
  --env-name ENV_NAME   Mujoco Gym environment (default: CartPole-v0)
  --policy POLICY       Policy Type: Gaussian | Deterministic (default: Gaussian)
  --eval EVAL           Evaluates the policy every 10 episodes (default: True)
  --gamma G             discount factor for reward (default: 0.99)
  --lr G                learning rate (default: 0.0003)
  --alpha G             Temperature parameter α determines the relative importance of the entropy term against the reward (default: 0.2)
  --automatic_entropy_tuning G
                        Automatically adjust α (default: False)
  --seed N              random seed (default: 123456)
  --batch_size N        batch size (default: 256)
  --num_steps N         maximum number of steps (default: 1000000)
  --hidden_size N       hidden size (default: 256)
  --updates_per_step N  model updates per simulator step (default: 1)
  --start_steps N       Steps sampling random actions (default: 10000)
  --replay_size N       size of replay buffer (default: 10000000)
  --cuda                run on CUDA (default: False)
  --offline             run wandb in offline mode
  --he                  run in HE-compatible mode
  --encrypt             run in encrypted mode
  --run_name RUN_NAME   Run name, default: SAC

Run SAC-HE algorithm (HE-Compatible Mode)

python3 main_v3.py --alpha 1 --start_steps 1000 --hidden_size 32 --batch_size 64 \
--lr 0.001 --num_steps 20000 --run_name SAC-HE --offline --he

Run Vanilla SAC Algorithm

python3 main_v3.py --alpha 1 --start_steps 1000 --hidden_size 32 --batch_size 64 \
--lr 0.001 --num_steps 20000 --run_name SAC-vanilla --offline

2. Running in Fully Encrypted Mode

python3 main_v3.py --alpha 1 --start_steps 1000 --hidden_size 32 --batch_size 64 \
--lr 0.001 --num_steps 20000 --run_name SAC-HE --offline --he --encrypt
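
In fully encrypted mode, the network computations operate on CKKS ciphertexts instead of plain tensors, which is what makes each update step expensive (see Expected Runtime below). As a rough illustration of the kind of ciphertext arithmetic involved, the sketch below applies an elementwise affine transform to an encrypted vector; it reuses the context cc and key pair keys from the sanity check above and is not the repository's implementation:

# Illustration only: elementwise w*x + b on an encrypted activation vector,
# reusing the CKKS context `cc` and key pair `keys` from the sanity check
# above. This is not the code path used by main_v3.py.
x = [0.5, -0.2, 0.1, 0.9]          # activations (sensitive, encrypted)
w = [0.3, 0.3, 0.3, 0.3]           # per-slot weights (plaintext)
b = [0.1, 0.1, 0.1, 0.1]           # per-slot bias (plaintext)

ct_x = cc.Encrypt(keys.publicKey, cc.MakeCKKSPackedPlaintext(x))
pt_w = cc.MakeCKKSPackedPlaintext(w)
pt_b = cc.MakeCKKSPackedPlaintext(b)

# The server computes w*x + b without ever seeing x in the clear.
ct_out = cc.EvalAdd(cc.EvalMult(ct_x, pt_w), pt_b)

out = cc.Decrypt(ct_out, keys.secretKey)
out.SetLength(len(x))
print(out)                         # approximately [0.25, 0.04, 0.13, 0.37]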

3. Running Stable-Baselines3 Benchmarks

python3 stable-baseline.py --alg {algorithm}  # Replace {algorithm} with SAC, PPO, etc.
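
For reference, the Stable-Baselines3 API that a benchmark script like this typically wraps looks like the sketch below; the environment id and hyperparameters are placeholders rather than values taken from stable-baseline.py:

# Minimal Stable-Baselines3 baseline (sketch only; environment id and
# hyperparameters are placeholders, not taken from stable-baseline.py).
from stable_baselines3 import SAC

model = SAC("MlpPolicy", "Pendulum-v1", learning_rate=1e-3, verbose=1)
model.learn(total_timesteps=20_000)   # roughly comparable to --num_steps 20000
model.save("sac_pendulum_baseline")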

Expected Runtime

On a standard desktop computer:

  • Prototype mode (without FHE encryption): 15-30 minutes per environment
  • FHE encryption mode: 6-12 minutes per update step

Example Output

Below is an example of the expected console output:

----------------------------------------
Episode: 398, Avg. Test Reward: -231.83
----------------------------------------
Episode: 398, total numsteps: 79600, episode steps: 200, updates: 79399, reward: -480.54
Episode: 399, total numsteps: 79800, episode steps: 200, updates: 79599, reward: -361.54
[ASCII training curve omitted: return vs. episode, rising from roughly -1584 to about -204 over episodes 0-199]
----------------------------------------
Episode: 400, Avg. Test Reward: -324.27
----------------------------------------
