This repository contains the source code for the paper "Empowering AI with Privacy: Homomorphic Encryption for Secure Deep Reinforcement Learning". It demonstrates how Fully Homomorphic Encryption (FHE) can be integrated with Deep Reinforcement Learning (DRL) to enable privacy-preserving computation.
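Under FHE schemes such as CKKS, only additions and multiplications can be evaluated on ciphertexts, which is why the DRL networks need an HE-compatible mode: standard nonlinearities such as ReLU must be replaced with purely polynomial operations. A minimal NumPy sketch of the idea (illustrative only; the function names and layer shapes here are not taken from this repository):

```python
import numpy as np

def square_act(x):
    # HE-friendly activation: a single multiplication, so it can be
    # evaluated directly on ciphertexts (only adds/mults are available).
    return x * x

def he_friendly_forward(x, W1, b1, W2, b2):
    # A two-layer network built entirely from additions and
    # multiplications; matrix products are just sums of products.
    h = square_act(x @ W1 + b1)
    return h @ W2 + b2

# Tiny demonstration with identity weights: squares each input element.
x = np.array([[1.0, -2.0]])
out = he_friendly_forward(x, np.eye(2), np.zeros(2), np.eye(2), np.zeros(2))
print(out)  # [[1. 4.]]
```

Roughly, `--he` trains a network restricted to HE-compatible operations of this kind in the clear, while `--encrypt` actually evaluates it on ciphertexts via OpenFHE.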
cd PPRL/{environment}  # Replace {environment} with the desired environment, e.g., Pendulum-v1
python -m venv my-venv
source my-venv/bin/activate  # On Windows: my-venv\Scripts\activate
pip install -r requirements.txt

A system with at least 64 GB of RAM is recommended for HE computations.
Install OpenFHE and OpenFHE Python by following the installation guide in the OpenFHE repository.
On a standard desktop computer, installation typically takes 10-15 minutes, depending on system specifications and internet speed.
The main_v3.py script is used to run the DRL algorithm. Below are its available arguments:
-h, --help show this help message and exit
--env-name ENV_NAME Mujoco Gym environment (default: CartPole-v0)
--policy POLICY Policy Type: Gaussian | Deterministic (default: Gaussian)
--eval EVAL Evaluates the policy every 10 episodes (default: True)
--gamma G discount factor for reward (default: 0.99)
--lr G learning rate (default: 0.0003)
--alpha G Temperature parameter α determines the relative importance of the entropy term against the reward (default: 0.2)
--automatic_entropy_tuning G
Automatically adjust α (default: False)
--seed N random seed (default: 123456)
--batch_size N batch size (default: 256)
--num_steps N maximum number of steps (default: 1000000)
--hidden_size N hidden size (default: 256)
--updates_per_step N model updates per simulator step (default: 1)
--start_steps N Steps sampling random actions (default: 10000)
--replay_size N size of replay buffer (default: 10000000)
--cuda run on CUDA (default: False)
--offline wandb mode offline
--he run with HE compatible mode
--encrypt run in encrypted mode
--run_name RUN_NAME Run name (default: SAC)

Run the HE-compatible (prototype) model:

python3 main_v3.py --alpha 1 --start_steps 1000 --hidden_size 32 --batch_size 64 \
    --lr 0.001 --num_steps 20000 --run_name SAC-HE --offline --he

Run the vanilla (unmodified) model:

python3 main_v3.py --alpha 1 --start_steps 1000 --hidden_size 32 --batch_size 64 \
    --lr 0.001 --num_steps 20000 --run_name SAC-vanilla --offline

Run the HE-compatible model in fully encrypted mode:

python3 main_v3.py --alpha 1 --start_steps 1000 --hidden_size 32 --batch_size 64 \
    --lr 0.001 --num_steps 20000 --run_name SAC-HE --offline --he --encrypt

Run a Stable-Baselines reference algorithm:

python3 stable-baseline.py --alg {algorithm}  # Replace {algorithm} with SAC, PPO, etc.

Expected runtimes on a standard desktop computer:
- Prototype mode (without FHE encryption): 15-30 minutes per environment
- FHE encryption mode: 6-12 minutes per update step
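To see why encrypted mode is a prototype-scale experiment rather than a drop-in replacement, it helps to do the arithmetic on those per-update figures. A back-of-envelope sketch, using the 20000-step budget from the example commands, one model update per simulator step (the default), and the optimistic 6-minute end of the range:

```python
# Rough wall-clock estimate for a fully encrypted (--encrypt) run.
num_updates = 20_000       # --num_steps 20000 with 1 update per simulator step
minutes_per_update = 6     # low end of the 6-12 min/update range above
total_minutes = num_updates * minutes_per_update
total_days = total_minutes / (60 * 24)
print(f"{total_days:.1f} days")  # 83.3 days
```

Even at the fast end of the range, a full training run under encryption is measured in months, which is why the `--he` prototype mode is the practical choice for development and debugging.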
Below is an example of the expected console output:
----------------------------------------
Episode: 398, Avg. Test Reward: -231.83
----------------------------------------
Episode: 398, total numsteps: 79600, episode steps: 200, updates: 79399, reward: -480.54
Episode: 399, total numsteps: 79800, episode steps: 200, updates: 79599, reward: -361.54
[Console plot: per-episode return (y-axis, approx. -1584 to -204) versus episode number (x-axis, 0 to 199), rendered as ASCII art in the terminal]
----------------------------------------
Episode: 400, Avg. Test Reward: -324.27
----------------------------------------