This repository implements RLDP, a novel approach to differentially private fine-tuning of causal language models with LoRA adapters, where a small RL hyper-policy (Soft Actor-Critic) learns to adapt per-layer clipping thresholds and noise multipliers to maximize model utility under a privacy budget.
Paper: “Efficient Differentially Private Fine-Tuning of LLMs via Reinforcement Learning” (arXiv:2507.22565)
- LoRA adapters: Efficient low-rank updates to large pre-trained language models
- DP-SGD: Differentially private optimization with per-adapter clipping & Gaussian noise
- SAC hyper-policy: Soft Actor-Critic learns to schedule clipping thresholds & noise multipliers
- WandB integration: Track training loss, perplexity, privacy spent, SAC metrics, and more
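For orientation, the sketch below shows one way these pieces fit together: a base causal LM with LoRA adapters, wrapped for DP-SGD with Opacus and a GDP accountant. It is a conceptual illustration under assumed settings (model ID, LoRA rank and target modules, and the initial DP knobs are illustrative), not the exact wiring in `train.py`:

```python
# Conceptual sketch only -- assumed settings, not the exact code in train.py.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model
from opacus import PrivacyEngine

# 1) Base causal LM + LoRA adapters (only the low-rank adapter weights are trainable).
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
lora_cfg = LoraConfig(r=8, lora_alpha=16,
                      target_modules=["q_proj", "v_proj"],  # illustrative choice
                      task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)

# 2) DP-SGD via Opacus: per-sample clipping + Gaussian noise, GDP accounting.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=5e-4
)
blocks = torch.randint(0, 1000, (64, 128))            # stand-in for tokenized blocks
loader = DataLoader(TensorDataset(blocks), batch_size=4)

engine = PrivacyEngine(accountant="gdp")
model, optimizer, loader = engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.0,   # initial value; the SAC hyper-policy re-tunes this
    max_grad_norm=1.0,      # initial clipping threshold; also re-tuned during training
)
```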
Install the dependencies via pip:

```bash
pip install torch \
    transformers peft bitsandbytes \
    opacus wandb tqdm pandas numpy
```

Your training and evaluation data must be CSV files with a column named `text`. Each row's text is tokenized and split into blocks of up to `--block_size` tokens.
Example `train.csv`:

```csv
id,text
1,The quick brown fox jumps over the lazy dog.
2,Privacy-preserving ML is crucial in medical domains.
...
```
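For intuition, the block-splitting step looks roughly like this; the helper below is a hypothetical illustration, not the preprocessing code in `train.py`:

```python
# Hypothetical sketch of the per-row tokenize-and-split preprocessing.
import pandas as pd
from transformers import AutoTokenizer

def row_to_blocks(text: str, tokenizer, block_size: int = 128):
    """Tokenize one row's text and split the ids into blocks of up to block_size tokens."""
    ids = tokenizer(text)["input_ids"]
    return [ids[i:i + block_size] for i in range(0, len(ids), block_size)]

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
df = pd.read_csv("train.csv")                     # must contain a `text` column
blocks = [b for text in df["text"] for b in row_to_blocks(text, tokenizer)]
```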
Log in to WandB (for experiment tracking):
```bash
wandb login
```

Then launch training:

```bash
python train.py \
  --model meta-llama/Llama-3.2-1B \
  --train_csv /path/to/train.csv \
  --eval_csv /path/to/eval.csv \
  --output_dir ./rldp_out \
  --block_size 128 \
  --epochs 3 \
  --lr 5e-4 \
  --epsilon 0.5 \
  --delta 1e-5 \
  --rl_interval 112 \
  --sac_batch_size 4 \
  --sac_updates_per_interval 2 \
  --run_name rldp_experiment1
```

Key arguments:
- `--model`: HuggingFace ID of the base causal LM
- `--train_csv`, `--eval_csv`: paths to your CSV files
- `--output_dir`: directory to save the fine-tuned model, tokenizer, and SAC policy
- DP settings:
  - `--epsilon`, `--delta`: target privacy budget (ε, δ)
- SAC hyper-policy:
  - `--rl_interval`: number of training steps between SAC actions
  - `--sac_*`: buffer size, batch size, learning rates, soft-update rate, etc.
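Conceptually, every `--rl_interval` optimizer steps the SAC actor observes a summary of the training state and emits new DP knobs. The self-contained toy below (a tiny linear model with a random stand-in for the SAC actor) only illustrates that Opacus's `DPOptimizer` exposes `max_grad_norm` and `noise_multiplier` as attributes that can be re-tuned mid-training; the real policy, state features, and per-layer actions live in `train.py`:

```python
# Toy illustration: a hyper-policy re-tunes the DP knobs every rl_interval steps.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataset = TensorDataset(torch.randn(256, 16), torch.randn(256, 1))
loader = DataLoader(dataset, batch_size=16)

# RDP accounting keeps this toy simple; RLDP itself uses GDP accounting (see below).
engine = PrivacyEngine(accountant="rdp")
model, optimizer, loader = engine.make_private(
    module=model, optimizer=optimizer, data_loader=loader,
    noise_multiplier=1.0, max_grad_norm=1.0,
)

def propose_dp_knobs():
    # Stand-in for the SAC actor: in RLDP this is a learned policy, not random noise.
    return (float(torch.empty(1).uniform_(0.5, 2.0)),
            float(torch.empty(1).uniform_(0.8, 1.5)))

rl_interval = 4
for step, (x, y) in enumerate(loader):
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()          # DP-SGD step: per-sample clipping + Gaussian noise
    optimizer.zero_grad()

    if (step + 1) % rl_interval == 0:
        clip, noise = propose_dp_knobs()
        optimizer.max_grad_norm = clip        # DPOptimizer attribute: clipping threshold
        optimizer.noise_multiplier = noise    # DPOptimizer attribute: noise scale

print("epsilon spent:", engine.get_epsilon(delta=1e-5))
```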
Metrics are logged to Weights & Biases (WandB) by default. To view your run dashboard:

```bash
wandb dashboard
```

Logged metrics include training/eval loss, perplexity, ε spent, average clip threshold, noise multiplier, SAC reward, actor/critic losses, and more.
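Under the hood these are plain `wandb.log` calls; the metric keys and values below are illustrative placeholders, not necessarily the names `train.py` uses:

```python
import wandb

run = wandb.init(project="rldp", name="rldp_experiment1")
# Illustrative keys and placeholder values only.
wandb.log({
    "train/loss": 2.31,
    "eval/perplexity": 10.8,
    "privacy/epsilon_spent": 0.42,
    "dp/avg_clip": 0.9,
    "dp/noise_multiplier": 1.1,
    "sac/reward": 0.05,
    "sac/actor_loss": -0.12,
    "sac/critic_loss": 0.33,
})
run.finish()
```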
📦 Outputs

After training, `--output_dir` will contain:

- Base model + LoRA adapters (HuggingFace format)
- Tokenizer files
- `sac_hyperpolicy.pt`: a checkpoint with
  - SAC actor & critic state dicts
  - State normalization statistics

You can reload these artifacts to resume training, evaluate, or deploy.
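A hedged sketch of reloading: it assumes the adapters are saved in PEFT format under `--output_dir` and that `sac_hyperpolicy.pt` is a plain `torch.save` checkpoint; the key names below are assumptions, not documented fields:

```python
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

output_dir = "./rldp_out"

# Base model + LoRA adapters and tokenizer, as saved after training.
model = AutoPeftModelForCausalLM.from_pretrained(output_dir)
tokenizer = AutoTokenizer.from_pretrained(output_dir)

# SAC hyper-policy checkpoint; key names are assumptions about its layout.
ckpt = torch.load(f"{output_dir}/sac_hyperpolicy.pt", map_location="cpu")
actor_state = ckpt.get("actor")        # SAC actor state dict
critic_state = ckpt.get("critic")      # SAC critic state dict
state_stats = ckpt.get("state_norm")   # state normalization statistics
```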
We use the Gaussian Differential Privacy (GDP) accountant to compose Gaussian mechanisms over training steps. The GaussianAccountant in train.py tracks noise multipliers and sample rates, and computes ε spent at each step.
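For reference, a minimal sketch of GDP accounting with Opacus's stock `GaussianAccountant`; the accountant in `train.py` has to track a noise multiplier that changes over training (the SAC policy re-tunes it), so its internals may differ from this:

```python
from opacus.accountants import GaussianAccountant

# Compose T Gaussian mechanisms with a fixed noise multiplier and sample rate,
# then query the epsilon spent so far at delta = 1e-5.
accountant = GaussianAccountant()
for _ in range(1000):                       # 1000 optimizer steps
    accountant.step(noise_multiplier=1.1, sample_rate=4 / 50_000)

print(accountant.get_epsilon(delta=1e-5))   # epsilon spent at this point
```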
If you use RLDP in your work, please cite:
```bibtex
@misc{khadangi2025efficientdifferentiallyprivatefinetuning,
  title={Efficient Differentially Private Fine-Tuning of LLMs via Reinforcement Learning},
  author={Afshin Khadangi and Amir Sartipi and Igor Tchappi and Ramin Bahmani and Gilbert Fridgen},
  year={2025},
  eprint={2507.22565},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2507.22565},
}
```

This code builds on: