SafeNudge

A Python library implementing the algorithms from "Safeguarding large language models in real-time with tunable safety-performance trade-offs", by J. Fonseca, A. Bell and J. Stoyanovich.

SafeNudge provides controlled text generation (CTG) methods that guide model responses based on various criteria, helping ensure safe, high-quality, and controllable text generation.

Implemented methods

Controlled Text Generation (CTG): The SafeNudge implementation.
WildGuard Integration (WildguardCTG): SafeNudge using the WildGuard classifier.
Token Masking (TokenMaskingCTG): c-FUDGE, as described in the paper.

Installation

This project is developed and tested on Python >= 3.12. Earlier Python versions may work in most cases, but they have not been tested.

From Source

# Clone the repository
git clone https://github.com/joaopfonseca/SafeNudge.git
cd SafeNudge

# Install in development mode
pip install -e .

Using pip

pip install git+https://github.com/joaopfonseca/SafeNudge.git
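
After either installation route, a quick sanity check can confirm that the interpreter meets the tested version and that the package is visible. The snippet below is a minimal sketch, not part of the library: it assumes the installed import name matches the `ctg` directory listed under Project Structure, and `check_environment` is a hypothetical helper written for this example.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 12)  # the version this README says the project was tested on


def check_environment(package: str = "ctg") -> dict:
    """Report whether the interpreter and the package look usable.

    `package` defaults to "ctg", which is an assumption based on the
    repository layout, not a confirmed import name.
    """
    return {
        "python_version": ".".join(map(str, sys.version_info[:3])),
        "python_tested": sys.version_info[:2] >= MIN_PYTHON,
        # find_spec returns None when a top-level module cannot be located
        "package_found": importlib.util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `package_found` is False, the install most likely went into a different environment than the one running the check.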

Examples

See the notebooks directory for examples that Andrew and I developed while working on SafeNudge and setting up the experiments.

Project Structure

SafeNudge/
├── ctg/                    # Core library code
├── experiments/            # Experimental code and evaluation
└── notebooks/              # Jupyter notebooks with examples

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this code in your research, please cite:

@article{fonseca2025safeguarding,
  title={Safeguarding large language models in real-time with tunable safety-performance trade-offs},
  author={Fonseca, Joao and Bell, Andrew and Stoyanovich, Julia},
  journal={arXiv preprint arXiv:2501.02018},
  year={2025}
}
