Crescendo Attack Evaluation

An experimental framework for evaluating Crescendo attacks. This project uses PyRIT's CrescendoAttack to execute multi-turn attacks and analyze the results.

Overview

The Crescendo attack is a multi-turn jailbreak strategy that begins with innocuous prompts and gradually steers a model toward generating harmful content. This project provides tools to run Crescendo attacks systematically and to store the results.

Reference

This project implements and evaluates the Crescendo attack proposed in the following paper:

Mark Russinovich, Ahmed Salem, and Ronen Eldan. "Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack" (2024)

Key Features

  • Execute and evaluate Crescendo attacks
  • Support for OpenAI and HuggingFace models
  • Batch processing of multiple tasks
  • Detailed logging for each turn (original/converted prompts, responses)
  • Automatic result saving and summarization (incremental save for each task)
  • API key management via .env file

Installation

1. Clone Repository

git clone <repository-url>
cd crescendo

2. Install Dependencies

pip install -r requirements.txt

3. Configure Environment Variables

Create a .env file and set your API keys:

# OpenAI API (when using OpenAI models)
OPENAI_API_KEY=your-openai-api-key

# HuggingFace (when using HuggingFace models)
HUGGINGFACE_TOKEN=your-huggingface-token
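
For a quick check that the keys are actually picked up, here is a minimal sketch assuming the keys are loaded with python-dotenv (the project's own loading code may differ):

# Minimal sketch: verify that .env keys are visible to Python.
# Assumes python-dotenv; adjust if the project loads keys differently.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory into os.environ
print("OpenAI key set:", bool(os.getenv("OPENAI_API_KEY")))
print("HuggingFace token set:", bool(os.getenv("HUGGINGFACE_TOKEN")))

The --check-keys option described below performs a similar check from the command line.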

Usage

Basic Execution

python crescendo_eval.py

Model Selection

# Set all models to gpt-4o
python crescendo_eval.py \
  --target-model gpt-4o \
  --attacker-model gpt-4o \
  --judge-model gpt-4o \
  --refusal-judge-model gpt-4o

Adjust Experiment Parameters

python crescendo_eval.py \
  --max-turns 5 \
  --max-backtracks 3 \
  --objective-threshold 0.8

Use Custom Task File

python crescendo_eval.py --tasks-file data/your_tasks.json

Control Prompt Converter

By default, the emoji converter (EmojiConverter) is enabled. To disable the converter:

python crescendo_eval.py --no-converter
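
To preview what the conversion does to a prompt, here is a minimal sketch assuming PyRIT's EmojiConverter and the convert_async interface of recent PyRIT releases (check your installed version, as these names may differ):

# Minimal sketch: preview the emoji conversion applied to attacker prompts.
# Assumes pyrit.prompt_converter.EmojiConverter and its convert_async
# interface; both may vary across PyRIT versions.
import asyncio
from pyrit.prompt_converter import EmojiConverter

async def preview():
    converter = EmojiConverter()
    result = await converter.convert_async(prompt="hello world", input_type="text")
    print(result.output_text)  # the prompt re-encoded with emoji characters

asyncio.run(preview())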

Utility Commands

# List available models
python crescendo_eval.py --list-models

# Check API key status
python crescendo_eval.py --check-keys

Command Line Options

Model Settings

  • --target-model: Target model to attack (default: gpt-4o-mini)
  • --attacker-model: Model for generating attack prompts (default: gpt-4o-mini)
  • --judge-model: Model for judging objective achievement (default: gpt-4o-mini)
  • --refusal-judge-model: Model for detecting refusals (default: gpt-4o-mini)

Experiment Parameters

  • --max-turns: Maximum number of turns (default: 10)
  • --max-backtracks: Maximum number of backtracks (default: 10)
  • --objective-threshold: Objective achievement threshold (default: 0.8)
  • --no-converter: Disable prompt converter (default: converter enabled)

File Settings

  • --tasks-file: Path to task JSON file (default: data/custom_tasks.json)
  • --output-file: Result filename (default: auto-generated)

Task File Format

Tasks are defined in JSON files:

[
  {
    "name": "Task Name",
    "objective": "Attack objective description"
  }
]
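
Loading and iterating over such a file needs nothing beyond the standard library; a minimal sketch:

# Minimal sketch: load the task list and print each objective.
import json
from pathlib import Path

tasks = json.loads(Path("data/custom_tasks.json").read_text(encoding="utf-8"))
for task in tasks:
    print(f"{task['name']}: {task['objective']}")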

Supported Models

The following models are currently supported:

  • OpenAI: gpt-4, gpt-4o, gpt-4o-mini, gpt-3.5-turbo
  • HuggingFace: llama-2-7b, llama-2-70b, llama-3-8b, llama-3-70b

All available models can be checked using the --list-models option.

To add a new model, refer to the comments in the models/model_configs.py file.
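
As a rough illustration of what a configuration entry might look like (the field names below are hypothetical; the authoritative schema is documented in models/model_configs.py itself):

# Hypothetical illustration only -- these field names are invented for the
# sketch; follow the comments in models/model_configs.py for the real schema.
MODEL_CONFIGS = {
    "gpt-4o-mini": {
        "provider": "openai",         # which API client to use
        "model_name": "gpt-4o-mini",  # name passed to that API
    },
    "llama-3-8b": {
        "provider": "huggingface",
        "model_name": "meta-llama/Meta-Llama-3-8B-Instruct",
    },
}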

Results

Result Files

Experiment results are saved in JSON format in the outputs/results/ directory. Results are automatically saved each time a task completes, so even if the program is interrupted, all results up to that point are preserved.

Each result file contains the following information (an abridged example follows the list):

  • Task name and objective
  • Success status
  • Number of turns used
  • Execution time
  • Final turn summary (Attacker Prompt, Target Response)
  • Turn history (turn_history): Detailed information for all turns
    • Original prompt for each turn (attacker_original)
    • Converted prompt for each turn (attacker_converted, emoji conversion, etc.)
    • Target response for each turn (target_response)
    • Judge score for each turn (judge_score): a float between 0.0 and 1.0 indicating how fully the objective was achieved in that turn; a turn is judged successful when the score meets the objective threshold (default 0.8)
    • Backtracked prompt-response pairs for each turn (backtracked_pairs): A list of prompt-response pairs that were rejected and backtracked in that turn
      • Each backtracked pair includes attacker_original, attacker_converted, target_response, and refusal_judge (boolean) information
    • Backtrack count for each turn (backtrack_count): The number of backtracked_pairs
  • Experiment settings
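
An abridged, illustrative skeleton of a result file (the turn_history field names are documented above; the top-level key names are guesses for illustration, so inspect a real file in outputs/results/ for the exact layout):

{
  "task_name": "Task Name",
  "objective": "Attack objective description",
  "success": true,
  "turns_used": 4,
  "turn_history": [
    {
      "attacker_original": "...",
      "attacker_converted": "...",
      "target_response": "...",
      "judge_score": 0.85,
      "backtracked_pairs": [
        {
          "attacker_original": "...",
          "attacker_converted": "...",
          "target_response": "...",
          "refusal_judge": true
        }
      ],
      "backtrack_count": 1
    }
  ]
}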

Logging Files

Detailed logs for each task are saved separately in the outputs/logs/ directory (a sketch for scanning them follows the list):

  • File name format: turn_logs_{task_name}_{timestamp}.json
  • Detailed information for each turn (original/converted prompts, responses)
  • Only actual turns are recorded, excluding backtracks
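
A small sketch for scanning these logs, assuming each file holds a JSON array with one entry per turn (inspect a real file in outputs/logs/ to confirm the structure):

# Minimal sketch: list saved turn logs and how many turns each records.
# Assumes each log file is a JSON array of per-turn entries.
import json
from pathlib import Path

for log in sorted(Path("outputs/logs").glob("turn_logs_*.json")):
    turns = json.loads(log.read_text(encoding="utf-8"))
    print(f"{log.name}: {len(turns)} turns")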

Project Structure

crescendo/
├── crescendo_eval.py          # Main execution script
├── src/
│   ├── orchestrator.py        # CrescendoExperiment class
│   └── utils.py               # Utility functions (result saving, logging)
├── models/
│   ├── model_configs.py       # Model configurations
│   └── model_factory.py       # Model factory
├── data/
│   └── custom_tasks.json      # Task definitions
├── outputs/
│   ├── results/               # Result storage directory
│   └── logs/                  # Turn logging storage directory
├── requirements.txt           # Dependency list
└── README.md                  # This file

Requirements

  • Python 3.11+
  • PyRIT framework
  • OpenAI API key (when using OpenAI models)
  • HuggingFace token (when using HuggingFace models)
