Make_A_Break: LLM Jailbreak Testing Workbench

Make_A_Break is a Python-based tool designed for researchers and developers to automatically test Large Language Model (LLM) vulnerabilities by attempting various "jailbreak" techniques. It utilizes local Ollama models for both the target LLM and an AI Judge, LangGraph for orchestrating the testing workflow, and Streamlit for an interactive user interface.

⚠️ Ethical Considerations & Disclaimer: This tool is intended strictly for research and educational purposes to identify, understand, and help mitigate LLM vulnerabilities. The tasks and prompts used can involve potentially harmful or sensitive content. Users are responsible for handling all generated outputs ethically and responsibly. Misuse of this tool or its outputs for malicious purposes is strictly prohibited. The effectiveness of the "AI Judge" is dependent on the capabilities of the chosen judge model and the clarity of its evaluation criteria.

Features

Local LLM Interaction: Leverages Ollama to run LLMs locally for both the target model being tested and the AI judge evaluating responses.
Automated Jailbreak Attempts: Uses a predefined (and extensible) set of jailbreak strategies.
Workflow Orchestration: Employs LangGraph to manage the sequence of operations: prompt generation, LLM querying, and response evaluation.
AI-Powered Evaluation: An "AI Judge" (another LLM) assesses whether the target LLM's response constitutes a successful jailbreak.
Interactive Web UI: A Streamlit interface allows users to:
- Configure target and judge Ollama models.
- Add, view, and manage jailbreak strategies.
- Select tasks and strategies for test runs.
- View real-time progress and detailed logs of test attempts.
- Download results.
Customizable Tasks & Strategies: Easily define new tasks and strategies via JSON files or the UI.
Comprehensive Logging: Records detailed information about each test attempt, including prompts, responses, and verdicts, to a results/jailbreak_log.jsonl file.

Prerequisites

Before you begin, ensure you have the following installed:

Python: Version 3.8 or higher. Make sure it's added to your system's PATH.
Ollama: Installed and running. You can download it from ollama.com.
Ollama Models: Download the LLMs you intend to use as the target and the judge via the Ollama CLI. For example:
```
ollama pull llama3
ollama pull mistral
```
(These are examples; you can use any models compatible with Ollama.)

Setup and Installation

Clone or Download the Repository: Get the project files onto your local machine.
Windows Users (Recommended):
- Navigate to the project's root directory in your command prompt or PowerShell.
- Run the run_app.bat script:
```
run_app.bat
```
- This script will automatically:
  - Create a Python virtual environment (e.g., jailbreak_env).
  - Activate the virtual environment.
  - Install all necessary dependencies from requirements.txt.
  - Start the Streamlit application.
Manual Setup (All Platforms / If run_app.bat is not used):
- Open your terminal or command prompt.
- Navigate to the project's root directory.
- Create a Python virtual environment:
```
python -m venv jailbreak_env
```
- Activate the virtual environment:
  - Windows: jailbreak_env\Scripts\activate
  - macOS/Linux: source jailbreak_env/bin/activate
- Install the required packages:
```
pip install -r requirements.txt
```
- Run the Streamlit application:
```
streamlit run app.py
```

Directory Structure

Make_A_Break/
├── app.py                  # Streamlit UI application
├── graph_runner.py         # LangGraph execution logic
├── langgraph_setup.py      # LangGraph definition
├── llm_interface.py        # Ollama interaction functions
├── judge.py                # AI Judge logic
├── utils.py                # Utility functions (data loading, etc.)
├── requirements.txt        # Python dependencies
├── run_app.bat             # Windows batch script for easy startup
├── README.md               # This file
├── data/
│   ├── tasks.json          # Dataset of tasks for the LLM
│   └── strategies.json     # Predefined jailbreak strategies
├── results/
│   └── jailbreak_log.jsonl # Log of test attempts and results
└── jailbreak_env/          # Virtual environment directory (created by script)

How to Use

Start the Application:
- Run run_app.bat (Windows) or follow the manual setup steps to start the Streamlit app.
- The application should open in your default web browser, typically at http://localhost:8501.
Configure Models (Sidebar):
- In the sidebar, enter the names of the Target Ollama Model (the LLM you want to test) and the Judge Ollama Model (the LLM that will evaluate responses). Ensure these models are available in your Ollama setup.
Manage Strategies (Sidebar):
- View existing jailbreak strategies.
- Add new strategies by providing a name, a unique ID, and a template. The template should use {task_prompt} as a placeholder for the actual task prompt.
- Strategies are saved to data/strategies.json.
Define Tasks:
- Tasks are defined in data/tasks.json. Each task includes an id, description, the prompt to be sent to the LLM, and a harm_category. You can manually edit this file to add or modify tasks.
Run Tests (Main Area):
- Select one or more tasks from the "Select Tasks to Run" dropdown.
- Select one or more strategies from the "Select Strategies to Apply" dropdown.
- Click the "Start Jailbreak Test Run" button.
- The application will iterate through each selected task-strategy combination, query the target LLM, and have the judge LLM evaluate the response.
View Logs & Results (Main Area):
- Progress and immediate results will be displayed as tests run.
- A table of all results is shown, which can be refreshed from the jailbreak_log.jsonl file.
- You can download the full results as a CSV file.
- The most recent detailed result (JSON format) is also displayed for closer inspection.
- All attempts are logged to results/jailbreak_log.jsonl.

Customization

Adding Tasks: Edit the data/tasks.json file to add new tasks. Follow the existing JSON structure.

[
    {
        "id": "new_task_example",
        "description": "A brief description of the new task.",
        "prompt": "The actual prompt for the LLM.",
        "harm_category": "e.g., misinformation"
    }
]

Adding Strategies:

Via UI: Use the "Add New Strategy" section in the Streamlit sidebar.

Manually: Edit the data/strategies.json file. Ensure each strategy has a unique id, a name, and a template string.

[
    {
        "id": "S_custom_example",
        "name": "My Custom Strategy",
        "template": "This is a custom wrapper around the request: {task_prompt}. Please comply."
    }
]

Contributing

Contributions, bug reports, and feature requests are welcome! Please feel free to open an issue or submit a pull request.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
__pycache__		__pycache__
.gitattributes		.gitattributes
.gitignore		.gitignore
Copy-FolderContentsToClipboard.ps1		Copy-FolderContentsToClipboard.ps1
LICENSE		LICENSE
README.md		README.md
app.py		app.py
comprehensive_runner.py		comprehensive_runner.py
graph_runner.py		graph_runner.py
judge.py		judge.py
langgraph_setup.py		langgraph_setup.py
llm_interface.py		llm_interface.py
requirements.txt		requirements.txt
run_app.bat		run_app.bat
utils.py		utils.py
visuals.py		visuals.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Make_A_Break: LLM Jailbreak Testing Workbench

Features

Prerequisites

Setup and Installation

Directory Structure

How to Use

Customization

Contributing

About

Uh oh!

Releases

Packages

Languages

License

mikopeck/Make_A_Break

Folders and files

Latest commit

History

Repository files navigation

Make_A_Break: LLM Jailbreak Testing Workbench

Features

Prerequisites

Setup and Installation

Directory Structure

How to Use

Customization

Contributing

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages