Make_A_Break is a Python-based tool designed for researchers and developers to automatically test Large Language Model (LLM) vulnerabilities by attempting various "jailbreak" techniques. It utilizes local Ollama models for both the target LLM and an AI Judge, LangGraph for orchestrating the testing workflow, and Streamlit for an interactive user interface.
- Local LLM Interaction: Leverages Ollama to run LLMs locally for both the target model being tested and the AI judge evaluating responses.
- Automated Jailbreak Attempts: Uses a predefined (and extensible) set of jailbreak strategies.
- Workflow Orchestration: Employs LangGraph to manage the sequence of operations: prompt generation, LLM querying, and response evaluation (an illustrative sketch of such a pipeline appears after this list).
- AI-Powered Evaluation: An "AI Judge" (another LLM) assesses whether the target LLM's response constitutes a successful jailbreak.
- Interactive Web UI: A Streamlit interface allows users to:
- Configure target and judge Ollama models.
- Add, view, and manage jailbreak strategies.
- Select tasks and strategies for test runs.
- View real-time progress and detailed logs of test attempts.
- Download results.
- Customizable Tasks & Strategies: Easily define new tasks and strategies via JSON files or the UI.
- Comprehensive Logging: Records detailed information about each test attempt, including prompts, responses, and verdicts, to a `results/jailbreak_log.jsonl` file.
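For orientation, the sketch below shows how such a three-step pipeline (build the prompt, query the target, judge the response) can be wired with LangGraph. The project's actual graph is defined in `langgraph_setup.py` and driven by `graph_runner.py`; the state fields, node names, and toy node bodies here are illustrative assumptions only, not the project's real code.

```python
# A minimal, illustrative LangGraph pipeline in the spirit of this project.
# Node names, state fields, and the toy node bodies are assumptions for the sketch.
from typing import TypedDict

from langgraph.graph import END, StateGraph


class AttemptState(TypedDict):
    task_prompt: str        # the raw task prompt from data/tasks.json
    strategy_template: str  # jailbreak wrapper containing a {task_prompt} placeholder
    full_prompt: str        # prompt actually sent to the target model
    response: str           # target model's reply
    verdict: str            # judge's decision, e.g. "jailbroken" or "refused"


def build_prompt(state: AttemptState) -> dict:
    # Wrap the task prompt in the selected jailbreak strategy.
    return {"full_prompt": state["strategy_template"].format(task_prompt=state["task_prompt"])}


def query_target(state: AttemptState) -> dict:
    # In the real project this would call the target Ollama model (see llm_interface.py).
    return {"response": f"<target reply to: {state['full_prompt']}>"}


def judge_response(state: AttemptState) -> dict:
    # In the real project a second Ollama model evaluates the reply (see judge.py).
    return {"verdict": "refused" if "cannot" in state["response"].lower() else "jailbroken"}


graph = StateGraph(AttemptState)
graph.add_node("build_prompt", build_prompt)
graph.add_node("query_target", query_target)
graph.add_node("judge_response", judge_response)
graph.set_entry_point("build_prompt")
graph.add_edge("build_prompt", "query_target")
graph.add_edge("query_target", "judge_response")
graph.add_edge("judge_response", END)
pipeline = graph.compile()

# One task/strategy attempt:
# result = pipeline.invoke({"task_prompt": "...", "strategy_template": "... {task_prompt} ..."})
```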
Before you begin, ensure you have the following installed:
- Python: Version 3.8 or higher. Make sure it's added to your system's PATH.
- Ollama: Installed and running. You can download it from ollama.com.
- Ollama Models: Download the LLMs you intend to use as the target and the judge via the Ollama CLI, for example `ollama pull llama3` and `ollama pull mistral`. (These are examples; you can use any models compatible with Ollama.)
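Before launching the app, you can optionally confirm that Ollama is running and that your models were pulled. The quick check below is not part of the project; it simply queries Ollama's local REST API, which listens on `http://localhost:11434` by default:

```python
# Optional sanity check (not part of the project): list locally available
# Ollama models via the REST API's GET /api/tags endpoint.
import requests

resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
models = [m["name"] for m in resp.json().get("models", [])]
print("Locally available Ollama models:", models)
```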
- Clone or Download the Repository: Get the project files onto your local machine.
- Windows Users (Recommended):
  - Navigate to the project's root directory in your command prompt or PowerShell.
  - Run the `run_app.bat` script: `run_app.bat`
  - This script will automatically:
    - Create a Python virtual environment (e.g., `jailbreak_env`).
    - Activate the virtual environment.
    - Install all necessary dependencies from `requirements.txt`.
    - Start the Streamlit application.
- Manual Setup (All Platforms / If `run_app.bat` is not used):
  - Open your terminal or command prompt.
  - Navigate to the project's root directory.
  - Create a Python virtual environment: `python -m venv jailbreak_env`
  - Activate the virtual environment:
    - Windows: `jailbreak_env\Scripts\activate`
    - macOS/Linux: `source jailbreak_env/bin/activate`
  - Install the required packages: `pip install -r requirements.txt`
  - Run the Streamlit application: `streamlit run app.py`
```
Make_A_Break/
├── app.py                  # Streamlit UI application
├── graph_runner.py         # LangGraph execution logic
├── langgraph_setup.py      # LangGraph definition
├── llm_interface.py        # Ollama interaction functions (sketched below)
├── judge.py                # AI Judge logic
├── utils.py                # Utility functions (data loading, etc.)
├── requirements.txt        # Python dependencies
├── run_app.bat             # Windows batch script for easy startup
├── README.md               # This file
├── data/
│   ├── tasks.json          # Dataset of tasks for the LLM
│   └── strategies.json     # Predefined jailbreak strategies
├── results/
│   └── jailbreak_log.jsonl # Log of test attempts and results
└── jailbreak_env/          # Virtual environment directory (created by script)
```
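As a rough illustration of the kind of helper `llm_interface.py` provides, the sketch below queries a local model with the official `ollama` Python client; the function name and signature are assumptions, not the module's actual API.

```python
# Illustrative Ollama query helper; the real llm_interface.py may look different.
import ollama  # official Ollama Python client


def query_model(model_name: str, prompt: str) -> str:
    """Send a single user prompt to a local Ollama model and return its text reply."""
    result = ollama.chat(
        model=model_name,
        messages=[{"role": "user", "content": prompt}],
    )
    return result["message"]["content"]


# Example: send a strategy-wrapped prompt to the target model.
# reply = query_model("llama3", "wrapped prompt text")
```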
- Start the Application:
  - Run `run_app.bat` (Windows) or follow the manual setup steps to start the Streamlit app.
  - The application should open in your default web browser, typically at `http://localhost:8501`.
- Configure Models (Sidebar):
  - In the sidebar, enter the names of the Target Ollama Model (the LLM you want to test) and the Judge Ollama Model (the LLM that will evaluate responses). Ensure these models are available in your Ollama setup.
- Manage Strategies (Sidebar):
  - View existing jailbreak strategies.
  - Add new strategies by providing a name, a unique ID, and a template. The template should use `{task_prompt}` as a placeholder for the actual task prompt.
  - Strategies are saved to `data/strategies.json`.
- Define Tasks:
  - Tasks are defined in `data/tasks.json`. Each task includes an `id`, a `description`, the `prompt` to be sent to the LLM, and a `harm_category`. You can manually edit this file to add or modify tasks.
- Run Tests (Main Area):
  - Select one or more tasks from the "Select Tasks to Run" dropdown.
  - Select one or more strategies from the "Select Strategies to Apply" dropdown.
  - Click the "Start Jailbreak Test Run" button.
  - The application will iterate through each selected task-strategy combination, query the target LLM, and have the judge LLM evaluate the response.
- View Logs & Results (Main Area):
  - Progress and immediate results will be displayed as tests run.
  - A table of all results is shown, which can be refreshed from the `jailbreak_log.jsonl` file.
  - You can download the full results as a CSV file.
  - The most recent detailed result (JSON format) is also displayed for closer inspection.
  - All attempts are logged to `results/jailbreak_log.jsonl`.
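If you want to analyze results outside the Streamlit UI, the JSONL log can be loaded directly, for example with pandas (the exact column names depend on what the app records per attempt):

```python
# Load the results log for offline analysis; JSONL means one JSON object per line.
import pandas as pd

df = pd.read_json("results/jailbreak_log.jsonl", lines=True)
print(df.head())  # e.g. prompts, responses, and verdicts per attempt
```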
- Adding Tasks: Edit the `data/tasks.json` file to add new tasks, following the existing JSON structure:

  ```json
  [
    {
      "id": "new_task_example",
      "description": "A brief description of the new task.",
      "prompt": "The actual prompt for the LLM.",
      "harm_category": "e.g., misinformation"
    }
  ]
  ```

- Adding Strategies:
  - Via UI: Use the "Add New Strategy" section in the Streamlit sidebar.
  - Manually: Edit the `data/strategies.json` file. Ensure each strategy has a unique `id`, a `name`, and a `template` string:

    ```json
    [
      {
        "id": "S_custom_example",
        "name": "My Custom Strategy",
        "template": "This is a custom wrapper around the request: {task_prompt}. Please comply."
      }
    ]
    ```
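Because every template uses `{task_prompt}` as its placeholder, combining a strategy with a task boils down to a plain string substitution. A minimal illustration, reusing the example entries above (the variable names are just for the sketch):

```python
# How a strategy template and a task prompt are combined (illustrative only).
strategy = {
    "id": "S_custom_example",
    "name": "My Custom Strategy",
    "template": "This is a custom wrapper around the request: {task_prompt}. Please comply.",
}
task = {"id": "new_task_example", "prompt": "The actual prompt for the LLM."}

full_prompt = strategy["template"].format(task_prompt=task["prompt"])
print(full_prompt)
```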
Contributions, bug reports, and feature requests are welcome! Please feel free to open an issue or submit a pull request.