This repository is a spinoff of the original OLAF project. While OLAF was focused on creating a user-friendly web app for LLM-powered bioinformatics, CARIBOU aims to be a tool to run on accelerated hardware. It is designed for high-performance, reproducible execution of LLM-driven bioinformatics pipelines on GPU and HPC environments. Instead of prioritizing front-end accessibility, CARIBOU emphasizes scalability, optimization, and direct integration with computational biology workflows, making it better suited for large-scale analyses, benchmarking, and research prototyping where raw performance and flexibility are critical.
The CARIBOU CLI is a powerful command-line interface for building, testing, and running sandboxed, multi-agent AI systems.
It provides a robust framework for orchestrating multiple language agents that can collaborate to perform complex tasks, such as data analysis, in a secure and isolated environment.
At its core, CARIBOU allows you to define a team of specialized AI agents in a simple JSON "blueprint." You can then deploy this team into a secure sandbox (powered by Docker or Singularity) with a specific dataset and give them a high-level task to solve.
- Multi-Agent Blueprints: Define agents, their specialized prompts, and how they delegate tasks to each other using a simple JSON configuration.
- Secure Sandboxing: Execute agent-generated code in an isolated environment using Docker or Singularity to protect your host system.
- Interactive & Automated Modes: Run agent systems in a turn-by-turn interactive chat for debugging or in a fully automated mode for benchmarking.
- Data Curation: Includes tools to browse and download single-cell datasets from the CZI CELLxGENE Census to easily test your agents.
- Configuration Management: Easily manage API keys and application settings with built-in commands.
- User-Friendly CLI: A guided, interactive experience helps you configure every run, with flags available to override settings for use in scripts.
Before installing CARIBOU, you need to have the following installed and configured on your system:
- Python (version 3.9 or higher)
- Pip (Python's package installer)
- A Sandbox Backend:
- Docker: Must be installed and the Docker daemon must be running.
- Singularity (Apptainer): Must be installed on your system.
Coming soon!
To install the latest development version, you can clone the repository and install it in editable mode:
git clone https://github.com/OpenTechBio/caribou
cd caribou/cli/caribou
pip install -e .
caribouThis guide will walk you through setting up your API key, downloading a dataset, and launching your first interactive agent session in just a few steps.
First, tell CARIBOU about your OpenAI or DeepSeek API key. This is a one-time setup.
caribou config set-openai-key "sk-YourSecretKeyGoesHere"or
caribou config set-deepseek-key "sk-YourSecretKeyGoesHere"Your key will be stored securely in a local .env file within the CARIBOU configuration directory.
Next, let's get some data for our agents to analyze. Run the datasets command to browse and download a sample dataset from the CZI CELLxGENE Census.
# This will start the interactive dataset browser
caribou datasetsFollow the prompts to list versions and datasets, then use the download command as instructed.
Now you're ready to run an agent system. The run command is fully interactive if you don't provide any flags. It will guide you through selecting a blueprint, a dataset, and a sandbox environment.
caribou run interactiveThis will trigger a series of prompts:
- Select Agent System Blueprint: Choose one of the default systems (from the Package) or one you've created (from User).
- Select a driver agent: Choose which agent in the system will receive the first instruction.
- Select Dataset: Pick the dataset you downloaded in Step 2.
- Choose a sandbox backend: Select
dockerorsingularity. - Choose an LLM backend: Select
chatgptorollama.
After configuration, the session will begin, and you can start giving instructions to your agent team!
CARIBOU's commands are organized into logical groups.
The main command for executing an agent system.
- Run interactively (recommended for manual use):
caribou run interactive
- Run automatically for 5 turns:
caribou run auto --turns 5 --prompt "Analyze this dataset and generate a UMAP plot." - Run with all options specified (for scripting):
caribou run interactive \ --blueprint ~/.local/share/caribou/agent_systems/my_custom_system.json \ --driver-agent data_analyst \ --dataset ~/.local/share/caribou/datasets/my_data.h5ad \ --sandbox docker \ --llm chatgpt
Tools for building new agent system blueprints.
- Start the interactive builder:
caribou create-system
- Create a minimal blueprint quickly:
caribou create-system quick --name my-first-system
Tools for managing datasets.
- Start the interactive dataset browser:
caribou datasets
- Download a specific dataset directly:
caribou datasets download --version stable --dataset-id "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
Manage your CARIBOU configuration.
- Set your OpenAI API key:
caribou config set-openai-key "sk-..." - Set your DeepSeek API key:
caribou config set-deepseek-key "sk-..."
CARIBOU stores all user-generated content and configuration in a central directory. You can override this location by setting the CARIBOU_HOME environment variable.
- Default Location:
- Linux:
~/.local/share/caribou/ - macOS:
~/Library/Application Support/caribou/ - Windows:
C:\Users\<user>\AppData\Local\OpenTechBio\caribou\
- Linux:
- Configuration File: API keys are stored in
$CARIBOU_HOME/.env. - Agent Systems: Custom blueprints are saved to
$CARIBOU_HOME/agent_systems/. - Datasets: Downloaded datasets are stored in
$CARIBOU_HOME/datasets/. - Run Outputs: Code snippets and logs from agent runs are saved to
$CARIBOU_HOME/runs/.