Developer Guide
This guide covers setting up a development environment, understanding the architecture, and contributing to Agent Brain.
- Architecture Overview
- Monorepo Structure
- Quick Start
- Task Commands
- Development Workflow
- Testing
- Code Style
- Contributing
- Troubleshooting
- Adding Support for New Languages
Agent Brain is a RAG (Retrieval-Augmented Generation) system for semantic search across documentation and source code.
flowchart TB
subgraph Clients["Client Layer"]
CLI["agent-brain<br/>(Click CLI)"]
Skill["Claude Skill<br/>(REST Client)"]
API_Client["External Apps<br/>(HTTP/REST)"]
end
subgraph Server["agent-brain-server"]
subgraph API["REST API Layer"]
FastAPI["FastAPI<br/>/health, /query, /index"]
end
subgraph Services["Service Layer"]
IndexService["Indexing Service"]
QueryService["Query Service"]
end
subgraph Indexing["Content Processing"]
Loader["Document & Code Loader<br/>(LlamaIndex + Tree-sitter)"]
Chunker["AST-Aware Chunking<br/>(Stable Hash ID)"]
Embedder["Embedding Generator<br/>(+ LLM Summaries)"]
end
subgraph AI["AI Models"]
OpenAI["OpenAI Embeddings<br/>(text-embedding-3-large)"]
Claude["Claude Haiku<br/>(Summarization)"]
end
subgraph Storage["Vector Storage"]
ChromaDB["ChromaDB<br/>(Vector Store)"]
end
end
subgraph Documents["Content Sources"]
MD["Markdown Files"]
TXT["Text Files"]
PDF["PDF Files"]
Code["Source Code<br/>10+ Languages"]
end
CLI -->|HTTP| FastAPI
Skill -->|HTTP| FastAPI
API_Client -->|HTTP| FastAPI
FastAPI --> IndexService
FastAPI --> QueryService
IndexService --> Loader
Loader --> Documents
Loader --> Chunker
Chunker --> Embedder
Embedder --> OpenAI
Embedder --> ChromaDB
QueryService --> Embedder
QueryService --> ChromaDB
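Every client in the diagram reaches the server over plain HTTP, so a liveness probe needs nothing beyond the standard library. The sketch below is illustrative only: the helper names `check_health` and `is_healthy` are hypothetical, the default base URL `http://127.0.0.1:8000` is an assumption, and the only response field it relies on is `status`.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def check_health(base_url: str = "http://127.0.0.1:8000", timeout: float = 2.0) -> Optional[dict]:
    """Return the parsed /health payload, or None if the server is unreachable."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a body that is not valid JSON.
        return None


def is_healthy(payload: Optional[dict]) -> bool:
    """True only when the payload is a JSON object reporting status == "healthy"."""
    return isinstance(payload, dict) and payload.get("status") == "healthy"
```

A caller would typically combine the two, for example `is_healthy(check_health())`, and treat `False` as "server not running or not ready".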
| Package | Directory | Description |
|---|---|---|
| agent-brain-server | agent-brain-server/ | FastAPI REST API backend |
| agent-brain-cli | agent-brain-cli/ | Click-based CLI management tool |
| agent-brain-skill | agent-brain-skill/ | Claude Code skill definition |
| e2e | e2e/ | End-to-end integration tests |
- Python 3.10+
- Poetry: pip install poetry
- Task: brew install go-task/tap/go-task
- OpenAI & Anthropic API keys
git clone git@github.com:SpillwaveSolutions/agent-brain.git
cd agent-brain
task install
task install:global
The task install:global command installs agent-brain-serve and agent-brain into your current Python environment's bin folder, so you can run them from any directory.
The root Taskfile.yml orchestrates the entire monorepo.
| Command | Description |
|---|---|
| task install | Install all dependencies |
| task install:global | Install tools as global CLI commands |
| task dev | Start server in development mode |
| task pr-qa-gate | MANDATORY before push: Run all quality checks |
| task test | Run all tests |
| task status | Wrapper for agent-brain status |
Before pushing any changes, you MUST run:
task pr-qa-gate
This ensures:
- Linting (Ruff) passes.
- Type checking (mypy) passes.
- Unit and Integration tests pass.
- Test coverage is above 50%.
- agent-brain-server/tests/: Server-specific tests.
- agent-brain-cli/tests/: CLI-specific tests.
- e2e/: Full workflow integration tests.
Before releasing any version or merging major features, you MUST run the end-to-end validation script:
./scripts/quick_start_guide.sh
This script validates the complete Agent Brain workflow by:
- Starting a real server instance
- Indexing the project codebase with --include-code
- Running semantic, BM25, and hybrid search queries
- Testing summarization features
- Verifying proper error handling and cleanup
Requirements:
- OPENAI_API_KEY environment variable set
- Poetry and lsof installed
- Server and CLI dependencies installed
Exit Codes:
- 0: All tests passed
- Non-zero: Test failures or setup issues
The script serves as both a release validation tool and a comprehensive demonstration of Agent Brain's capabilities.
This usually means you are running the tool without installing it or the PYTHONPATH is not set.
Solution: Run task install:global or always use poetry run.
Solution: Free port 8000 by killing the process that is listening on it: lsof -ti :8000 | xargs kill -9
Solution: The system uses stable IDs based on file path and chunk index. If you see duplicates, run agent-brain reset --yes to clear the old index and re-index.
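To illustrate why stable IDs prevent duplicates, here is a minimal sketch of an ID derived from file path and chunk index. The function name `stable_chunk_id`, the choice of SHA-256, the `:` separator, and the 16-character truncation are all assumptions made for the example, not the project's actual scheme.

```python
import hashlib


def stable_chunk_id(file_path: str, chunk_index: int) -> str:
    """Derive a deterministic ID from the file path and chunk index."""
    key = f"{file_path}:{chunk_index}".encode("utf-8")
    # Hashing the same (path, index) pair always yields the same ID.
    return hashlib.sha256(key).hexdigest()[:16]
```

Because the ID depends only on the path and the position of the chunk, re-indexing an unchanged file produces the same IDs, so a store that upserts by ID overwrites entries instead of adding new ones. Duplicates can still appear if files were indexed under different paths, which is one case where clearing the index and re-indexing helps.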
Agent Brain supports running multiple concurrent instances with per-project isolation. This enables developers to work on multiple projects simultaneously without port conflicts or index cross-contamination.
Each project stores its state in .claude/doc-serve/:
<project-root>/
└── .claude/
    └── doc-serve/
        ├── config.json      # Project configuration (optional, can be committed)
        ├── runtime.json     # Runtime state (DO NOT commit - add to .gitignore)
        ├── doc-serve.lock   # Lock file for preventing double-start
        ├── doc-serve.pid    # Process ID file
        ├── data/            # ChromaDB and index data
        └── logs/            # Server logs
The runtime.json file contains:
{
  "mode": "project",
  "port": 49321,
  "base_url": "http://127.0.0.1:49321",
  "pid": 12345,
  "instance_id": "abc123def456",
  "project_id": "my-project",
  "started_at": "2026-01-27T10:30:00Z"
}
The lock file prevents concurrent startup:
- Server attempts exclusive lock on doc-serve.lock
- If lock fails, another instance is starting/running
- Lock released on graceful shutdown
- Stale locks detected via PID validation
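The locking steps above can be sketched with an advisory `flock` on Unix-like systems. This is one possible approach, not necessarily the server's implementation: `try_acquire_lock` and `pid_is_alive` are hypothetical helper names, and the use of `fcntl.flock` is an assumption.

```python
import fcntl
import os
from typing import Optional, TextIO


def try_acquire_lock(lock_path: str) -> Optional[TextIO]:
    """Return an open, locked file handle, or None if the lock is held elsewhere."""
    handle = open(lock_path, "a+")
    try:
        # LOCK_NB makes flock fail immediately instead of blocking.
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    # Keep the handle open for the server's lifetime; closing it releases the lock.
    return handle


def pid_is_alive(pid: int) -> bool:
    """Check whether a process with this PID exists (signal 0 delivers nothing)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # The process exists but belongs to another user.
    return True
```

A startup routine built on these helpers would treat a failed lock as "another instance is starting or running", and would treat a PID file whose process no longer exists as stale state that is safe to clean up.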
Project root is determined in this order:
- Git repository root: git rev-parse --show-toplevel
- Marker files: Directory containing .claude/, pyproject.toml, package.json, Cargo.toml, etc.
- Current directory: Fallback if no markers found
Symlinks are resolved to canonical paths to ensure consistent state directories.
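The resolution order described above might look like the following sketch. The function names, the `use_git` switch, and the exact contents of `MARKERS` are assumptions for illustration (the marker list in this guide ends with "etc.", so the real list may be longer).

```python
import subprocess
from pathlib import Path
from typing import Optional

# Hypothetical marker list; only the names mentioned in this guide are included.
MARKERS = (".claude", "pyproject.toml", "package.json", "Cargo.toml")


def git_root(start: Path) -> Optional[Path]:
    """Ask git for the repository root; None if git is missing or start is not in a repo."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--show-toplevel"],
            cwd=start, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return Path(result.stdout.strip())


def resolve_project_root(start: Path, use_git: bool = True) -> Path:
    """Apply the three-step order: git root, marker files, then the start directory."""
    start = start.resolve()  # Canonical path with symlinks resolved.
    if use_git:
        root = git_root(start)
        if root is not None:
            return root.resolve()
    for candidate in (start, *start.parents):
        if any((candidate / marker).exists() for marker in MARKERS):
            return candidate
    return start  # Fallback: no repository and no markers found.
```

Resolving the path first means two shells that reach the same project through different symlinks end up with the same state directory.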
Settings are resolved in order (first wins):
- Command-line flags (--port 8080)
- Environment variables (DOC_SERVE_STATE_DIR, DOC_SERVE_MODE)
- Project config (.claude/doc-serve/config.json)
- Global config (~/.doc-serve/config.json)
- Built-in defaults
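A "first wins" precedence chain maps naturally onto `collections.ChainMap`, which searches its layers from left to right. The sketch below is a minimal illustration: `resolve_settings` is a hypothetical helper, and the keys and default values in the usage example are made up for the demonstration.

```python
from collections import ChainMap
from typing import Any, Mapping


def _present(layer: Mapping[str, Any]) -> dict:
    """Drop unset (None) values so they do not shadow lower-priority layers."""
    return {key: value for key, value in layer.items() if value is not None}


def resolve_settings(
    cli: Mapping[str, Any],
    env: Mapping[str, Any],
    project: Mapping[str, Any],
    global_: Mapping[str, Any],
    defaults: Mapping[str, Any],
) -> ChainMap:
    # ChainMap looks keys up left to right, so the first layer that has a key wins.
    return ChainMap(*(_present(layer) for layer in (cli, env, project, global_, defaults)))


# Illustrative values only; the real defaults are not specified in this guide.
settings = resolve_settings(
    cli={"port": 8080, "mode": None},   # --port given, mode flag not given
    env={"mode": "project"},            # e.g. from DOC_SERVE_MODE
    project={"port": 49321},
    global_={},
    defaults={"port": 8000, "mode": "shared"},
)
```

Here `settings["port"]` resolves to the command-line value and `settings["mode"]` to the environment value, because an unset flag is filtered out before the layers are chained.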
The /health endpoint now includes mode information:
{
  "status": "healthy",
  "mode": "project",
  "instance_id": "abc123def456",
  "project_id": "my-project"
}
Agent Brain supports AST-aware code chunking for 9+ programming languages using tree-sitter. The current implementation includes: Python, TypeScript, JavaScript, Java, Go, Rust, C, C++, C#.
Adding support for new programming languages is straightforward:
Use tree-sitter-language-pack - a maintained fork with 160+ pre-built language grammars.
Advantages:
- Pre-compiled binaries (no C compiler needed)
- 160+ languages in a single dependency
- Permissive licensing (no GPL dependencies)
- Aligned with tree-sitter 0.25.x
Installation:
pip install tree-sitter-language-pack
from tree_sitter_language_pack import get_language, get_parser
# Get parser for any supported language
parser = get_parser('rust')
language = get_language('rust')
# Parse code
tree = parser.parse(b"fn main() { println!(\"Hello\"); }")
Step 1: Verify language support
from tree_sitter_language_pack import get_language
try:
    lang = get_language('ruby')
    print("Ruby is supported!")
except Exception:
    print("Ruby not available")
Step 2: Update extension mapping
In agent_brain_server/indexing/document_loader.py:
# Add to CODE_EXTENSIONS
CODE_EXTENSIONS: set[str] = {
    ".py", ".ts", ".tsx", ".js", ".jsx",
    ".rb",  # NEW: Ruby
}
# Add to EXTENSION_TO_LANGUAGE
EXTENSION_TO_LANGUAGE = {
    # ... existing mappings ...
    ".rb": "ruby",
}
Step 3: Register with CodeChunker
In agent_brain_server/indexing/code_chunker.py:
class CodeChunker:
    SUPPORTED_LANGUAGES = [
        "python", "typescript", "javascript",
        "ruby",  # NEW
    ]
Step 4: Add language-specific config (optional)
LANGUAGE_CHUNK_CONFIG = {
    "python": {"chunk_lines": 50, "overlap": 20},
    "ruby": {"chunk_lines": 50, "overlap": 20},  # NEW
    "java": {"chunk_lines": 80, "overlap": 30},  # Verbose
    "c": {"chunk_lines": 40, "overlap": 15},
}
C# is fully supported with AST-aware parsing:
File Extensions:
- .cs - C# source files
- .csx - C# script files
Extracted Symbols:
- Classes, interfaces, structs, records, enums
- Methods, properties, fields
- Parameters and return types
- Namespaces
XML Documentation:
Agent Brain extracts XML doc comments (/// <summary>, /// <param>, /// <returns>) and stores them as metadata on chunks.
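As a rough illustration of what extracting these comments involves, the sketch below pulls the three tags out of a C# snippet with regular expressions. It is an assumption-laden stand-in, not the project's tree-sitter-based extraction: `extract_xml_doc` is a hypothetical name, and the function treats every `///` line in the input as belonging to one member.

```python
import re

# Matches a `///` doc line and captures the text after the slashes.
DOC_LINE = re.compile(r"^\s*///\s?(.*)$")


def extract_xml_doc(source: str) -> dict:
    """Collect `///` lines and pull out summary, param, and returns text."""
    doc_lines = []
    for line in source.splitlines():
        found = DOC_LINE.match(line)
        if found:
            doc_lines.append(found.group(1))
    doc = "\n".join(doc_lines)

    summary = re.search(r"<summary>(.*?)</summary>", doc, re.DOTALL)
    returns = re.search(r"<returns>(.*?)</returns>", doc, re.DOTALL)
    params = re.findall(r'<param name="([^"]+)">(.*?)</param>', doc, re.DOTALL)
    return {
        # " ".join(text.split()) collapses newlines and repeated whitespace.
        "summary": " ".join(summary.group(1).split()) if summary else None,
        "params": {name: " ".join(text.split()) for name, text in params},
        "returns": " ".join(returns.group(1).split()) if returns else None,
    }
```

The resulting dictionary is the kind of structure that could be attached to a chunk as metadata, so that a search hit on a method also carries its documented purpose and parameters.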
Tree-sitter Grammar:
Uses the c_sharp grammar from tree-sitter-language-pack.
Content Detection Patterns:
- using System;
- namespace declarations
- Property accessors { get; set; }
- Attributes [AttributeName]
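Pattern-based detection of this kind can be approximated with a handful of regular expressions, one per pattern listed above. The sketch is a hypothetical heuristic: the exact expressions, the `looks_like_csharp` name, and the two-hit threshold are assumptions rather than the project's detection logic.

```python
import re

CSHARP_PATTERNS = [
    re.compile(r"^\s*using\s+[\w.]+\s*;", re.MULTILINE),    # using System;
    re.compile(r"^\s*namespace\s+[\w.]+", re.MULTILINE),     # namespace declarations
    re.compile(r"\{\s*get;\s*(set;)?\s*\}"),                 # property accessors
    re.compile(r"^\s*\[\w+(\(.*\))?\]\s*$", re.MULTILINE),   # [AttributeName]
]


def looks_like_csharp(text: str, min_hits: int = 2) -> bool:
    """Return True when at least min_hits distinct patterns occur in the text."""
    hits = sum(1 for pattern in CSHARP_PATTERNS if pattern.search(text))
    return hits >= min_hits
```

Requiring more than one distinct pattern reduces false positives, since a single bracketed line or a `using` statement can also appear in other languages.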
| Category | Languages |
|---|---|
| Systems | C, C++, Rust, Go, Zig |
| JVM | Java, Kotlin, Scala, Groovy |
| .NET | C#, F# |
| Scripting | Python, Ruby, Perl, Lua, PHP |
| Web | JavaScript, TypeScript, HTML, CSS |
| Functional | Haskell, OCaml, Elixir, Erlang, Clojure |
| Data | SQL, JSON, YAML, TOML, XML |
| Config | Dockerfile, Terraform (HCL), Nix |
| Shell | Bash, Fish, PowerShell |
| Scientific | R, Julia, Fortran |
| Mobile | Swift, Objective-C |
For minimal dependencies, use individual tree-sitter packages:
pip install tree-sitter-python tree-sitter-javascript
import tree_sitter_python as tspython
from tree_sitter import Language, Parser
PY_LANGUAGE = Language(tspython.language())
parser = Parser(PY_LANGUAGE)
The original tree-sitter-languages package (40+ languages):
pip install tree-sitter-languages
from tree_sitter_languages import get_language, get_parser
language = get_language('python')
parser = get_parser('python')