A multi-agent, citation-aware research system that plans, delegates, reflects, and synthesizes high-quality reports in real time.
This project combines a FastAPI + LangGraph backend with a modern Next.js interface to deliver an end-to-end research workflow: from user query to structured, source-backed final report.
- Multi-agent orchestration: parallel subagents investigate different angles of the same problem.
- Reflection loop: the system audits its own coverage, detects gaps, and launches additional follow-up tasks when needed.
- Source-first workflow: Firecrawl web search with automatic DuckDuckGo fallback, source quality scoring, deduplication, and citation pass.
- Real-time observability: live progress phases, subagent lanes, LLM call telemetry, and source tracking in the UI.
- Multi-provider model routing: Gemini, OpenAI, Anthropic, and HuggingFace support with per-role overrides.
- Resilient search layer: Firecrawl is primary; if unavailable or failing, DuckDuckGo takes over transparently — zero config needed.
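The fallback behavior can be sketched as a thin wrapper. The function names below are illustrative stand-ins, not the project's actual API; the real search layer wires in Firecrawl and DuckDuckGo clients.

```python
# Hedged sketch of the resilient search layer: try Firecrawl first,
# fall back to DuckDuckGo on any failure or empty result set.
def search_web(query, firecrawl_search, ddg_search):
    try:
        results = firecrawl_search(query)
        if results:  # an empty result set also triggers the fallback
            return results
    except Exception:
        pass  # unavailable, rate-limited, or missing API key
    return ddg_search(query)
```

Because the fallback is inside the wrapper, callers never need to know which engine served the results — that is what makes it zero-config.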
The result is a research assistant that is significantly more rigorous than a single-shot chat response and much better suited for deep, multi-step analysis.
Research dashboard
Research Sub-Agents
Reflection and Gap Analysis
Final synthesized report
```mermaid
flowchart TD
    U[User asks a research question] --> P[1. Plan generation]
    P --> T[2. Task generation: split into subtasks]
    T --> D[3. Parallel delegation: one subagent per pending task]
    subgraph "Subagent work (in parallel)"
        D --> Q[Generate search queries]
        Q --> W[Search web: Firecrawl primary, DuckDuckGo fallback]
        W --> E[Evaluate source quality and refine weak queries]
        E --> X[Extract evidence and write sub-report]
    end
    X --> M[4. Merge reports and deduplicate sources]
    M --> G[5. Gap analysis: reflection checks missing coverage]
    G --> N{New gaps found and iteration limit not reached?}
    N -- Yes --> R[Create new gap subtasks and assign agents again]
    R --> D
    N -- No --> S[6. Synthesize final report]
    S --> C[7. Citation evaluation and inline references]
    C --> O[8. Stream final cited report to user]
    subgraph "Live visibility"
        V1[Backend emits phase and agent events]
        V2[SSE stream delivers updates]
        V3[UI shows progress, agents, and sources in real time]
        V1 --> V2 --> V3
    end
    D -. status updates .-> V1
    G -. reflection updates .-> V1
    C -. citation updates .-> V1
```
- User sends a query to `POST /api/research/stream`.
- Backend starts SSE and compiles the LangGraph state machine.
- `plan` writes a research plan.
- `split` creates independent subtasks.
- `scale` sets the subagent/tool/source budget.
- `subagents` runs one agent per pending subtask in parallel.
- Each subagent searches (Firecrawl → DuckDuckGo fallback), scores sources, refines queries if weak, extracts evidence, and writes a report.
- `reflection` audits coverage and either adds gap subtasks or marks research complete.
- If gaps exist, only new/pending subtasks run in the next wave.
- `synthesize` merges reports, `cite` adds citations, and the SSE stream returns the final result plus completion stats.
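On the client side, the stream arrives as standard Server-Sent Events. A minimal parser sketch is below; the event names (e.g. `subagent-step`) come from this README, while the exact wire format (`event:`/`data:` lines with JSON payloads) is an assumption here.

```python
import json

# Parse a sequence of SSE text lines into a list of {"event", "data"} dicts.
# A blank line marks the end of one event, per the SSE framing convention.
def parse_sse(lines):
    events, current = [], {}
    for line in lines:
        if line.startswith("event:"):
            current["event"] = line.split(":", 1)[1].strip()
        elif line.startswith("data:"):
            current["data"] = json.loads(line.split(":", 1)[1].strip())
        elif line == "" and current:
            events.append(current)
            current = {}
    return events
```

A real client would feed this from the chunked HTTP response body of `POST /api/research/stream`.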
- Iterative, not one-shot: reflection can add new subtasks to close missing coverage.
- Deep parallelism: parallel subagents plus parallel search/extract inside each subagent.
- Evidence quality controls: source scoring, domain diversity cap, and query refinement loop.
- Transparent runtime: phase/subagent events are streamed live to the UI over SSE.
- Flexible model routing: different roles can use different providers/models.
- Robust integrations: search/extract responses are normalized before downstream use.
- Resilient search: auto-falls back from Firecrawl to DuckDuckGo when unavailable or failing, so research never stops.
- Smart truncation detection: continues generation only when a report is genuinely cut off (dangling connective), avoiding false-positive retries.
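The truncation heuristic described above can be sketched as a small predicate. This is an illustrative approximation, not the project's actual implementation: it treats closing punctuation as "finished" and a trailing connective or dangling comma as "genuinely cut off".

```python
# Hedged sketch of truncation detection: a report that ends mid-thought
# (dangling connective or trailing comma/colon) is flagged for continuation;
# one that ends with closing punctuation is not.
CONNECTIVES = {"and", "but", "or", "because", "however", "such", "as",
               "the", "a", "an", "to", "of", "with"}

def looks_truncated(text: str) -> bool:
    tail = text.rstrip()
    if not tail or tail.endswith((".", "!", "?", ")", '"')):
        return False
    if tail.endswith((",", ":", ";")):
        return True
    return tail.lower().split()[-1] in CONNECTIVES
```

Gating continuation on this check is what avoids false-positive retries on reports that simply ended tersely.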
The backend executes an iterative LangGraph state machine with adaptive reflection that gives the agent the ability to:
- break broad questions into independent subtasks,
- run parallel evidence gathering,
- detect missing coverage,
- and only finalize after reflection criteria are met.
Each subagent:
- generates multiple search queries,
- runs web search (Firecrawl primary, DuckDuckGo fallback) and extraction,
- evaluates source quality,
- refines queries when quality is weak,
- writes a focused partial report with evidence.
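The search-score-refine loop above can be condensed into one illustrative function. All callables are injected stand-ins (the real subagent uses the Firecrawl/DuckDuckGo layer and an LLM-based refiner), and the 0.7 threshold mirrors the `QUALITY_THRESHOLD` default shown later in the setup section.

```python
# Hedged sketch of a subagent's evidence-gathering step: search, score each
# result, refine the query once if average quality is weak, then return
# results sorted by score (best first).
def gather(query, search, score, refine, threshold=0.7):
    scored = [(r, score(r)) for r in search(query)]
    if scored and sum(s for _, s in scored) / len(scored) < threshold:
        scored = [(r, score(r)) for r in search(refine(query))]
    return [r for r, s in sorted(scored, key=lambda pair: -pair[1])]
```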
Instead of stopping after one pass, the reflection node reviews progress and can create new subtasks for unresolved areas. This improves completeness and reduces blind spots.
After synthesis, a dedicated citation pass aligns claims to known sources and produces a citation-enriched final report.
The frontend is a full-featured research dashboard, not a simple chat box. It provides deep visibility into every stage of the research pipeline while keeping all critical information accessible at all times.
During active research, the UI uses a full-width three-column layout:
| Left sidebar (sticky) | Center | Right sidebar (sticky) |
|---|---|---|
| Sources Gathered — quality-scored list of all discovered sources, sorted by relevance, with domain tags and clickable links | Phase Timeline — live-updating step-by-step view of the current research phase with expandable detail panels | Research Progress — phase completion dots and current stage indicator |
| Stays visible while scrolling | Subagent parallel board, LLM call telemetry, reflection decisions | Agent Status — initial agents and gap-fill agents shown separately with live status per agent |
| Domain diversity breakdown | Scaling info, subtask list, plan preview | Live Activity Feed — real-time search and extraction events |
Both sidebars are sticky — they remain visible as the user scrolls through the center content, ensuring sources and agent progress are always one glance away.
- Animated progress bar — weighted by phase (`subagents` = 55%, `synthesize` = 12%, etc.) with elapsed timer and percentage.
- Parallel subagent board — grid of agent cards, each showing live queries, web searches with hit counts, quality-scored source bars, extraction status, evidence count, and report length.
- Gap-fill agent separation — agents launched during the initial wave and agents spawned by the reflection/gap-analysis loop are displayed in separate groups with distinct styling (violet for initial, amber for gap-fill).
- Source quality bars — every discovered source shows a color-coded quality bar (green ≥ 70%, amber ≥ 40%, red < 40%) so users can judge evidence strength at a glance.
- Rolling AI topic suggestions — the home screen generates and cycles through AI-suggested research topics with smooth roll-in/roll-out card animations.
- Provider and model status — header shows the active LLM provider, model name, and online/offline health indicator polling every 15 seconds.
- Post-completion layout — after research finishes, a completion banner shows total sources, reports, iterations, and time. The research process collapses into an expandable accordion, with sources and agents still accessible in the sidebars.
- Glassmorphic dark theme — backdrop blur, animated mesh gradient background, and subtle glow effects throughout.
- Mobile responsive — on smaller screens, sidebars collapse below the main content. Sources and agent panels are still accessible as stacked sections.
- Markdown rendering — final reports render with full Markdown support (headings, lists, code blocks, links) via React Markdown + GFM.
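The source-quality color bands above map directly to two thresholds, which a minimal sketch makes explicit:

```python
# Color bands from the UI description: green >= 70%, amber >= 40%, red below.
def quality_band(score: float) -> str:
    if score >= 0.70:
        return "green"
    if score >= 0.40:
        return "amber"
    return "red"
```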
- Assignment unit: one subagent is spawned per pending subtask (`to_run = subtasks - completed_subtasks`).
- Iteration model: after each subagent wave, reflection can append new subtasks; the next wave runs only new/pending items.
- Completion tracking: successful subagent returns mark `completed_subtasks`, increment `iteration_count`, and merge/dedupe sources.
- Progress % is weighted by phase in code: `init`=2, `plan`=8, `split`=5, `scale`=5, `subagents`=55, `reflection`=5, `synthesize`=12, `cite`=8.
- UI observability comes from SSE events emitted by the backend (`subagents-launch`, `subagent-step`, `subagent-search`, `subagent-sources-scored`, `subagent-extract`, `subagent-complete`, `reflection`, `complete`).
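The scheduling invariant and phase weighting above can be sketched as follows. The weights are taken from the list (they sum to 100); the function names are illustrative.

```python
# Wave scheduler invariant: each wave runs only subtasks not yet completed.
def pending(subtasks, completed):
    return [t for t in subtasks if t not in completed]

# Phase-weighted progress percentage, using the weights listed above.
PHASE_WEIGHTS = {"init": 2, "plan": 8, "split": 5, "scale": 5,
                 "subagents": 55, "reflection": 5, "synthesize": 12, "cite": 8}

def progress(done_phases):
    return sum(PHASE_WEIGHTS[p] for p in done_phases)
```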
- This project uses `langgraph` (LangChain ecosystem) as the orchestration engine via `StateGraph`.
- The graph is built in `backend/graph.py`, compiled at request time, and invoked with a typed `ResearchState`.
- Role-specific LLM calls are implemented through custom provider adapters (Gemini/OpenAI/Anthropic/HuggingFace), not LangChain agent executors.
- `langsmith.traceable` is used for node/function-level observability.
- Wave-level parallelism: each pending subtask is executed concurrently via `asyncio.to_thread(...)` in `run_subagents_parallel()`.
- Per-subagent search parallelism: each subagent runs query searches in a `ThreadPoolExecutor`.
- Per-subagent extract parallelism: each subagent runs URL extraction in another `ThreadPoolExecutor`.
- Net effect: nested parallelism (subtask concurrency plus I/O concurrency inside each subtask).
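The nested-parallelism pattern can be demonstrated in a self-contained sketch. The structure (outer `asyncio.to_thread` fan-out, inner `ThreadPoolExecutor`) follows the description above; the worker bodies are toy stand-ins for real search/extract calls.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def run_subagent(task: str) -> list[str]:
    # Inner I/O parallelism: fan this subtask's searches out across threads.
    def search(query: str) -> str:
        return f"{task}:{query}"  # stands in for a real search + extract call
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(search, ["q1", "q2"]))

async def run_wave(tasks: list[str]) -> list[list[str]]:
    # Outer parallelism: each pending subtask runs in its own worker thread.
    return await asyncio.gather(*(asyncio.to_thread(run_subagent, t) for t in tasks))

results = asyncio.run(run_wave(["a", "b"]))
```

`asyncio.gather` preserves input order, so each subtask's results land in a predictable slot even though execution is concurrent.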
- Reflection runs after every subagent wave.
- If `iteration_count >= max_iterations`, the backend forces `research_complete = true` and emits `max-iterations-reached`.
- Otherwise, the reflection LLM audits existing subagent reports and returns JSON with `subtasks`.
- If the returned `subtasks` list is non-empty: the items are appended to `state["subtasks"]`, `research_complete = false`, and the next loop runs only the newly pending subtasks.
- If the returned `subtasks` list is empty: `research_complete = true` and the pipeline proceeds to synthesis.
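The reflection routing rules above can be sketched as one decision function. The `state` keys follow the names used in this README; `reflection_output` stands in for the reflection LLM's parsed JSON, and the function itself is illustrative rather than the project's actual code.

```python
# Hedged sketch of the post-reflection routing decision.
def route_after_reflection(state, reflection_output, max_iterations=3):
    if state["iteration_count"] >= max_iterations:
        state["research_complete"] = True   # backend emits max-iterations-reached
        return "synthesize"
    new_subtasks = reflection_output.get("subtasks", [])
    if new_subtasks:
        state["subtasks"] += new_subtasks   # only these run in the next wave
        state["research_complete"] = False
        return "subagents"
    state["research_complete"] = True
    return "synthesize"
```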
```mermaid
flowchart LR
    FE["Frontend\nNext.js + React"] -->|SSE| BE["Backend\nFastAPI + LangGraph"]
    BE --> SEARCH["Search Layer\nFirecrawl -> DuckDuckGo fallback"]
    BE --> MODELS["LLM Providers\nGemini | OpenAI | Anthropic | HuggingFace"]
    BE --> DATA["Data/Obs (optional)\nSupabase | LangSmith"]
```
```
deep_research_agent/
├── backend/          # FastAPI server, graph, agents, prompts
├── frontend/         # Next.js UI
├── assets/           # Screenshots and media (add your own files)
├── requirements.txt
├── run.py            # Backend runner
└── .env.example
```
```
git clone <your-repo-url>
cd deep_research_agent
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cd frontend
npm install
cd ..
```

Copy `.env.example` to `.env` and set at least:
```
LLM_PROVIDER=gemini
LLM_MODEL=gemini-2.5-pro
GEMINI_API_KEY=...
FIRECRAWL_API_KEY=...
```

Optional but recommended:
```
SUPABASE_URL=...
SUPABASE_SERVICE_KEY=...
LANGSMITH_API_KEY=...
MAX_ITERATIONS=3
QUALITY_THRESHOLD=0.7
```

Start the backend:

```
source .venv/bin/activate
python run.py
```

Backend default: http://localhost:8000
Start the frontend:

```
cd frontend
npm run dev
```

Frontend default: http://localhost:3000
Global defaults:
- `LLM_PROVIDER`
- `LLM_MODEL`
- `MAX_ITERATIONS`
- `QUALITY_THRESHOLD`
Per-role model routing (examples):
- `PLANNER_PROVIDER`, `PLANNER_MODEL`
- `SUBAGENT_PROVIDER`, `SUBAGENT_MODEL`
- `COORDINATOR_PROVIDER`, `COORDINATOR_MODEL`
- `CITATION_PROVIDER`, `CITATION_MODEL`
This enables specialized model selection for planning, extraction reasoning, synthesis, and citation.
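Resolution works as a simple override chain: a role-specific variable wins, otherwise the global default applies. A minimal sketch (the lookup logic is illustrative; the actual code lives in the backend's config/provider layer):

```python
import os

# Role-based model resolution sketch: <ROLE>_PROVIDER / <ROLE>_MODEL
# override the global LLM_PROVIDER / LLM_MODEL defaults.
def resolve_model(role: str) -> tuple[str, str]:
    provider = os.environ.get(f"{role}_PROVIDER") or os.environ.get("LLM_PROVIDER", "gemini")
    model = os.environ.get(f"{role}_MODEL") or os.environ.get("LLM_MODEL", "")
    return provider, model
```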
| Service | Required | Environment variables | Purpose |
|---|---|---|---|
| LLM Provider (Gemini/OpenAI/Anthropic/HF) | Yes (at least one) | `LLM_PROVIDER`, `LLM_MODEL`, plus provider key (`GEMINI_API_KEY`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, or `HF_TOKEN`) | All reasoning/planning/report generation |
| Firecrawl | Recommended | `FIRECRAWL_API_KEY` | Primary web search + extraction (auto-falls back to DuckDuckGo if unavailable or key is missing) |
| DuckDuckGo | Built-in | None (no key needed) | Free search fallback — activates automatically when Firecrawl is unavailable |
| Supabase | Optional | `SUPABASE_URL`, `SUPABASE_SERVICE_KEY` | Persist checkpoints/artifacts |
| LangSmith | Optional | `LANGSMITH_API_KEY`, `LANGCHAIN_TRACING_V2`, `LANGCHAIN_PROJECT` | Trace/observability |
Notes:
- If Firecrawl is unavailable or the key is missing, search transparently falls back to DuckDuckGo (free, no key). Extract calls return empty and subagents rely on search snippets as evidence.
- Supabase is best-effort persistence; pipeline still runs without it.
- Quick usage (no code changes): set `LLM_MODEL=<provider-supported-model-id>` in `.env`.
- Role-specific usage: set `<ROLE>_MODEL` (e.g., `SUBAGENT_MODEL=...`, `COORDINATOR_MODEL=...`).
- Show the model in the frontend selector: add it to `AVAILABLE_MODELS` in `backend/config.py`.
- Optional default update: add it to `DEFAULT_MODELS` in `backend/config.py`.
- Create a provider adapter in `backend/providers/<new_provider>_provider.py` implementing `LLMProvider.chat(...)`.
- Register it in `backend/providers/__init__.py` inside `get_provider()` and `list_providers()`.
- Add the provider's models to `AVAILABLE_MODELS` and defaults to `DEFAULT_MODELS` in `backend/config.py`.
- Add the required API key env var to `.env.example`.
- (Optional) Set role-level overrides to route specific stages to the new provider.
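A new adapter only needs to satisfy the `LLMProvider.chat(...)` interface named above. The exact signature is an assumption here, and the toy subclass stands in for a real API client:

```python
# Hypothetical adapter skeleton for the LLMProvider.chat(...) interface.
class LLMProvider:
    def chat(self, messages: list[dict], model: str, **kwargs) -> str:
        raise NotImplementedError

class EchoProvider(LLMProvider):
    """Toy adapter: echoes the last user message instead of calling a real API."""
    def chat(self, messages: list[dict], model: str, **kwargs) -> str:
        return messages[-1]["content"]
```

A real adapter would translate `messages` into the provider's request schema, call its SDK, and return the response text.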
Key endpoints:
- `POST /api/research/stream` — starts a streaming research run (SSE)
- `POST /api/research` — starts a run and returns `run_id`
- `GET /api/research/{run_id}/stream` — stream by run id
- `GET /api/config` — provider/model configuration
- `GET /api/health` — backend health status
- `POST /api/topics/suggestions` — AI-generated topic cards for UI suggestions
Research capabilities:
- Turn broad, complex prompts into structured research plans.
- Investigate multiple perspectives in parallel via independent subagents.
- Gather and evaluate web evidence with quality-aware filtering and domain diversity controls.
- Detect missing information through reflection and perform follow-up research autonomously.
- Produce long-form, reasoned reports with a dedicated citation stage.
Real-time observability:
- Stream transparent progress so users can inspect the full reasoning workflow as it happens.
- Show every source gathered (with quality score), every search executed, and every agent's status in real time.
- Separate initial research agents from gap-fill agents spawned by reflection, so the user always knows why each agent was created.
- Keep sources and agent progress visible at all times via sticky sidebars — nothing disappears on scroll.
- Display provider health, model selection, and elapsed research time in the header.
- This system depends on external model and search APIs; output quality depends on provider health, credentials, and source availability.
- For best results, use clear research prompts with scope, time range, and target domain.
- Anthropic Engineering: Multi-agent research system (conceptual inspiration)




