Autonomous overnight agent loop for Claude Code with research toolkit. Episodic execution, paper monitoring (arXiv + Semantic Scholar), knowledge synthesis, and structured handoffs.

Nightcrawler

Autonomous overnight agent loop for Claude Code. Decomposes multi-hour missions into bounded 30-60 minute episodes with structured handoffs, supervised by macOS launchd for crash recovery.

Nightcrawler runs while you sleep. You write a mission file, start the orchestrator, and wake up to a completion report.

You (11pm)                    Nightcrawler (11pm-7am)                    You (7am)
    |                               |                                      |
    +-- Write MISSION.md            |                                      |
    +-- launchctl load ...          |                                      |
    +-- Sleep                       +-- Episode 1 (breadth scan)           |
                                    +-- HANDOFF.md written                 |
                                    +-- Episode 2 (deep-dive A)            |
                                    +-- HANDOFF.md written                 |
                                    +-- Episode 3 (deep-dive B)            |
                                    +-- ...                                |
                                    +-- Episode N (synthesis)              |
                                    +-- COMPLETION_REPORT.md               |
                                                                           +-- Read report
                                                                           +-- Review artifacts

Why Not Just Loop Claude?

The naive approach is a shell while-loop that restarts claude when it exits. Ralph Loop (ghuntley.com/ralph) popularized this pattern. It works, but has real problems at scale.

| | Shell loop | Ralph Loop | Nightcrawler |
| --- | --- | --- | --- |
| Context management | None -- same session until OOM | In-session CLAUDE.md reread | Clean context per episode with structured HANDOFF.md |
| Crash recovery | Dies with terminal | Dies with terminal | launchd restarts on crash, survives sleep/logout |
| State persistence | None | In-memory only | STATE.json + checkpoints + PROGRESS.jsonl |
| Cost control | None | None | Per-episode AND total mission budget caps |
| Task integrity | Agent rewrites freely | Agent rewrites freely | Immutable tasks.json -- agents can only flip passes: false to true |
| Termination | Manual Ctrl-C | Manual or token limit | 8 automatic conditions (budget, time, errors, diminishing returns, ...) |
| Progress verification | Trust agent output | Trust agent output | Git diff cross-check against handoff claims |
| Multi-hour missions | Context window degrades | Context window degrades | Each episode gets a fresh context window |
| Process supervision | None | None | launchd plist with throttle, timeout, nice level |
| Notifications | None | None | Push notifications on start, error, completion |

Architecture

+-------------------------------------------------------------------+
|                        macOS launchd                               |
|  (crash restart, 12h timeout, throttle, nice 5, background)       |
+-------------------------------------------------------------------+
                              |
                              v
+-------------------------------------------------------------------+
|                    nightcrawler.ts (~550 LOC)                      |
|                                                                    |
|  +-------------------+    +-------------------+    +----------+   |
|  | State Manager     |    | Termination Logic |    | Notifier |   |
|  | STATE.json        |    | 8 conditions      |    | Moshi    |   |
|  | tasks.json        |    | checked per cycle |    |          |   |
|  | checkpoints/      |    |                   |    |          |   |
|  +-------------------+    +-------------------+    +----------+   |
|                                                                    |
|  +-------------------+    +-------------------+                   |
|  | Prompt Builder    |    | Episode Runner    |                   |
|  | mission + state + |    | claude -p         |                   |
|  | handoff + git +   |--->| --dangerously-    |                   |
|  | skill + tasks     |    |   skip-permissions|                   |
|  +-------------------+    | --max-budget-usd  |                   |
|                           | --model opus      |                   |
|                           +-------------------+                   |
+-------------------------------------------------------------------+
                              |
                              v
+-------------------------------------------------------------------+
|                         Claude Code                                |
|  (autonomous, no human, CLAUDECODE env deleted)                   |
|                                                                    |
|  Session opening ritual -> Work -> Handoff -> State update         |
+-------------------------------------------------------------------+

Layer Stack

Layer 4:  MISSION.md           Human intent (written once)
Layer 3:  nightcrawler.ts      Orchestration loop, termination, budget
Layer 2:  claude -p             Claude Code in headless/pipe mode
Layer 1:  launchd               Process supervision, crash recovery
Layer 0:  macOS                 Sleep/wake handling, resource limits

Data Flow

MISSION.md ──> tasks.json ──> STATE.json ──> HANDOFF.md ──> COMPLETION_REPORT.md
  (human)      (auto-gen)     (progress)    (per-episode)     (final output)
               (immutable)    (checkpoint)   (structured)

Directory Structure

~/.nightcrawler/
  nightcrawler.ts              # Orchestrator (TypeScript, ~550 LOC)
  config.json                  # Runtime configuration
  com.user.nightcrawler.plist  # launchd service definition
  package.json                 # Node dependencies (tsx only)

  missions/
    active/
      MISSION.md               # Current mission (human writes this)
    TEMPLATE-research.md       # Research mission template
    TEMPLATE-implementation.md # Implementation mission template

  templates/
    MISSION-research.md        # Quick-start research template
    MISSION-implementation.md  # Quick-start implementation template

  skills/
    nightcrawler-episode.md    # Skill instructions loaded per episode

  state/
    STATE.json                 # Current mission state (progress, budget, errors)
    HANDOFF.md                 # Context transfer between episodes
    tasks.json                 # Immutable task tracker (auto-generated)
    PROGRESS.jsonl             # Append-only event log
    LOCK                       # PID lockfile (prevents double-run)
    STOP                       # Touch this file to stop after current episode
    COMPLETION_REPORT.md       # Final report (generated on termination)
    checkpoints/
      episode-001.json         # State snapshot after each episode
      episode-002.json
      ...

  logs/
    orchestrator.log           # Orchestrator event log
    launchd-stdout.log         # launchd stdout capture
    launchd-stderr.log         # launchd stderr capture
    episodes/
      episode-001.log          # Full output from each Claude session
      episode-002.log
      ...

Quick Start

1. Install

cd ~/.nightcrawler
npm install

The only dependency is tsx for running TypeScript directly.

2. Write a Mission

# Copy a template
cp ~/.nightcrawler/templates/MISSION-research.md ~/.nightcrawler/missions/active/MISSION.md

# Edit it
$EDITOR ~/.nightcrawler/missions/active/MISSION.md

3. Run

Foreground (good for testing):

cd ~/.nightcrawler && npx tsx nightcrawler.ts

Dry run (generates prompts without calling Claude):

cd ~/.nightcrawler && npx tsx nightcrawler.ts --dry-run

launchd (survives terminal close, machine sleep, crashes):

cp ~/.nightcrawler/com.user.nightcrawler.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/com.user.nightcrawler.plist

4. Monitor

# Watch orchestrator log
tail -f ~/.nightcrawler/logs/orchestrator.log

# Check current state
cat ~/.nightcrawler/state/STATE.json | python3 -m json.tool

# Read latest handoff
cat ~/.nightcrawler/state/HANDOFF.md

# Read episode output
cat ~/.nightcrawler/logs/episodes/episode-001.log

5. Stop

# Graceful: finish current episode, then stop
touch ~/.nightcrawler/state/STOP

# Immediate: unload the service
launchctl unload ~/Library/LaunchAgents/com.user.nightcrawler.plist

6. Reset for Next Mission

rm -rf ~/.nightcrawler/state/*
rm -rf ~/.nightcrawler/logs/episodes/*
# Write new MISSION.md, then start again

Configuration

~/.nightcrawler/config.json:

{
  "max_duration_hours": 12,
  "max_episodes": 24,
  "max_budget_usd": 50.00,
  "budget_per_episode_usd": 5.00,
  "episode_timeout_seconds": 3600,
  "model": "claude-opus-4-6",
  "bootstrap_command": null,
  "error_threshold": 10,
  "diminishing_returns_lookback": 3,
  "cooldown_between_episodes_seconds": 10,
  "moshi_token": "",
  "notifications": {
    "on_start": true,
    "on_episode_complete": false,
    "on_error": true,
    "on_completion": true
  }
}

| Field | Default | Description |
| --- | --- | --- |
| max_duration_hours | 12 | Hard wall-clock limit for the entire mission |
| max_episodes | 24 | Maximum number of episodes before forced termination |
| max_budget_usd | 50.00 | Total API spend cap across all episodes |
| budget_per_episode_usd | 5.00 | Per-episode spend cap (passed to claude --max-budget-usd) |
| episode_timeout_seconds | 3600 | Kill an episode after this many seconds (1 hour default) |
| model | claude-opus-4-6 | Model to use for episodes |
| bootstrap_command | null | Command to run before each episode (e.g., npm test) |
| error_threshold | 10 | Stop after this many total errors |
| diminishing_returns_lookback | 3 | Check the last N episodes for a progress stall |
| cooldown_between_episodes_seconds | 10 | Pause between episodes |
| moshi_token | "" | Moshi push notification token (optional) |

Termination Conditions

The orchestrator checks 8 conditions before each episode. If any fires, the mission ends and a COMPLETION_REPORT.md is written.

| Condition | Trigger |
| --- | --- |
| Human stop flag | ~/.nightcrawler/state/STOP file exists |
| Agent termination | Agent sets termination_check.should_continue = false in STATE.json |
| Episode limit | current_episode >= max_episodes |
| Duration limit | Wall-clock time exceeds max_duration_hours |
| Budget limit | budget_spent_usd >= max_budget_usd |
| Error threshold | Total errors reach error_threshold |
| Fatal error | Any fatal error recorded |
| Diminishing returns | Last N episodes averaged < 0.5 tasks completed per episode |
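
The checks above can be sketched as a single pure function over the mission state. This is an illustrative sketch, not the shipped nightcrawler.ts code; the State and Config shapes are assumptions modeled on the STATE.json and config.json fields described in this README.

```typescript
import { existsSync } from "node:fs";

// Hypothetical shapes, mirroring STATE.json / config.json fields.
interface State {
  current_episode: number;
  budget_spent_usd: number;
  started_at: number;                  // epoch ms
  error_count: number;
  fatal_error: boolean;
  should_continue: boolean;            // agent-settable termination flag
  tasks_completed_by_episode: number[];
}

interface Config {
  max_episodes: number;
  max_duration_hours: number;
  max_budget_usd: number;
  error_threshold: number;
  diminishing_returns_lookback: number;
}

// Returns the name of the condition that fired, or null to keep going.
function checkTermination(state: State, cfg: Config, stopFile: string): string | null {
  if (existsSync(stopFile)) return "human stop flag";
  if (!state.should_continue) return "agent termination";
  if (state.current_episode >= cfg.max_episodes) return "episode limit";
  const hours = (Date.now() - state.started_at) / 3_600_000;
  if (hours > cfg.max_duration_hours) return "duration limit";
  if (state.budget_spent_usd >= cfg.max_budget_usd) return "budget limit";
  if (state.error_count >= cfg.error_threshold) return "error threshold";
  if (state.fatal_error) return "fatal error";
  const recent = state.tasks_completed_by_episode.slice(-cfg.diminishing_returns_lookback);
  if (recent.length === cfg.diminishing_returns_lookback &&
      recent.reduce((a, b) => a + b, 0) / recent.length < 0.5) {
    return "diminishing returns";
  }
  return null;
}
```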

Immutable Task Tracking

When a mission starts, Nightcrawler auto-generates state/tasks.json from the - [ ] checkboxes in MISSION.md:

[
  { "id": 1, "description": "Survey the landscape", "passes": false },
  { "id": 2, "description": "Deep-dive: topic A", "passes": false },
  { "id": 3, "description": "Synthesize findings", "passes": false }
]

Agents can ONLY change passes from false to true. They cannot delete tasks, reorder them, rewrite descriptions, or add new ones. This prevents a known failure mode where autonomous agents rewrite their own success criteria to declare premature victory.

This pattern comes from Anthropic's recommendations for autonomous agent harnesses.
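
A minimal sketch of how the false-to-true-only rule can be enforced when the orchestrator re-reads tasks.json after an episode. This is illustrative, not the shipped validation code; the Task shape matches the JSON example above.

```typescript
interface Task { id: number; description: string; passes: boolean; }

// Compare tasks.json before and after an episode; return a list of
// violations (empty array means the agent's update is legal).
function validateTaskUpdate(before: Task[], after: Task[]): string[] {
  const violations: string[] = [];
  if (after.length !== before.length) {
    violations.push("task count changed");
    return violations;
  }
  before.forEach((orig, i) => {
    const next = after[i];
    if (next.id !== orig.id || next.description !== orig.description)
      violations.push(`task ${orig.id}: id/description rewritten`);
    if (orig.passes && !next.passes)
      violations.push(`task ${orig.id}: passes flipped back to false`);
  });
  return violations;
}
```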

Episode Lifecycle

Each episode follows a strict protocol:

1. Orchestrator checks 8 termination conditions
2. Orchestrator runs bootstrap_command (if configured)
3. Orchestrator builds prompt:
   - Skill instructions (nightcrawler-episode.md)
   - Episode metadata (number, budget, progress)
   - Full MISSION.md
   - Current STATE.json
   - Previous HANDOFF.md (if not Episode 1)
   - Git context (recent commits + diff)
   - tasks.json (immutable tracker)
   - Session opening ritual instructions
   - End-of-episode requirements
4. Orchestrator spawns: claude -p --dangerously-skip-permissions --model <model> --max-budget-usd <cap>
5. Agent executes session opening ritual:
   a. Reads STATE.json
   b. Reads HANDOFF.md
   c. Runs git log to verify handoff claims
   d. Runs baseline checks (tests/lint for implementation missions)
   e. Picks highest-priority incomplete task
6. Agent does work
7. Agent writes HANDOFF.md (structured template)
8. Agent updates STATE.json and tasks.json
9. Orchestrator re-reads state, records episode in history
10. Orchestrator saves checkpoint
11. Orchestrator waits cooldown_between_episodes_seconds
12. Loop back to step 1

Git Diff Verification

A common failure mode in autonomous agents is hallucinated progress: the agent claims it completed work that doesn't exist. Nightcrawler mitigates this by injecting the actual git diff and recent commit log into each episode's prompt.

The next episode can cross-check what the previous handoff claims against what the git history shows. If they diverge, the agent knows to distrust the handoff and verify from source.

Session Opening Ritual

Before doing any work, each episode executes a mandatory checklist (Anthropic's "session opening ritual" pattern):

  1. Read STATE.json -- understand position in the mission
  2. Read MISSION.md -- understand the goal
  3. Read HANDOFF.md -- understand previous work
  4. Read tasks.json -- understand what's done and what's left
  5. Run git log --oneline -5 -- verify what actually changed
  6. Run baseline checks (tests/lint for implementation missions)
  7. Pick the highest-priority incomplete task

This prevents agents from starting work based on stale assumptions or hallucinated context.

Mission Templates

Research Mission

Research missions follow a breadth-depth-synthesis pattern:

# Mission: Autonomous Agent Coordination Protocols

**Type:** research
**Created:** 2026-02-22
**Max Duration:** 12 hours
**Max Episodes:** 24

## Objective

Survey the landscape of multi-agent coordination protocols. Identify
all major approaches (A2A, MCP, custom), map their tradeoffs, and
produce a synthesis document with confidence levels per claim.

## Existing Research

- `~/research/multi-agent-survey.md` -- prior survey from January

## Depth Targets

- [ ] Survey the landscape: identify all major protocols and frameworks
- [ ] Deep-dive: Google A2A protocol
- [ ] Deep-dive: Anthropic MCP coordination patterns
- [ ] Deep-dive: Academic approaches (arXiv, ACL)
- [ ] Cross-reference: identify contradictions between sources
- [ ] Synthesize: write final analysis with confidence levels per claim
- [ ] Bibliography: all sources cited with URLs

## Source Requirements

- Minimum 10 unique sources
- At least 3 academic papers (arXiv, ACL, etc.)
- At least 2 industry implementations or case studies
- Flag any claim with only 1 source as [UNVERIFIED]

## Output Artifacts

- `research/agent-coordination-analysis.md`
- `research/agent-coordination-sources.md`
- `research/agent-coordination-gaps.md`

## Constraints

- Do NOT hallucinate citations -- every URL must be real
- Do NOT pad with filler -- quality over word count
- Use WebSearch for discovery, WebFetch for reading sources

## Success Criteria

- [ ] All depth targets complete
- [ ] 10+ unique real sources cited
- [ ] Contradictions identified and analyzed
- [ ] Final synthesis written with confidence levels
- [ ] Gaps document captures remaining unknowns

Implementation Mission

# Mission: Add Rate Limiting to API Gateway

**Type:** implementation
**Created:** 2026-02-22
**Max Duration:** 8 hours
**Max Episodes:** 16

## Objective

Add token-bucket rate limiting to the API gateway. Per-user limits
with configurable burst. Must not break existing tests.

## Tasks

- [ ] Add rate limiter module with token-bucket algorithm
  - Files: src/middleware/rate-limiter.ts
  - Success criteria: unit tests pass

- [ ] Integrate with request pipeline
  - Files: src/server.ts, src/middleware/index.ts
  - Success criteria: integration tests pass

- [ ] Add configuration and per-user overrides
  - Files: src/config.ts, config/default.json
  - Success criteria: config loads, overrides work

- [ ] Add monitoring and rate-limit response headers
  - Files: src/middleware/rate-limiter.ts
  - Success criteria: X-RateLimit-* headers present

## Constraints

- Do NOT break existing tests
- Do NOT add new dependencies without justification
- Token bucket, not sliding window

## Success Criteria

- [ ] All tasks complete
- [ ] All tests passing (existing + new)
- [ ] No lint errors

launchd Service

The included plist configures launchd for production use:

  • Crash recovery: Restarts on crash, does not restart on clean exit
  • Throttle: At most one restart per 30 seconds
  • Timeout: Hard 12-hour wall-clock limit
  • Priority: Nice level 5 (lower than interactive processes)
  • Process type: Background
  • File descriptors: 4096 soft limit
  • Logging: stdout and stderr captured to logs/
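
A hedged sketch of what such a plist can look like. The key names are standard launchd keys; the paths and npx invocation are placeholders, and the shipped com.user.nightcrawler.plist may differ.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key><string>com.user.nightcrawler</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/local/bin/npx</string>
    <string>tsx</string>
    <string>nightcrawler.ts</string>
  </array>
  <key>WorkingDirectory</key><string>/Users/you/.nightcrawler</string>
  <!-- Restart on crash, but not on clean exit -->
  <key>KeepAlive</key>
  <dict><key>SuccessfulExit</key><false/></dict>
  <!-- At most one restart per 30 seconds -->
  <key>ThrottleInterval</key><integer>30</integer>
  <key>Nice</key><integer>5</integer>
  <key>ProcessType</key><string>Background</string>
  <key>SoftResourceLimits</key>
  <dict><key>NumberOfFiles</key><integer>4096</integer></dict>
  <key>StandardOutPath</key><string>/Users/you/.nightcrawler/logs/launchd-stdout.log</string>
  <key>StandardErrorPath</key><string>/Users/you/.nightcrawler/logs/launchd-stderr.log</string>
</dict>
</plist>
```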

# Install
cp ~/.nightcrawler/com.user.nightcrawler.plist ~/Library/LaunchAgents/

# Load (start)
launchctl load ~/Library/LaunchAgents/com.user.nightcrawler.plist

# Unload (stop)
launchctl unload ~/Library/LaunchAgents/com.user.nightcrawler.plist

# Check status
launchctl list | grep nightcrawler

Process Safety

  • PID lockfile: Only one orchestrator instance runs at a time. Stale locks from crashed processes are detected and cleaned.
  • CLAUDECODE env deleted: Prevents the spawned Claude from thinking it's inside another Claude session, which causes behavioral issues.
  • TERM=dumb: Prevents ANSI escape codes in output logs.
  • Signal handling: Clean lock release on SIGTERM and SIGINT.

Notifications

Nightcrawler sends push notifications via Moshi at key events:

  • on_start: Mission name and task count
  • on_error: Episode number, exit code, error count vs threshold
  • on_completion: Final status, tasks completed, episodes run, budget spent

Set moshi_token in config.json. Leave empty to disable.

Design Decisions

Why episodes instead of one long session? Claude Code sessions degrade as context fills up. A 60-minute episode with a clean context window produces better work than minute 300 of a continuous session. The handoff protocol transfers only the essential context, not the full conversation history.

Why immutable tasks.json? Without constraints, autonomous agents tend to rewrite their own objectives. An agent that can delete tasks will eventually "complete" a mission by removing the hard parts. The immutable tracker forces agents to do the actual work.

Why launchd instead of a shell script? A shell loop dies when the terminal closes, the SSH session drops, or the machine sleeps. launchd is the macOS init system -- it survives all of these and restarts crashed processes automatically.

Why delete the CLAUDECODE environment variable? When Claude Code detects the CLAUDECODE env var, it adjusts its behavior for being inside another Claude instance. This causes problems in headless mode. Deleting it gives the episode a clean behavioral context.

Why git diff verification? Autonomous agents can hallucinate progress -- claiming they wrote files that don't exist or made changes that aren't in the diff. Injecting the actual git state lets the next episode detect and correct this.

Research Toolkit

Nightcrawler ships with a research-specific extension: a paper monitor that watches arXiv and Semantic Scholar overnight, a synthesis engine that merges findings into a running literature review, and a suite of structured mission templates purpose-built for academic research tasks, all glued together by a single CLI wrapper (ncr) that generates missions, launches Nightcrawler, and synthesizes output.

The goal: wake up to a literature review, not a blank terminal.

Architecture

                        WATCHTOWER
                      (Rust, ncr watch)
              ┌─────────────────────────────────┐
              │  arXiv RSS  arXiv Search API     │
              │  Semantic Scholar (225M papers)  │
              │  Relevance scoring + dedup       │
              │  → papers.jsonl                  │
              └──────────────┬──────────────────┘
                             │ relevant paper detected
                             ▼
                        NCR CLI
                     (Rust, 4.2MB binary)
              ┌─────────────────────────────────┐
              │  ncr research "topic"            │
              │  ncr deepdive "arxiv.org/..."    │
              │  → renders MISSION.md template   │
              └──────────────┬──────────────────┘
                             │ MISSION.md written
                             ▼
                      NIGHTCRAWLER
                    (nightcrawler.ts)
              ┌─────────────────────────────────┐
              │  Episode 1: breadth scan         │
              │  Episode 2: deep-dive A          │
              │  Episode 3: deep-dive B          │
              │  Episode N: synthesis            │
              │  (crash recovery, budget caps,   │
              │   immutable task tracking)       │
              └──────────────┬──────────────────┘
                             │ research/*.md written
                             ▼
                       SYNTHESIS
                     (Rust, ncr synthesize)
              ┌─────────────────────────────────┐
              │  Reads research output markdown  │
              │  Extracts findings + confidence  │
              │  Detects contradictions          │
              │  Maintains literature-review.md  │
              └─────────────────────────────────┘

Quick Start (Research Mode)

cd ~/.nightcrawler

# 1. Start watching for new papers (one-shot poll)
ncr watch

# 2. Generate a literature survey mission
ncr research "autonomous AI research agents"

# 3. (Optional) Preview mission before launching
cat missions/active/MISSION.md

# 4. Launch Nightcrawler to execute the mission overnight
ncr launch

# 5. Monitor progress
ncr status

# 6. After mission completes, synthesize findings
ncr synthesize

# 7. View the running literature review
ncr review

To deep-dive a specific paper from an arXiv URL:

ncr deepdive "https://arxiv.org/abs/2510.16572"
ncr launch

To run the paper monitor continuously in the background:

ncr watch --daemon

Research CLI (ncr)

The ncr binary is a native Rust CLI (4.2MB, zero runtime deps). Install with cargo install --path research-rs or copy research-rs/target/release/ncr to your PATH.

| Command | Description |
| --- | --- |
| ncr watch | One-shot poll: arXiv RSS + Search + Semantic Scholar. Prints new papers found. |
| ncr watch --daemon | Continuous polling on the poll_interval_minutes schedule. |
| ncr research "topic" | Generate a MISSION-literature-survey.md mission for a topic. |
| ncr deepdive "url" | Generate a MISSION-paper-deepdive.md mission from an arXiv URL. |
| ncr papers | List all tracked papers in papers.jsonl with relevance scores. |
| ncr synthesize | Run synthesis on all research/*.md output files. |
| ncr review | Print the current literature-review.md. |
| ncr status | Show watchtower config, paper count, mission status, last poll time. |
| ncr launch | Start Nightcrawler with the current active mission. |
| ncr templates | List all available mission templates in templates/. |

The Full Pipeline

You (evening)                  Overnight                          You (morning)
     |                              |                                   |
     +-- ncr watch --daemon         |                                   |
     +-- ncr research "topic"  ---> +-- Episode 1 (breadth scan)       |
     +-- ncr launch                 +-- Episode 2 (deep A)             |
     +-- Sleep                      +-- Episode 3 (deep B)             |
                                    +-- Episode N (synthesis)           |
                                    +-- COMPLETION_REPORT.md            |
                                                                        +-- ncr synthesize
                                                                        +-- ncr review
                                                                        +-- literature-review.md ready

Each stage is independently useful. Watchtower runs as a cron job. The mission templates work without the CLI. Synthesis runs on any directory of markdown files from past missions. Nothing requires the whole stack to be running at once.

Components

Watchtower (research/watchtower.ts)

Polls three APIs for new papers matching your configured topics and keywords:

  • arXiv RSS: Daily feed per category (e.g., cs.AI, cs.MA). Instant for new preprints.
  • arXiv Search API: Keyword-based query across all dates. Catches papers outside your RSS categories.
  • Semantic Scholar API: 225M papers with citation counts, year, abstract. Best for relevance-ranked discovery.

Papers are deduplicated by normalized ID (case-insensitive) and scored by keyword relevance. Title matches are weighted 3x over abstract matches; abstract-only hits are penalized 0.6x. Papers must match at least min_keyword_matches keywords and score above relevance_threshold to be written to research/papers.jsonl. Negative keywords (e.g., "reinforcement learning", "autonomous driving") instantly reject irrelevant papers.
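
The scoring rules described above can be approximated like this. This is an illustrative TypeScript re-implementation; the exact weights and normalization live in paper.rs and may differ.

```typescript
interface Paper { title: string; abstract: string; }

// Score 0..1; returns 0 for negative-keyword hits and papers below
// the minimum keyword-match count.
function relevance(p: Paper, keywords: string[], negative: string[],
                   minMatches = 2): number {
  const title = p.title.toLowerCase();
  const abs = p.abstract.toLowerCase();
  // Negative keywords instantly reject the paper.
  if (negative.some(k => title.includes(k.toLowerCase()) ||
                         abs.includes(k.toLowerCase()))) return 0;
  let matches = 0;
  let score = 0;
  for (const kw of keywords) {
    const k = kw.toLowerCase();
    const inTitle = title.includes(k);
    const inAbs = abs.includes(k);
    if (!inTitle && !inAbs) continue;
    matches += 1;
    score += inTitle ? 3 : 0.6;   // title hits weigh 3x; abstract-only penalized
  }
  if (matches < minMatches) return 0;
  return Math.min(1, score / (3 * keywords.length));  // normalize to 0..1
}
```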

If auto_launch: true, Watchtower generates a mission and starts Nightcrawler automatically when a sufficiently relevant paper arrives.

Optional vault-aware boosting: if vault_path is set, papers whose abstracts mention topics already in your VaultGraph knowledge base score higher.

Synthesis (research/synthesis.ts)

Reads all markdown files in the research output directory after a mission completes and:

  1. Extracts findings tagged with confidence levels (HIGH, MEDIUM, LOW, UNVERIFIED)
  2. Detects contradictions: two findings that make opposing claims about the same subject
  3. Merges new findings into literature-review.md, deduplicating by semantic similarity
  4. Appends a contradiction log if any conflicts were detected

The literature review accumulates across missions. Each run adds to it rather than replacing it, so the file grows into a comprehensive knowledge base over time.

Confidence levels follow the research episode skill convention:

| Level | Meaning |
| --- | --- |
| HIGH | Multiple independent sources agree |
| MEDIUM | Single strong source, or multiple weak sources |
| LOW | Single weak source, or inference from adjacent evidence |
| UNVERIFIED | Claim with no traceable source; flagged for follow-up |
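
Step 1 of synthesis (finding extraction) might look like this, assuming findings appear as markdown bullets of the form "- [HIGH] claim". The tag convention is an assumption for illustration; synthesis.rs defines the real format.

```typescript
type Confidence = "HIGH" | "MEDIUM" | "LOW" | "UNVERIFIED";
interface Finding { confidence: Confidence; claim: string; }

// Pull confidence-tagged bullets out of a research output markdown file.
function extractFindings(markdown: string): Finding[] {
  const re = /^[-*]\s*\[(HIGH|MEDIUM|LOW|UNVERIFIED)\]\s+(.+)$/gm;
  const out: Finding[] = [];
  for (const m of markdown.matchAll(re)) {
    out.push({ confidence: m[1] as Confidence, claim: m[2].trim() });
  }
  return out;
}
```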

Research Episode Skill (skills/research-episode.md)

An enhanced version of the standard episode skill, loaded automatically for research missions. Teaches the agent:

  • How to query Semantic Scholar, OpenAlex, and arXiv APIs directly (with example curl commands)
  • The breadth-then-depth methodology: survey wide first, then pick the 3-5 most relevant threads to follow down
  • Cross-referencing protocol: when two sources conflict, note both, attribute the conflict, don't resolve it arbitrarily
  • Structured output format: findings with confidence tags, sources as numbered references, gaps section
  • Citation hygiene: every URL must resolve; hallucinated citations are flagged as fatal errors in the handoff
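
For reference, a query against the public Semantic Scholar Graph API might look like this. The /graph/v1/paper/search endpoint, its fields parameter, and the x-api-key header come from the Semantic Scholar API docs; the helper names here are illustrative, not from the skill file.

```typescript
// Build the search URL separately so it can be inspected without a network call.
function s2SearchUrl(query: string, limit = 10): string {
  return "https://api.semanticscholar.org/graph/v1/paper/search" +
    `?query=${encodeURIComponent(query)}` +
    "&fields=title,abstract,year,citationCount" +
    `&limit=${limit}`;
}

// Requires Node 18+ for built-in fetch (already a Nightcrawler requirement).
async function searchPapers(query: string, apiKey?: string): Promise<unknown[]> {
  const res = await fetch(s2SearchUrl(query), {
    headers: apiKey ? { "x-api-key": apiKey } : {},
  });
  if (!res.ok) throw new Error(`Semantic Scholar API error: ${res.status}`);
  const body = await res.json() as { total: number; data: unknown[] };
  return body.data;
}
```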

Mission Templates

Five templates live in templates/, each targeting a different research task shape:

| Template | Use case |
| --- | --- |
| MISSION-literature-survey.md | Comprehensive landscape scan of a field or topic. Breadth-first, then synthesize. Good starting point for any new research area. |
| MISSION-paper-deepdive.md | Single paper analysis: methods, results, limitations, related work, open questions. Useful for important new preprints. |
| MISSION-gap-analysis.md | Find what's missing. Maps existing work, then explicitly identifies unsolved problems and underexplored directions. |
| MISSION-systematic-review.md | Structured evidence synthesis following PRISMA-adjacent methodology. Defines inclusion/exclusion criteria, records search protocol, grades evidence quality. |
| MISSION-followup.md | Investigate a specific paper or thread flagged by Watchtower. Pre-populated with the paper's metadata; agent fills in the analysis. |

ncr research "topic" renders MISSION-literature-survey.md with the topic filled in. ncr deepdive "url" renders MISSION-paper-deepdive.md with paper metadata fetched from arXiv.

RESEARCH-IDEAS.md

A catalog of 15 autonomous research engine architectures, included as a reference and inspiration document. Covers approaches ranging from citation graph traversal to adversarial paper debate to continuous hypothesis refinement loops. Not active code -- a design document.

Configuration (research/research-config.json)

{
  "topics": ["cs.AI", "cs.MA", "cs.SE"],
  "keywords": ["autonomous agent", "agent orchestration", "agentic workflow", "multi-agent coordination", "LLM agent"],
  "negative_keywords": ["reinforcement learning", "autonomous driving", "traffic signal", "vehicle"],
  "semantic_scholar_api_key": "",
  "poll_interval_minutes": 60,
  "auto_launch": false,
  "min_citation_count": 0,
  "min_keyword_matches": 2,
  "max_papers_per_poll": 20,
  "relevance_threshold": 0.35,
  "vault_path": "",
  "output_dir": "research"
}

| Field | Default | Description |
| --- | --- | --- |
| topics | ["cs.AI"] | arXiv category codes for RSS feeds. See arxiv.org/category_taxonomy. |
| keywords | [] | Keywords to match in titles and abstracts. Case-insensitive substring match. |
| negative_keywords | [] | Papers matching any of these keywords are instantly rejected, regardless of positive score. |
| semantic_scholar_api_key | "" | Optional API key for higher rate limits (100 RPS vs 1 RPS unauthenticated). Free at semanticscholar.org/product/api. |
| poll_interval_minutes | 60 | How often Watchtower polls when running as a daemon. |
| auto_launch | false | Automatically start Nightcrawler when a paper above threshold is found. |
| min_citation_count | 0 | Filter out papers with fewer citations. Useful for excluding preprints in established fields. |
| min_keyword_matches | 2 | Require at least N keyword matches before considering a paper. Prevents single-keyword noise. |
| max_papers_per_poll | 20 | Maximum papers to ingest per poll cycle, sorted by relevance score descending. |
| relevance_threshold | 0.35 | Papers below this score are stored but not used to trigger missions. Range 0-1. |
| vault_path | "" | Path to a VaultGraph vault for graph-aware relevance boosting. Leave empty to skip. |
| output_dir | "research" | Directory (relative to ~/.nightcrawler) where research output and synthesis files are written. |

Free APIs Used

No paid data sources. No Python frameworks. No vendor lock-in.

| API | Coverage | Rate limit | Cost |
| --- | --- | --- | --- |
| Semantic Scholar | 225M papers, citation graph, recommendations | 100 RPS with free API key; 1 RPS without | Free |
| OpenAlex | 240M works, 50K added daily, full metadata | 100K req/day without key; higher with email param | Free (CC0) |
| arXiv RSS | All categories, daily new papers | No limit | Free |
| arXiv Search API | Full-text search across all arXiv | 3 req/sec suggested | Free |

Semantic Scholar is the primary discovery engine. OpenAlex is used for cross-referencing and citation metadata. arXiv RSS is the fastest signal for brand-new preprints.

Directory Structure (Research)

The Research Toolkit adds these paths to the standard Nightcrawler layout:

~/.nightcrawler/
  research-rs/                 # Rust CLI (ncr binary)
    src/
      main.rs                  # CLI entry, 9 subcommands, mission gen
      watchtower.rs            # arXiv RSS + Search + S2 with retry/backoff
      synthesis.rs             # Finding extraction, contradiction detection
      paper.rs                 # Paper struct, scoring, JSONL I/O
      config.rs                # Config structs, directory helpers
    Cargo.toml

  research/
    watchtower.ts              # Paper monitor (TS, superseded by Rust)
    synthesis.ts               # Knowledge merger (TS, superseded by Rust)
    ncr.ts                     # CLI wrapper (TS, superseded by Rust)
    research-config.json       # Research toolkit configuration
    papers.jsonl               # All tracked papers with scores and metadata
    literature-review.json     # Structured findings (JSON)
    literature-review.md       # Running synthesis (grows across missions)

  skills/
    nightcrawler-episode.md    # Standard episode skill
    research-episode.md        # Enhanced episode skill for research missions

  templates/
    MISSION-research.md        # Quick-start research template
    MISSION-implementation.md  # Quick-start implementation template
    MISSION-literature-survey.md   # Comprehensive landscape scan
    MISSION-paper-deepdive.md      # Single paper analysis
    MISSION-gap-analysis.md        # Find what's missing in a field
    MISSION-systematic-review.md   # Structured evidence synthesis
    MISSION-followup.md            # Investigate watchtower-detected papers

  RESEARCH-IDEAS.md            # Catalog of 15 autonomous research engine architectures

Comparison with Existing Tools

The gap nobody fills: a system that connects paper monitoring to autonomous overnight research to accumulated knowledge synthesis, running on your machine, using free APIs, requiring no framework installation.

| Tool | Stars | What it does | What it doesn't do |
| --- | --- | --- | --- |
| AI-Scientist (Sakana) | 12.2k | Fully autonomous: hypothesis → experiment → paper | Doesn't monitor for new work; requires GPU; no episodic execution |
| SciAgents (MIT) | 587 | Multi-agent knowledge graph construction from papers | Can't run overnight autonomously; no paper monitoring |
| CrewAI | 44.5k | General multi-agent framework with research roles | No paper monitoring; no synthesis accumulation; no crash recovery |
| Elicit | | Literature review assistant, paper monitoring | Not autonomous; human-in-the-loop only |
| Ralph Loop | ~11k | Continuous Claude Code loop | No research tooling; no structured handoffs; no paper monitoring |
| Nightcrawler Research | | Watchtower detects papers → generates missions → executes overnight → synthesis accumulates | Doesn't write LaTeX or run experiments (yet) |

The key architectural difference: Nightcrawler Research is a pipeline, not a monolith. Each piece (Watchtower, mission templates, episodic execution, synthesis) works standalone. Watchtower can run as a cron job without Nightcrawler. Synthesis can run on past research output from any source. The mission templates work with standard Nightcrawler. You compose them in whatever order fits your workflow.

Requirements

  • macOS (for launchd; the orchestrator itself is platform-agnostic)
  • Node.js 18+
  • Claude Code CLI (claude in PATH, typically at ~/.local/bin/claude)
  • Claude Max or API access with sufficient budget

Credits

  • Ralph Loop (ghuntley.com/ralph) -- the original in-session loop pattern. Nightcrawler's episodic approach was designed to solve the context degradation and crash recovery problems that emerge when running Ralph for hours.
  • Anthropic's Claude Code harness guide -- the session opening ritual, immutable JSON task tracking, and --dangerously-skip-permissions patterns come directly from Anthropic's recommendations for autonomous agent harnesses.
  • Architecture synthesis -- the episode/handoff/termination design emerged from a structured debate between Claude Opus and GLM-5 (Fireworks AI), combining ideas from distributed systems (checkpointing, circuit breakers) with agent-specific patterns (context window management, hallucination detection).

License

MIT
