A small, security-focused CLI wrapper around Ollama for one job:
read local file evidence -> produce a constrained answer -> log everything needed to audit the run.
This project is intentionally narrow. It is not a general autonomous agent framework. It is an evidence-gated runner that tries to make hallucination and unsafe file access expensive or impossible by default.
Quick operator runbook: `OPERATOR_QUICKREF.md`

## Contents

- Operator quick reference
- What this project is and is not
- Why this exists
- End-to-end architecture
- Execution flow (`chat` vs `ask`)
- Evidence and admissibility model
- Security model for file reads
- Model routing and token budget behavior
- Output quality controls (format + footer)
- Run logging and auditability
- Setup and quickstart
- CLI usage and recipes
- Phase 2 indexing and query
- Configuration reference
- Error codes and troubleshooting
- Testing and verification
- Extending safely
- Practical limitations
## Operator quick reference

For day-to-day commands and failure triage, use `OPERATOR_QUICKREF.md`.

## What this project is and is not
What it is:

- A deterministic orchestrator around one model call (`chat`) or two model calls (`ask`).
- A strict tool-call protocol plus evidence validation.
- A sandboxed local file reader with typed failure modes.
- A reproducible run logger (`runs/<run_id>/run.json`).
What it is not:
- Not LangChain, not a planner/executor loop, not an unbounded tool agent.
- Not a generic distributed retrieval framework; retrieval is local, deterministic, and evidence-first.
- Not a "trust the model by default" UX.
## Why this exists

Common failure patterns in LLM + tool systems are well-known:
- The model claims it read a file that it never read.
- Tool-call JSON is malformed or mixed with prose and silently ignored.
- File access is too broad (path traversal, hidden files, absolute path reads).
- Partial reads are treated as full coverage.
- Documents contain prompt injection content and the model follows it.
`local-agent` turns these into explicit contracts:
- strict tool-call parsing
- fail-closed evidence gates
- sandboxed file access policy
- typed error codes
- auditable run logs with redaction
## End-to-end architecture

Core modules:

- `agent/__main__.py`
  - CLI parsing (`chat`, `ask`)
  - model selection (fast/big/default)
  - ask state machine
  - evidence validation and fail-closed behavior
  - second-pass output checks and retry logic
  - run logging
- `agent/tools.py`
  - `ToolSpec`, `ToolError`, `TOOLS`
  - `read_text_file` implementation
  - sandbox policy initialization and path validation
- `agent/protocol.py`
  - strict + robust tool-call parsing
  - supports prefix JSON tool-call extraction and trailing-text capture
- `configs/default.yaml` - model defaults, token/time budgets, security policy
- `tests/test_tools_security.py` - sandbox and resolution behavior regression tests
- `SECURITY.md` - manual security verification checklist
## Execution flow (`chat` vs `ask`)

### `chat`

Single model call:
- Send user prompt.
- Print model response.
- Log sanitized raw response and metadata.
No tool use, no evidence gates.
### `ask`

Two-pass control flow with one model-requested tool call:
- Pass 1 (tool-selection prompt):
  - Model must either:
    - answer directly, or
    - emit `{"type":"tool_call","name":"...","args":{...}}`
- Runner parses the tool call:
  - strict parse first
  - prefix JSON parse fallback if the response starts with tool-call JSON and contains trailing text
- If a tool call is emitted:
  - execute tool
  - validate evidence
  - optional auto re-read for full-evidence questions when the first read was truncated
- Pass 2 (answer-only prompt):
  - tools forbidden
  - output quality checks enforced
- If there are formatting violations:
  - one retry with a stricter prompt
  - if still invalid -> typed failure
Important:

- If the question semantics require file evidence and admissible evidence is not acquired, the runner returns a typed failure and does not ask the model to guess.
- The runner may perform one additional `read_text_file` call itself for full-evidence rereads. This is not a model-requested second tool choice; it is runner-side evidence-completion logic.
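The strict-then-prefix parse order can be sketched as follows. This is a minimal illustration, not the `agent/protocol.py` API: `parse_tool_call` and the `_trailing_text` key are hypothetical names.

```python
import json

def parse_tool_call(response: str):
    """Return a tool-call dict, or None if the response is a direct answer."""
    text = response.strip()
    # 1) Strict parse: the entire response must be a single JSON tool-call object.
    try:
        obj = json.loads(text)
        return obj if isinstance(obj, dict) and obj.get("type") == "tool_call" else None
    except json.JSONDecodeError:
        pass
    # 2) Prefix fallback: response starts with tool-call JSON, prose follows.
    if text.startswith("{"):
        try:
            obj, end = json.JSONDecoder().raw_decode(text)
        except json.JSONDecodeError:
            return None
        if isinstance(obj, dict) and obj.get("type") == "tool_call":
            obj["_trailing_text"] = text[end:].strip()  # captured for the log, never executed
            return obj
    return None
```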
## Evidence and admissibility model

`read_text_file` evidence contract:

- `path` (absolute)
- `sha256` (hash of the full text)
- `chars_full` (full length)
- `chars_returned` (returned text length)
- `truncated` (bool)
- `text` (possibly truncated content)
Evidence is rejected when:
- required fields are missing
- field types are wrong
- char counts are inconsistent
- file is empty for summary-style tasks
- tool returned error
If evidence is invalid/missing when required:
- run fails closed
- returns typed JSON failure
- no second-pass "best effort" answer
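A minimal sketch of the admissibility gate, assuming the contract fields above (`validate_evidence` is a hypothetical name; tool errors are assumed to be rejected before this point):

```python
import hashlib

REQUIRED = {
    "path": str, "sha256": str, "chars_full": int,
    "chars_returned": int, "truncated": bool, "text": str,
}

def validate_evidence(ev: dict, summary_task: bool) -> list[str]:
    """Return rejection reasons; an empty list means the evidence is admissible."""
    reasons = []
    for field, ftype in REQUIRED.items():
        if field not in ev:
            reasons.append(f"missing field: {field}")
        elif not isinstance(ev[field], ftype):
            reasons.append(f"wrong type for field: {field}")
    if reasons:
        return reasons  # cannot check consistency without well-typed fields
    if ev["chars_returned"] != len(ev["text"]):
        reasons.append("chars_returned inconsistent with len(text)")
    if ev["chars_returned"] > ev["chars_full"]:
        reasons.append("chars_returned exceeds chars_full")
    if ev["truncated"] != (ev["chars_returned"] < ev["chars_full"]):
        reasons.append("truncated flag inconsistent with char counts")
    if summary_task and ev["chars_full"] == 0:
        reasons.append("file is empty for a summary-style task")
    # sha256 covers the full text, so it can only be re-verified on a full read.
    if not ev["truncated"] and hashlib.sha256(ev["text"].encode()).hexdigest() != ev["sha256"]:
        reasons.append("sha256 does not match text")
    return reasons
```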
## Security model for file reads

Security policy is configured at startup from `configs/default.yaml`.
Controls:
- allowlisted roots (`allowed_roots`)
- allowlisted extensions (`allowed_exts`)
- deny absolute/anchored paths (`deny_absolute_paths`)
- deny hidden path segments (`deny_hidden_paths`)
- optional emergency bypass (`allow_any_path`, default false)
- root validation behavior (`auto_create_allowed_roots`, `roots_must_be_within_security_root`)
Path request styles:

- Bare filename (no slash/backslash)
  - Example: `note.md`
  - Searched across allowlisted roots in order
  - Exactly one match -> allowed
  - Multiple matches -> `AMBIGUOUS_PATH`
  - None -> `FILE_NOT_FOUND` (if search path was valid but file missing)
- Explicit subpath (contains slash/backslash)
  - Example: `allowed/corpus/project/note.md`
  - Treated as `security_root`-relative (same anchor as `workroot` when configured)
  - Must still fall within an allowlisted root
Additional protections:
- lexical containment checks before existence checks
- strict resolve checks for existing paths (symlink escape defense)
- allowlisted roots are validated after `resolve(strict=True)` when containment is enabled
- extension and hidden-path checks before content read
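Putting that order together, a sketch of the per-request pipeline (`check_path` is a hypothetical helper; the real policy in `agent/tools.py` also handles bare-filename search, `allow_any_path`, and root validation):

```python
import os
from pathlib import Path

def check_path(requested: str, security_root: Path,
               allowed_roots: list[Path], allowed_exts: set[str]) -> Path:
    """Validate a security_root-relative request; raise PermissionError on denial."""
    p = Path(requested)
    if p.is_absolute():
        raise PermissionError("PATH_DENIED: absolute/anchored path")
    if any(part.startswith(".") for part in p.parts):
        raise PermissionError("PATH_DENIED: hidden path segment")
    if p.suffix.lower() not in allowed_exts:
        raise PermissionError("PATH_DENIED: extension not allowlisted")
    # Lexical containment before any filesystem access (assumes allowed_roots
    # were resolved to absolute paths at startup).
    lexical = Path(os.path.normpath(security_root / p))
    if not any(lexical.is_relative_to(root) for root in allowed_roots):
        raise PermissionError("PATH_DENIED: outside allowlisted roots")
    # Strict resolve follows symlinks and raises FileNotFoundError for missing
    # files (mapped to FILE_NOT_FOUND by the runner).
    resolved = (security_root / p).resolve(strict=True)
    if not any(resolved.is_relative_to(root) for root in allowed_roots):
        raise PermissionError("PATH_DENIED: symlink escape")
    return resolved
```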
## Model routing and token budget behavior

Model selection supports default and split-model operation (sketched after the budget list below):

- Legacy/default: only `model` configured -> both passes use `model`
- Split mode:
  - pass 1 defaults to `model_fast` when `prefer_fast` is true
  - pass 2 may upgrade to `model_big` when the question matches `big_triggers`
CLI overrides:

- `--fast`: force the fast model for both passes
- `--big`: force the big model for the answer pass
- `--full`: force a full evidence read attempt when a tool is used
Budget controls:

- `max_tokens` and `timeout_s` base values
- `max_tokens_big_second` and `timeout_s_big_second` for the large answer pass
- `max_chars_full_read` cap for runner-side rereads
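A rough sketch of the routing rules above (the config keys are the real ones; `pick_models` itself is illustrative, and trigger matching here is a plain case-insensitive substring check):

```python
def pick_models(cfg: dict, question: str,
                force_fast: bool = False, force_big: bool = False) -> tuple[str, str]:
    """Return (pass1_model, pass2_model)."""
    default = cfg["model"]
    fast = cfg.get("model_fast", default)   # legacy config: falls back to `model`
    big = cfg.get("model_big", default)
    if force_fast:                          # --fast: fast model for both passes
        return fast, fast
    first = fast if cfg.get("prefer_fast") else default
    triggers = cfg.get("big_triggers", [])
    wants_big = force_big or any(t.lower() in question.lower() for t in triggers)
    return first, (big if wants_big else first)
```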
## Output quality controls (format + footer)

Pass 2 includes explicit constraints:
- no tool calls
- no tool-call JSON envelopes
- no markdown tables (bullet/paragraph style preferred)
- no claims beyond provided evidence
- include canonical evidence scope footer
Validation checks:
- table detector heuristic
- tool-call detector on pass 2 output
- exact scope-footer last-line check
Retry behavior:
- one retry for format violations
- fast-path optimization: if only the scope footer is missing, append it locally and skip the retry
- if the retry still violates format -> `SECOND_PASS_FORMAT_VIOLATION`
Scope footer format:

```
Scope: full evidence from read_text_file (5159/5159), sha256=14e424b8f1f06f8c2e2f43867f52f37f6ffb95f8434f743f2a94f367a7d2c999
```
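The footer fast path can be sketched like this (`fix_footer` is a hypothetical helper; the table and tool-call detectors are assumed to be separate checks that still force a retry on their own):

```python
FOOTER_PREFIX = "Scope: "

def fix_footer(answer: str, expected_footer: str) -> tuple[str, bool]:
    """Return (possibly repaired answer, needs_retry)."""
    lines = answer.rstrip().splitlines()
    if lines and lines[-1] == expected_footer:
        return answer, False          # exact footer already present
    if lines and lines[-1].startswith(FOOTER_PREFIX):
        return answer, True           # footer present but wrong -> retry
    # Fast path: only the footer is missing, so append it locally and skip the retry.
    return answer.rstrip() + "\n" + expected_footer, False
```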
## Run logging and auditability

Each invocation writes `runs/<run_id>/run.json`.
Key logged fields:

- run metadata (`mode`, question/prompt, timings)
- model selection (`raw_first_model`, `raw_second_model`)
- raw model responses with `message.thinking` stripped
- tool trace
- evidence status (`required`, `status`, truncation, char counts)
- retry metadata (if used)
- final assistant text
Redaction rule:

- for file-read results, logs keep metadata + `text_preview` only (first 800 chars)
- full file text is not logged by default
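A minimal sketch of that redaction rule (hypothetical helper name; the 800-char cap follows the rule above):

```python
PREVIEW_CHARS = 800

def redact_read_result(ev: dict) -> dict:
    """Keep evidence metadata plus a bounded preview; never log the full text."""
    logged = {k: v for k, v in ev.items() if k != "text"}
    logged["text_preview"] = ev.get("text", "")[:PREVIEW_CHARS]
    return logged
```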
## Setup and quickstart

Requirements:

- Python 3.10+ (3.11 recommended)
- Ollama running locally (default `http://127.0.0.1:11434`)
- repo config available at `configs/default.yaml` (always used; see Config location below)
- default dependency set in `requirements.txt` includes Torch + embedding stack for Phase 3 torch-first operation
Install (editable):

```
python -m venv .venv
.\.venv\Scripts\activate
pip install -e .
```

On Linux/macOS, use:

```
source .venv/bin/activate
pip install -e .
```

Install from requirements (torch-first default environment):

```
pip install -r requirements.txt
```

If you want a lean environment without Torch, install only core dependencies explicitly instead of `requirements.txt`, for example:

```
pip install requests PyYAML
pip install -e .
```

Note: `phase3.embed.provider: torch` will fail unless Torch + embedding dependencies are installed.
Config location (important):

- Runtime always loads config from the repo file `local-agent/configs/default.yaml`.
- Launch directory does not change which config file is selected.
- Root semantics: `config_root` comes from the loaded config path, `package_root` from the installed code location, optional `workroot` comes from `--workroot` / `LOCAL_AGENT_WORKROOT` / config `workroot`, and `security_root` is the path anchor used for tool security and run logs.
Split repo/workroot setup (no workroot config required):

- Keep your single live config in the repo: `local-agent/configs/default.yaml`.
- Point `security.allowed_roots` at your sibling workroot data folders (already set in this repo):
  - `../local-agent-workroot/allowed/corpus/`
  - `../local-agent-workroot/allowed/runs/`
  - `../local-agent-workroot/allowed/scratch/`
- Keep `security.roots_must_be_within_security_root: true` and set `workroot` to the sibling data root (default in this repo: `../local-agent-workroot/`).
Ensure the allowlisted dirs exist (or keep `auto_create_allowed_roots: true`):

```
allowed/
runs/
```
Smoke test:

```
.venv\Scripts\python -m agent chat "ping"
.venv\Scripts\python -m agent ask "Read allowed/corpus/secret.md and summarize it."
local-agent ask "Read allowed/corpus/secret.md and summarize it."
local-agent --workroot ../local-agent-workroot ask "Read allowed/corpus/secret.md and summarize it."
```

## CLI usage and recipes

Basic:
```
python -m agent chat "<prompt>"
python -m agent ask "<question>"
python -m agent doctor
python -m agent doctor --no-ollama
local-agent chat "<prompt>"
local-agent ask "<question>"
local-agent doctor
local-agent --workroot ../local-agent-workroot ask "<question>"
```

`ask` flags: `--big`, `--fast`, `--full`
Common patterns:

- Summarize a file in `allowed/corpus/`:
  `python -m agent ask "Read allowed/corpus/test1a.md and summarize it in 5 bullets."`
- Disambiguate duplicate names:
  `python -m agent ask "Read allowed/corpus/test1a.md and summarize it."`
- Request high-depth synthesis:
  `python -m agent ask --big "Read allowed/corpus/test1a.md and give a thorough synthesis."`

## Phase 2 indexing and query

Phase 2 introduces retrieval-ready markdown indexing with a "two sources, one index" model:
- sources are document categories (for example `corpus` and `scratch`)
- index is one unified SQLite DB containing documents, chunks, provenance, and typedness metadata
Important behavior:

- `ask` is now grounded by retrieval evidence (lexical + vector)
- no vault note YAML is modified
- typed/untyped classification is stored in index metadata, not in note frontmatter
- missing metadata is explicit (see the sketch after this list):
  - `metadata=absent` when frontmatter is missing
  - `metadata=unknown` when frontmatter exists but parse/typedness is indeterminate
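A sketch of how that classification could be derived at index time (hypothetical helper; treating a top-level `type` frontmatter key as the typedness marker is an assumption made for illustration):

```python
import yaml  # PyYAML

def classify_frontmatter(note_text: str) -> str:
    """Classify typedness for index metadata; never rewrites the note itself."""
    if not note_text.startswith("---"):
        return "absent"                    # no frontmatter block at all
    try:
        end = note_text.index("\n---", 3)  # closing fence of the frontmatter block
        fm = yaml.safe_load(note_text[4:end])
    except (ValueError, yaml.YAMLError):
        return "unknown"                   # frontmatter exists, parse indeterminate
    if not isinstance(fm, dict):
        return "unknown"
    return "typed" if "type" in fm else "untyped"
```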
Commands:

```
local-agent index
local-agent index --rebuild
local-agent query "coherence" --limit 5
local-agent embed --json
local-agent memory list --json
local-agent doctor
local-agent doctor --no-ollama
local-agent doctor --require-phase3 --json
```

### Phase 3: embeddings, fusion, and durable memory

Phase 3 adds embeddings, retrieval fusion, and durable memory stores with explicit provenance invariants.
Phase 3 now defaults to `phase3.embed.provider: torch`.
Install optional embedding dependencies:

```
pip install -e ".[torch-embed]"
```

No silent downloads are allowed during `local-agent embed`. You must either:

- set `phase3.embed.torch.local_model_path` to a local model directory, or
- pre-populate the local cache and set `phase3.embed.torch.cache_dir`.

If model files are unavailable locally, embed fails closed with `PHASE3_EMBED_ERROR`.
Embed corpus chunks from the phase2 index:

```
local-agent embed [--model <id>] [--rebuild] [--batch-size N] [--limit N] [--dry-run] [--json]
```

Doctor phase3 readiness (strict mode):

```
local-agent doctor --require-phase3 --json
```

Durable memory commands:
```
local-agent memory add --type preference --source manual --content "..."
local-agent memory list --json
local-agent memory delete <memory_id>
local-agent memory export memory/export.json
```

Citation hygiene options:
- `phase3.ask.citation_validation.require_in_snapshot: true` enforces that cited chunk keys must come from the retrieved evidence snapshot used for that run.
- Recommended for fail-closed behavior: combine with `phase3.ask.citation_validation.strict: true`.
- `phase3.ask.citation_validation.heading_match` controls heading comparison (`exact|prefix|ignore`); the default `prefix` avoids brittle failures when citations reference a parent heading (see the sketch after this list).
- `phase3.ask.citation_validation.normalize_heading: true` normalizes whitespace and trailing punctuation (for example `H1: Freeform Journaling:` and `H1: Freeform Journaling`).
- `phase3.ask.evidence.top_n` controls the snapshot/prompt evidence bandwidth (default `8`).
- If strict snapshot checks are too tight, raise `top_n` modestly (for example `8 -> 12` or `16`); the tradeoff is a larger prompt and a larger evidence logging payload before caps.
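A sketch of the heading comparison modes (`headings_match` is a hypothetical helper; accepting a prefix match in either direction is an assumption):

```python
import re

def headings_match(cited: str, indexed: str, mode: str = "prefix",
                   normalize: bool = True) -> bool:
    if normalize:  # collapse whitespace, drop trailing punctuation
        cited = re.sub(r"\s+", " ", cited).strip().rstrip(":.;,")
        indexed = re.sub(r"\s+", " ", indexed).strip().rstrip(":.;,")
    if mode == "ignore":
        return True
    if mode == "exact":
        return cited == indexed
    # "prefix": tolerate citations that reference a parent heading
    return indexed.startswith(cited) or cited.startswith(indexed)
```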
## Configuration reference

Top-level:

- `model`, `model_fast`, `model_big`
- `prefer_fast`
- `big_triggers`
- `max_tokens`, `max_tokens_big_second`
- `timeout_s`, `timeout_s_big_second`
- `read_full_on_thorough`
- `max_chars_full_read`
- `full_evidence_triggers`
- `temperature`
- `ollama_base_url`
- `phase2` (`index_db_path`, `sources`, `chunking.max_chars`, `chunking.overlap`)
- `phase3`
  - `embeddings_db_path`
  - `embed`
    - `provider` (`torch` default, `ollama` optional)
    - `model_id`
    - `preprocess`, `chunk_preprocess_sig`, `query_preprocess_sig`
    - `batch_size`
    - `torch.local_model_path`
    - `torch.cache_dir`
    - `torch.device`, `torch.dtype`
    - `torch.batch_size`, `torch.max_length`
    - `torch.pooling`, `torch.normalize`
    - `torch.trust_remote_code`, `torch.offline_only`
  - `retrieve` (`lexical_k`, `vector_k`, `vector_fetch_k`, `rel_path_prefix`, `fusion`)
  - `ask.evidence` (`top_n`)
  - `ask.citation_validation` (`enabled`, `strict`, `require_in_snapshot`, `heading_match`, `normalize_heading`)
  - `runs` (`log_evidence_excerpts`, `max_total_evidence_chars`, `max_excerpt_chars`)
  - `memory` (`durable_db_path`, `enabled`)
Security (`security:`):

- `allowed_roots`
- `allowed_exts`
- `deny_absolute_paths`
- `deny_hidden_paths`
- `allow_any_path`
- `auto_create_allowed_roots`
- `roots_must_be_within_security_root`
Current defaults in this repo are intentionally conservative:
- only `.md`, `.txt`, `.json` reads
- roots limited to the configured `../local-agent-workroot/allowed/` and `../local-agent-workroot/runs/`
- absolute/hidden path denial enabled
## Error codes and troubleshooting

Typed failure format:

```
{"ok": false, "error_code": "...", "error_message": "..."}
```

Frequent codes and first checks:
- `CONFIG_ERROR`
  - verify `security.allowed_roots` resolve to valid directories
- `PATH_DENIED`
  - check extension allowlist, hidden segments, traversal/absolute path use
- `FILE_NOT_FOUND`
  - file not found under allowlisted roots
- `AMBIGUOUS_PATH`
  - duplicate bare filename; use an explicit subpath
- `EVIDENCE_NOT_ACQUIRED`
  - model did not produce an admissible tool call when evidence was required
- `FILE_EMPTY`
  - source file empty for a summarize request
- `EVIDENCE_TRUNCATED`
  - full evidence required but the read remained partial
- `UNEXPECTED_TOOL_CALL_SECOND_PASS`
  - model violated the answer-only phase
- `SECOND_PASS_FORMAT_VIOLATION`
  - output still violated format after one retry
- `DOCTOR_INDEX_DB_MISSING`
  - preflight found no index DB at the configured `phase2.index_db_path`
  - run `python -m agent index --rebuild --json`
- `DOCTOR_CHUNKER_SIG_MISMATCH`
  - preflight found a stale chunking fingerprint vs the configured phase2 chunking
  - run `python -m agent index --scheme obsidian_v1 --rebuild --json` (or your configured scheme)
- `DOCTOR_EMBED_OUTDATED_REQUIRE_PHASE3`
  - preflight found embedding rows that do not match current phase3 model/preprocess/chunk hashes
  - run `python -m agent embed --json` (or `--rebuild --json`)
- `DOCTOR_EMBED_RUNTIME_FINGERPRINT_MISMATCH`
  - embedding provider/runtime fingerprint changed since embeddings were written
  - run `python -m agent embed --rebuild --json`
- `DOCTOR_PHASE3_EMBEDDINGS_DB_MISSING`
  - phase3-required preflight found no embeddings DB
  - run `python -m agent embed --json`
- `DOCTOR_MEMORY_DANGLING_EVIDENCE`
  - durable memory references chunk keys that are no longer present in the phase2 index
  - delete or repair the dangling memory records
- `DOCTOR_PHASE3_RETRIEVAL_NOT_READY`
  - embeddings metadata looked valid but the retrieval readiness smoke test failed
  - verify embed provider runtime availability, then run `python -m agent embed --rebuild --json` and re-run doctor
Debug tip (a helper sketch follows this list):

- open the latest `runs/<run_id>/run.json`
- inspect `resolved_config_path`, `config_root`, `package_root`, `workroot`, and `security_root` first
- inspect `tool_trace`, `evidence_status`, `raw_first`, `raw_second`, and the retry fields
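A convenience sketch for that inspection, assuming it is run from the directory that contains `runs/`:

```python
import json
from pathlib import Path

# Pick the most recently modified run directory.
run_dirs = sorted((p for p in Path("runs").iterdir() if p.is_dir()),
                  key=lambda p: p.stat().st_mtime)
run = json.loads((run_dirs[-1] / "run.json").read_text(encoding="utf-8"))

# Root/anchor fields first, then evidence and tool diagnostics.
for key in ("resolved_config_path", "config_root", "package_root", "workroot",
            "security_root", "evidence_status", "tool_trace"):
    print(f"{key}: {run.get(key)}")
```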
## Testing and verification

Run unit tests:

```
python -m unittest discover -s tests -v
```

Coverage includes:
- allowlisted read success
- explicit subpath success
- explicit subpath `security_root` anchoring (independent of process CWD)
- ambiguous bare filename rejection
- extension and hidden-path denial (including `.env`)
- traversal/absolute path denial
- `security_root` top-level file rejection when not allowlisted
- fail-closed misconfiguration behavior
- symlink escape denial (POSIX test)
Manual security checklist:

- see `SECURITY.md`
Doctor tip:

- use `python -m agent doctor --no-ollama` to skip only the Ollama network checks.
- with `phase3.embed.provider: torch`, the retrieval smoke test still runs under `--no-ollama`.
## Release packaging

Create a clean, shareable zip (without `.venv/`, `.git/`, caches, or run logs):

```
python scripts/make_release_zip.py
python scripts/make_release_zip.py --dry-run
python scripts/make_release_zip.py --include-workroot
```

`--include-workroot` adds only a curated subset (`local-agent-workroot` top-level boot/docs files plus `allowed/.gitkeep` and `allowed/sample/**` when present), and always excludes `local-agent-workroot/runs/**`.
Optional local cleanup helper:

```
python scripts/clean_artifacts.py --dry-run
python scripts/clean_artifacts.py
```

## Extending safely

If you add tools:
- Add a new `ToolSpec` in `agent/tools.py` (see the sketch after this list).
- Decide whether the output is admissible evidence.
- If admissible, add an explicit validator in the runner logic.
- Keep pass boundaries strict:
  - pass 1: tool decision
  - pass 2: answer only from provided tool output
- Add tests for security and contract behavior.
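A hypothetical sketch of the shape involved; the real `ToolSpec` fields live in `agent/tools.py` and may differ, and `list_dir` is an invented example tool:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

@dataclass
class ToolSpec:                   # illustrative shape only
    name: str
    run: Callable[[dict], dict]
    admissible_evidence: bool     # if True, add an explicit runner-side validator

def list_dir(args: dict) -> dict:
    """Invented non-evidence tool: list names under an already-validated dir."""
    root = Path(args["path"])     # real code must reuse the sandbox path checks
    return {"entries": sorted(p.name for p in root.iterdir())}

TOOLS = {
    "list_dir": ToolSpec("list_dir", list_dir, admissible_evidence=False),
}
```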
## Practical limitations

Intentional limits:
- single model-requested tool call per ask run
- bounded read/token budgets
- strict formatting and protocol checks can produce "hard fails" rather than graceful-but-risky answers
Non-goals:
- broad autonomous task execution
- unrestricted filesystem exploration
- hidden-file or arbitrary-extension access by default
This runner is built around three constraints:
- Finitude: bounded resources are explicit, not hidden.
- Integrity: only typed evidence is admissible for evidence-required asks.
- Scope discipline: partial coverage must be disclosed mechanically.
Mental model:
- a small "epistemic linter" around local-file Q&A, optimized for correctness and auditability over flexibility.