ContextFS

A Model Context Protocol (MCP) server that gives AI assistants direct access to local document collections through file-first search.

Features

Search Capabilities

Full-text search with boolean operators (AND, OR, NOT) and exact phrase matching
Parallel search across multiple queries for faster results
Fuzzy matching and context-aware result highlighting
Powered by ugrep - No database or RAG infrastructure required

Organization

Hierarchical collections - Organize knowledge using folder structures
Scope control - Search globally, within collections, or in specific documents
Smart discovery - Find documents by name or path patterns

Format Support

Multiple document formats - PDF, DOCX, HTML, JSON, XML, and more
Automatic format detection - No manual configuration required
Smart filter integration - Uses pandoc, jq, and other tools
Graceful degradation - Formats auto-disabled if tools unavailable
See docs/supported-formats.md for full details

Security

Read-only access - Server never modifies your documents
Path validation - Prevents directory traversal attacks
Command sandboxing - Filter commands run in restricted mode
Whitelist enforcement - Shell filters validated before execution

Installation

System Dependencies

This server requires the following system utilities:

# Ubuntu/Debian
sudo apt install ugrep poppler-utils

# macOS
brew install ugrep poppler

Supported Formats

contextfs supports searching and reading multiple document formats:

Default Formats (No Additional Tools Required)

Markdown (.md, .markdown)
Plain Text (.txt, .rst)
CSV (.csv)
PDF (.pdf) - uses pdftotext from poppler-utils, which is already installed as a system dependency above

Optional Formats (Requires External Tools)

Microsoft Word (.doc, .docx) - requires pandoc or antiword
OpenDocument (.odt) - requires pandoc
EPUB (.epub) - requires pandoc
HTML (.html, .htm) - requires pandoc
RTF (.rtf) - requires pandoc
JSON (.json) - requires jq
XML (.xml) - requires pandoc
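As a rough illustration of how optional formats could be switched on or off depending on which tools are installed, the sketch below checks PATH for each converter. The mapping simply restates the list above; the names OPTIONAL_FORMAT_TOOLS and enabled_optional_formats are hypothetical and are not taken from the ContextFS source.

```python
import shutil
from typing import Callable, Optional

# Hypothetical mapping from file extension to the external tools listed above.
# Any one tool in the tuple is treated as sufficient for that extension.
OPTIONAL_FORMAT_TOOLS: dict[str, tuple[str, ...]] = {
    ".doc": ("pandoc", "antiword"),
    ".docx": ("pandoc", "antiword"),
    ".odt": ("pandoc",),
    ".epub": ("pandoc",),
    ".html": ("pandoc",),
    ".htm": ("pandoc",),
    ".rtf": ("pandoc",),
    ".json": ("jq",),
    ".xml": ("pandoc",),
}


def enabled_optional_formats(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> set[str]:
    """Return the extensions for which at least one converter is on PATH."""
    return {
        ext
        for ext, tools in OPTIONAL_FORMAT_TOOLS.items()
        if any(which(tool) is not None for tool in tools)
    }


if __name__ == "__main__":
    # Prints whichever optional extensions this machine can handle.
    print(sorted(enabled_optional_formats()))
```

Passing the lookup function as a parameter keeps the check easy to test without installing or removing any tools.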

Quick Setup for Optional Formats

To enable all optional formats:

# macOS
brew install pandoc jq

# Linux (Ubuntu/Debian)
sudo apt install pandoc jq

# Windows (Chocolatey)
choco install pandoc jq

See docs/supported-formats.md for detailed installation instructions, configuration options, and troubleshooting.

Collections and Scope

ContextFS organizes documents using a collection-based hierarchy that maps directly to your filesystem structure.

Understanding Collections

A collection is simply a folder within your knowledge base root
Collections can be nested to any depth
Each document belongs to exactly one collection (its containing folder)
The root directory itself is the top-level collection

Configuring the Knowledge Root

The knowledge root can be specified via:

Command-line argument (recommended for static setups):

contextfs --root /path/to/documents

Configuration file:

knowledge:
  root: "/path/to/documents"

Environment variable:

export CFS_KNOWLEDGE__ROOT=/path/to/documents

Search Scopes

All search operations support three scope levels:

Global scope - Search across all documents in the knowledge base
Collection scope - Limit search to a specific folder and its subfolders
Document scope - Search within a single document only

Because each scope maps directly to a filesystem path, documents can be organized and searches narrowed without any database infrastructure.
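One way to picture the three scope levels is as a function from a scope object to the path that gets searched. The sketch below is an assumption about how that mapping might look, not the server's actual code; resolve_scope is a hypothetical name, and the scope shape follows the search_documents arguments documented in the API section. Path validation against the root is deliberately left out here.

```python
from pathlib import Path


def resolve_scope(root: Path, scope: dict) -> Path:
    """Map a scope object to the file or directory that should be searched."""
    scope_type = scope.get("type")
    if scope_type == "global":
        # Global scope searches the whole knowledge root.
        return root
    if scope_type not in ("collection", "document"):
        raise ValueError(f"unknown scope type: {scope_type!r}")
    path = scope.get("path")
    if not path:
        # Collection and document scopes are meaningless without a path.
        raise ValueError(f"{scope_type} scope requires a path")
    return root / path


if __name__ == "__main__":
    root = Path("/path/to/documents")
    print(resolve_scope(root, {"type": "collection", "path": "programming/python"}))
```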

Configuration

Basic Configuration

Create a config.yaml with your settings:

knowledge:
  root: "./documents"

search:
  context_lines: 5        # Lines of context around matches
  max_results: 50         # Maximum results per search
  timeout: 30             # Search timeout in seconds

security:
  enable_shell_filters: true
  filter_mode: whitelist  # Recommended for production

exclude:
  patterns:
    - ".git/*"
    - "*.draft.*"
    - "*.tmp"

See config.example.yaml for all available options.
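To give a feel for what the exclude patterns would filter out, here is a minimal sketch that applies them as shell-style globs to paths relative to the knowledge root. Whether ContextFS matches patterns this way is an assumption; is_excluded is a hypothetical helper, and the pattern list is copied from the example configuration above.

```python
from fnmatch import fnmatchcase

# Patterns copied from the example config above.
EXCLUDE_PATTERNS = [".git/*", "*.draft.*", "*.tmp"]


def is_excluded(relative_path: str, patterns: list[str] = EXCLUDE_PATTERNS) -> bool:
    """Return True if the root-relative path matches any exclude pattern."""
    # fnmatchcase is case-sensitive on every platform, and "*" also matches "/".
    return any(fnmatchcase(relative_path, pattern) for pattern in patterns)


if __name__ == "__main__":
    for candidate in (".git/config", "notes/plan.draft.md", "guides/user-manual.pdf"):
        print(candidate, is_excluded(candidate))
```

Under these glob semantics a pattern such as ".git/*" only matches a .git folder at the top of the knowledge root, not one nested inside a collection.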

Environment Variables

All configuration options can be overridden using environment variables with the CFS_ prefix:

export CFS_KNOWLEDGE__ROOT=/path/to/documents
export CFS_SEARCH__MAX_RESULTS=100
export CFS_SECURITY__FILTER_MODE=whitelist

Use double underscores (__) to denote nested configuration levels.
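The double-underscore convention can be illustrated with a short sketch that turns CFS_-prefixed variables into a nested dictionary of overrides. This is only a model of the naming rule described above, not the server's loader; env_overrides is a hypothetical name, and values are left as strings because type conversion is outside the scope of the example.

```python
import os
from typing import Mapping

PREFIX = "CFS_"


def env_overrides(environ: Mapping[str, str] = os.environ) -> dict:
    """Collect CFS_-prefixed variables into a nested dict of overrides."""
    overrides: dict = {}
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue
        # CFS_SEARCH__MAX_RESULTS -> ["search", "max_results"]
        keys = name[len(PREFIX):].lower().split("__")
        node = overrides
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value  # values stay strings in this sketch
    return overrides


if __name__ == "__main__":
    example = {
        "CFS_KNOWLEDGE__ROOT": "/path/to/documents",
        "CFS_SEARCH__MAX_RESULTS": "100",
    }
    print(env_overrides(example))
```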

API

Tools

The server implements six MCP tools organized into three categories:

Browse Operations

list_collections - List folders and documents in a collection
find_document - Find documents by name or path pattern

Search Operations

search_documents - Full-text search with boolean operators
search_multiple - Execute multiple searches in parallel

Read Operations

read_document - Read document content with optional page selection
get_document_info - Get document metadata and table of contents

list_collections

Browse the hierarchical structure of your knowledge base.

Arguments:

path (string, optional): Collection path relative to root. Defaults to root level.

Returns:

List of subcollections (folders)
List of documents with their paths and formats

Example:

{
  "path": "programming/python"
}

find_document

Locate documents by filename or path pattern using fuzzy matching.

Arguments:

query (string, required): Search term for document names
limit (number, optional): Maximum results to return (default: 20)

Returns:

List of matching documents with paths and relevance scores

Example:

{
  "query": "async patterns",
  "limit": 10
}
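For intuition about fuzzy name matching with relevance scores, the sketch below ranks candidate paths by string similarity between the query and each file stem, using difflib from the Python standard library. ContextFS may well score matches differently; rank_documents and the sample paths are hypothetical.

```python
from difflib import SequenceMatcher
from pathlib import PurePosixPath


def rank_documents(
    query: str, paths: list[str], limit: int = 20
) -> list[tuple[str, float]]:
    """Score each path by similarity between the query and its file stem."""
    needle = query.lower()
    scored = [
        (path, SequenceMatcher(None, needle, PurePosixPath(path).stem.lower()).ratio())
        for path in paths
    ]
    # Highest similarity first, then keep at most `limit` results.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]


if __name__ == "__main__":
    candidates = [
        "guides/best-practices.md",
        "programming/python/async-patterns.md",
        "security/docs/auth.md",
    ]
    for path, score in rank_documents("async patterns", candidates, limit=10):
        print(f"{score:.2f}  {path}")
```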

search_documents

Execute full-text searches across your knowledge base using boolean operators and exact phrases.

Arguments:

query (string, required): Search query with optional operators
scope (object, required): Defines search boundaries
- type (string): One of "global", "collection", or "document"
- path (string, conditional): Required for collection and document scopes

Search Operators:

term1 term2 - AND: Find documents containing both terms
term1|term2 - OR: Find documents containing either term
term1 -term2 - NOT: Exclude documents with term2
"exact phrase" - Match exact phrase with quotes

Returns:

List of matches with document path, line numbers, and context
Truncation indicator if results exceed maximum

Examples:

Global search:

{
  "query": "authentication jwt",
  "scope": {
    "type": "global"
  }
}

Collection-scoped search:

{
  "query": "async|await -deprecated",
  "scope": {
    "type": "collection",
    "path": "programming/python"
  }
}

Document-specific search:

{
  "query": "\"error handling\"",
  "scope": {
    "type": "document",
    "path": "guides/best-practices.md"
  }
}
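The operator rules for search_documents can be modeled in a few lines. The sketch below is not how ugrep evaluates queries; it is a simplified, case-insensitive substring model (both of those are assumptions) that shows how AND, OR, NOT, and quoted phrases combine. The function name matches is hypothetical.

```python
import shlex


def matches(query: str, text: str) -> bool:
    """Evaluate a query against text using the operator rules listed above."""
    haystack = text.lower()
    # shlex keeps a quoted phrase together as a single token.
    for token in shlex.split(query):
        if token.startswith("-") and len(token) > 1:
            # NOT: the term must be absent.
            if token[1:].lower() in haystack:
                return False
        else:
            # OR inside a token; separate tokens are combined with AND.
            alternatives = [alt.lower() for alt in token.split("|") if alt]
            if not any(alt in haystack for alt in alternatives):
                return False
    return True


if __name__ == "__main__":
    print(matches("async|await -deprecated", "use await here"))
    print(matches('"error handling"', "handling an error"))
```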

search_multiple

Execute multiple search queries concurrently for improved performance.

Arguments:

queries (array of strings, required): List of search queries
scope (object, required): Same scope structure as search_documents

Returns:

Object mapping each query to its search results
Each result includes matches and truncation status

Example:

{
  "queries": ["authentication", "authorization", "session management"],
  "scope": {
    "type": "collection",
    "path": "security/docs"
  }
}

Note: Concurrent searches are limited by the limits.max_concurrent_searches configuration setting.
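A concurrency cap of this kind is commonly implemented with a semaphore. The sketch below shows the general pattern with asyncio, assuming a per-query coroutine is available; search_many, search_one, and the default of 4 are hypothetical and do not describe the server's internals.

```python
import asyncio
from typing import Awaitable, Callable


async def search_many(
    queries: list[str],
    search_one: Callable[[str], Awaitable[list[str]]],
    max_concurrent: int = 4,
) -> dict[str, list[str]]:
    """Run search_one for every query, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(query: str) -> list[str]:
        # Each search waits here until a slot is free.
        async with semaphore:
            return await search_one(query)

    results = await asyncio.gather(*(bounded(query) for query in queries))
    # Map each query to its own results, preserving input order.
    return dict(zip(queries, results))


if __name__ == "__main__":
    async def fake_search(query: str) -> list[str]:
        await asyncio.sleep(0.01)  # stand-in for a real search call
        return [f"match for {query}"]

    print(asyncio.run(search_many(["authentication", "authorization"], fake_search)))
```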

read_document

Read the contents of a document, optionally limited to specific pages for PDFs.

Arguments:

path (string, required): Document path relative to knowledge root
pages (array of numbers, optional): Specific pages to read (PDF only)

Returns:

Document content as text
Format metadata

Examples:

Read entire document:

{
  "path": "guides/user-manual.pdf"
}

Read specific pages:

{
  "path": "guides/user-manual.pdf",
  "pages": [1, 5, 10]
}

Note: Content length is limited by the limits.max_read_chars configuration setting.
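As an illustration of page selection and the length limit, the sketch below builds one pdftotext command per requested page (using its -f and -l first/last page options and "-" for stdout) and truncates the extracted text. Whether ContextFS invokes pdftotext this way is an assumption; both function names are hypothetical.

```python
def pdftotext_commands(pdf_path: str, pages: list[int]) -> list[list[str]]:
    """Build one pdftotext invocation per requested page, writing to stdout."""
    return [
        # -f and -l select the first and last page; "-" sends text to stdout.
        ["pdftotext", "-f", str(page), "-l", str(page), pdf_path, "-"]
        for page in pages
    ]


def truncate(text: str, max_read_chars: int) -> tuple[str, bool]:
    """Cut text to max_read_chars and report whether anything was dropped."""
    if len(text) <= max_read_chars:
        return text, False
    return text[:max_read_chars], True


if __name__ == "__main__":
    for command in pdftotext_commands("guides/user-manual.pdf", [1, 5, 10]):
        print(" ".join(command))
    print(truncate("x" * 12, 10))
```

Each command list can be passed to subprocess.run with capture_output=True once the path has been validated against the knowledge root.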

get_document_info

Retrieve metadata and structural information about a document.

Arguments:

path (string, required): Document path relative to knowledge root

Returns:

File size and format
Page count (for PDFs)
Table of contents with page numbers (when available)
Last modified timestamp

Example:

{
  "path": "reference/api-documentation.pdf"
}

Usage with Claude Desktop

Using Command-Line Arguments

Add this to your claude_desktop_config.json:

{
  "mcpServers": {
    "contextfs": {
      "command": "contextfs",
      "args": ["--root", "/path/to/your/documents"]
    }
  }
}

Using Configuration File

For more complex setups, use a configuration file:

{
  "mcpServers": {
    "contextfs": {
      "command": "contextfs",
      "args": ["--config", "/path/to/config.yaml"]
    }
  }
}

Using uv (Development)

When developing or running from source:

{
  "mcpServers": {
    "contextfs": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/contextfs",
        "run",
        "contextfs",
        "--root",
        "/path/to/documents"
      ]
    }
  }
}

Configuration File Location

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json

Important: Restart Claude Desktop after modifying the configuration file.

Docker Deployment

Using docker-compose (Recommended)

# Start the server
docker-compose up

# Build and start
docker-compose up --build

# Run in detached mode
docker-compose up -d

The included docker-compose.yaml provides:

Read-only document mounting for security
Resource limits (512MB memory, 1 CPU)
Proper stdio configuration for MCP protocol

Manual Docker Build

# Build image
docker build -t contextfs .

# Run with read-only mount (-i keeps stdin open, which the stdio transport needs)
docker run -i -v /path/to/docs:/knowledge:ro contextfs

# Run with custom configuration
docker run -i \
  -v /path/to/docs:/knowledge:ro \
  -v /path/to/config.yaml:/config/config.yaml:ro \
  contextfs

Cloud Storage Integration

ContextFS operates on local documents only. Cloud synchronization is intentionally handled outside the MCP server for security and architectural clarity.

Recommended Approaches

Option 1: Cloud Desktop Clients

Google Drive Desktop, Dropbox, OneDrive, iCloud Drive
Automatic background sync to local folder
Point server to synced directory

Option 2: rclone mount

# Mount cloud storage as read-only local directory
rclone mount gdrive:Knowledge /data/knowledge --read-only --vfs-cache-mode full --daemon

Option 3: Scheduled sync

# Periodic sync via cron
*/30 * * * * rclone sync gdrive:Knowledge /data/knowledge

See docs/cloud-sync-guide.md for detailed setup instructions.

Development

Project Setup

# Clone repository
git clone https://github.com/RomanShnurov/ContextFS
cd ContextFS

# Install with development dependencies (recommended)
uv sync --extra dev

# Alternative: pip
pip install -e ".[dev]"

Running Tests

# Run all tests
uv run pytest

# Run with coverage report
uv run pytest --cov

# Run specific test file
uv run pytest tests/test_search.py

# Run with verbose output
uv run pytest -v

Code Quality Tools

# Format code
uv run ruff format .

# Lint code
uv run ruff check .

# Auto-fix linting issues
uv run ruff check . --fix

# Type checking
uv run mypy src

Security

ContextFS applies defense in depth:

Path Security

Path validation: All file paths validated against knowledge root
Traversal prevention: Blocks ../ and absolute path attacks
Symlink policy: Configurable symlink following (default: disabled)
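The traversal checks listed above usually come down to resolving the requested path and confirming it still sits under the root. The sketch below shows that general technique with pathlib (Python 3.9+ for is_relative_to); it is an illustration under that assumption, not the project's validator, and resolve_inside_root is a hypothetical name.

```python
from pathlib import Path


def resolve_inside_root(root: Path, requested: str) -> Path:
    """Return the resolved path for `requested`, or raise if it escapes root."""
    root = root.resolve()
    # Joining with an absolute `requested` discards root, so the check below
    # also catches absolute-path attacks.
    resolved = (root / requested).resolve()
    if not resolved.is_relative_to(root):
        raise PermissionError(f"path escapes knowledge root: {requested}")
    return resolved


if __name__ == "__main__":
    root = Path("/path/to/documents")
    print(resolve_inside_root(root, "guides/best-practices.md"))
    try:
        resolve_inside_root(root, "../secrets.txt")
    except PermissionError as error:
        print(error)
```

Because resolve() follows symlinks, a link that points outside the root is rejected as well; a link that stays inside the root would pass, so a stricter symlink policy needs an additional check.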

Command Security

Whitelist enforcement: Filter commands validated before execution
Sandboxed execution: Shell commands run with timeout limits
Read-only design: Server never modifies document collection
No credential access: Server never touches cloud storage APIs

Configuration

security:
  enable_shell_filters: true
  filter_mode: whitelist          # Recommended for production
  allowed_filter_commands:
    - "pdftotext - -"
  symlink_policy: disallow        # Prevent symlink attacks
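To show what whitelist validation of a filter command might involve, the sketch below accepts a command only when its tokens exactly match a configured entry. The comparison rule is an assumption rather than the server's documented behavior; is_filter_allowed is a hypothetical helper, and the allowed list is copied from the configuration above.

```python
import shlex

# Copied from allowed_filter_commands in the configuration above.
ALLOWED_FILTER_COMMANDS = ["pdftotext - -"]


def is_filter_allowed(
    command: str, allowed: list[str] = ALLOWED_FILTER_COMMANDS
) -> bool:
    """Accept a command only if its tokens exactly match a whitelisted entry."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: refuse rather than guess at the intent.
        return False
    return any(tokens == shlex.split(entry) for entry in allowed)


if __name__ == "__main__":
    print(is_filter_allowed("pdftotext - -"))
    print(is_filter_allowed("pdftotext - -; rm -rf /"))
```

Comparing token lists instead of raw strings ignores differences in spacing while still rejecting anything appended to an allowed command.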

Debugging

Since MCP servers run over stdio, debugging can be challenging. contextfs provides two options for interactive testing.

Built-in MCP Inspector (Recommended)

contextfs includes a Streamlit-based inspector UI for testing server tools, resources, and prompts:

# Install inspector dependencies
uv sync --extra inspector

# Run the inspector
streamlit run inspector/app.py

The built-in Inspector provides:

Interactive tool testing with dynamic forms
Resource browsing and content reading
Prompt listing and inspection
Real-time server logs with filtering
No Node.js required

External MCP Inspector

Alternatively, use the official MCP Inspector:

npx @modelcontextprotocol/inspector contextfs --root /path/to/documents

You can also use it with configuration files:

npx @modelcontextprotocol/inspector contextfs --config config.yaml

Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch
Make your changes with tests
Run code quality checks (ruff format, ruff check, mypy)
Submit a pull request

See CONTRIBUTING.md for detailed guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Resources

Model Context Protocol - Official MCP documentation
MCP Specification - Protocol specification
ugrep - Ultra-fast grep with boolean search
poppler-utils - PDF rendering utilities
