A Model Context Protocol server that provides AI assistants with direct access to local document collections through file-first search capabilities.
- Full-text search with boolean operators (AND, OR, NOT) and exact phrase matching
- Parallel search across multiple queries for faster results
- Fuzzy matching and context-aware result highlighting
- Powered by ugrep - No database or RAG infrastructure required
- Hierarchical collections - Organize knowledge using folder structures
- Scope control - Search globally, within collections, or in specific documents
- Smart discovery - Find documents by name or path patterns
- Multiple document formats - PDF, DOCX, HTML, JSON, XML, and more
- Automatic format detection - No manual configuration required
- Smart filter integration - Uses pandoc, jq, and other tools
- Graceful degradation - Formats auto-disabled if tools unavailable
- See docs/supported-formats.md for full details
- Read-only access - Server never modifies your documents
- Path validation - Prevents directory traversal attacks
- Command sandboxing - Filter commands run in restricted mode
- Whitelist enforcement - Shell filters validated before execution
This server requires the following system utilities:
# Ubuntu/Debian
sudo apt install ugrep poppler-utils
# macOS
brew install ugrep poppler
contextfs supports searching and reading multiple document formats:
- Markdown (.md, .markdown)
- Plain Text (.txt, .rst)
- CSV (.csv)
- PDF (.pdf) - requires pdftotext (from poppler-utils)
- Microsoft Word (.doc, .docx) - requires pandoc or antiword
- OpenDocument (.odt) - requires pandoc
- EPUB (.epub) - requires pandoc
- HTML (.html, .htm) - requires pandoc
- RTF (.rtf) - requires pandoc
- JSON (.json) - requires jq
- XML (.xml) - requires pandoc
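The graceful degradation described earlier can be pictured as a lookup of each format's helper tool on PATH. The following is a minimal sketch, not the server's actual detection code: the `OPTIONAL_FORMATS` table and the `enabled_formats` function are hypothetical names, and the tool requirements are copied from the list above.

```python
import shutil
from typing import Callable, Optional

# Hypothetical table built from the format list above. A format is treated as
# enabled when at least one of its candidate tools is installed.
OPTIONAL_FORMATS: dict[str, tuple[str, ...]] = {
    ".pdf": ("pdftotext",),
    ".doc": ("pandoc", "antiword"),
    ".docx": ("pandoc", "antiword"),
    ".odt": ("pandoc",),
    ".epub": ("pandoc",),
    ".html": ("pandoc",),
    ".htm": ("pandoc",),
    ".rtf": ("pandoc",),
    ".json": ("jq",),
    ".xml": ("pandoc",),
}


def enabled_formats(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> set[str]:
    """Return the optional extensions whose helper tool is found on PATH."""
    return {
        ext
        for ext, tools in OPTIONAL_FORMATS.items()
        if any(which(tool) is not None for tool in tools)
    }


if __name__ == "__main__":
    # Shows which optional formats this machine could handle.
    print(sorted(enabled_formats()))
```

The `which` parameter exists only so the lookup can be replaced in tests; by default it uses `shutil.which` from the standard library.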
To enable all optional formats:
# macOS
brew install pandoc jq
# Linux (Ubuntu/Debian)
sudo apt install pandoc jq
# Windows (Chocolatey)
choco install pandoc jq
See docs/supported-formats.md for detailed installation instructions, configuration options, and troubleshooting.
The File Knowledge server organizes documents using a collection-based hierarchy that maps directly to your filesystem structure.
- A collection is simply a folder within your knowledge base root
- Collections can be nested to any depth
- Each document belongs to exactly one collection (its containing folder)
- The root directory itself is the top-level collection
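Because a collection is just the containing folder, the mapping from a document to its collections can be sketched with `pathlib` alone. This is an illustration under assumptions: the function names are hypothetical, paths are taken as relative to the knowledge root, and the root collection is written as `"."`.

```python
from pathlib import PurePosixPath


def collection_of(document: str) -> str:
    """Return the collection (containing folder) of a root-relative path.

    The root collection is represented as "." in this sketch.
    """
    return str(PurePosixPath(document).parent)


def ancestors(document: str) -> list[str]:
    """List every collection containing the document, innermost first."""
    return [str(parent) for parent in PurePosixPath(document).parents]


if __name__ == "__main__":
    print(collection_of("programming/python/asyncio.md"))
    print(ancestors("programming/python/asyncio.md"))
```

A collection-scoped search on `programming` would then cover any document whose `ancestors` list contains `programming`.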
The knowledge root can be specified via:
Command-line argument (recommended for static setups):
contextfs --root /path/to/documents
Configuration file:
knowledge:
  root: "/path/to/documents"
Environment variable:
export CFS_KNOWLEDGE__ROOT=/path/to/documents
All search operations support three scope levels:
- Global scope - Search across all documents in the knowledge base
- Collection scope - Limit search to a specific folder and its subfolders
- Document scope - Search within a single document only
This hierarchical approach enables efficient knowledge organization without requiring database infrastructure.
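The three scope levels correspond to the `scope` object used in the tool examples later in this document. A small helper like the one below could build that object and enforce the rule that collection and document scopes need a path. The helper name `make_scope` is hypothetical and not part of contextfs.

```python
from typing import Optional

VALID_SCOPES = ("global", "collection", "document")


def make_scope(scope_type: str, path: Optional[str] = None) -> dict:
    """Build a scope object, requiring a path for non-global scopes."""
    if scope_type not in VALID_SCOPES:
        raise ValueError(f"unknown scope type: {scope_type!r}")
    if scope_type == "global":
        # Global scope searches everything, so any path is ignored.
        return {"type": "global"}
    if not path:
        raise ValueError(f"{scope_type} scope requires a path")
    return {"type": scope_type, "path": path}


if __name__ == "__main__":
    print(make_scope("collection", "programming/python"))
```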
Create a config.yaml with your settings:
knowledge:
  root: "./documents"
search:
  context_lines: 5 # Lines of context around matches
  max_results: 50 # Maximum results per search
  timeout: 30 # Search timeout in seconds
security:
  enable_shell_filters: true
  filter_mode: whitelist # Recommended for production
exclude:
  patterns:
    - ".git/*"
    - "*.draft.*"
    - "*.tmp"
See config.example.yaml for all available options.
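To get a feel for what the exclude patterns would skip, they can be tried against sample paths with shell-style globbing. This sketch assumes the patterns behave like Python's `fnmatch` globs applied to the root-relative path; the server's real matching rules may differ, and `is_excluded` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Patterns copied from the sample configuration above.
EXCLUDE_PATTERNS = [".git/*", "*.draft.*", "*.tmp"]


def is_excluded(relative_path: str, patterns=EXCLUDE_PATTERNS) -> bool:
    """Return True if the root-relative path matches any exclude pattern.

    Assumption: shell-style globbing where "*" may also match "/".
    """
    return any(fnmatchcase(relative_path, pattern) for pattern in patterns)


if __name__ == "__main__":
    for sample in (".git/config", "notes/plan.draft.md", "guides/manual.pdf"):
        print(sample, is_excluded(sample))
```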
All configuration options can be overridden using environment variables with the CFS_ prefix:
export CFS_KNOWLEDGE__ROOT=/path/to/documents
export CFS_SEARCH__MAX_RESULTS=100
export CFS_SECURITY__FILTER_MODE=whitelist
Use double underscores (__) to denote nested configuration levels.
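The double-underscore convention can be illustrated by turning environment variables into a nested dictionary. This is a sketch of the naming rule only, not the server's loader: `env_overrides` is a hypothetical function, values are left as strings, and type conversion is omitted.

```python
from typing import Mapping

PREFIX = "CFS_"


def env_overrides(environ: Mapping[str, str]) -> dict:
    """Convert CFS_-prefixed variables into a nested dict of overrides."""
    overrides: dict = {}
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue  # Unrelated variable, ignore it.
        # "KNOWLEDGE__ROOT" becomes ["knowledge", "root"].
        keys = name[len(PREFIX):].lower().split("__")
        node = overrides
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return overrides


if __name__ == "__main__":
    import os

    print(env_overrides(os.environ))
```

With the three `export` lines above, the result would contain `knowledge.root`, `search.max_results`, and `security.filter_mode` as nested keys.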
The server implements six MCP tools organized into three categories:
- list_collections - List folders and documents in a collection
- find_document - Find documents by name or path pattern
- search_documents - Full-text search with boolean operators
- search_multiple - Execute multiple searches in parallel
- read_document - Read document content with optional page selection
- get_document_info - Get document metadata and table of contents
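MCP clients invoke these tools with a JSON-RPC 2.0 `tools/call` request. An MCP client library normally builds this message, but a hand-rolled version shows what travels over stdio. The helper name `tool_call` is hypothetical, and a real session must complete the MCP initialization handshake before calling tools.

```python
import json


def tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a single line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # The stdio transport is newline-delimited, so the JSON must stay on one line.
    return json.dumps(message)


if __name__ == "__main__":
    print(tool_call(1, "list_collections", {"path": "programming/python"}))
```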
list_collections: Browse the hierarchical structure of your knowledge base.
Arguments:
- path (string, optional): Collection path relative to root. Defaults to root level.
Returns:
- List of subcollections (folders)
- List of documents with their paths and formats
Example:
{
  "path": "programming/python"
}
find_document: Locate documents by filename or path pattern using fuzzy matching.
Arguments:
- query (string, required): Search term for document names
- limit (number, optional): Maximum results to return (default: 20)
Returns:
- List of matching documents with paths and relevance scores
Example:
{
  "query": "async patterns",
  "limit": 10
}
search_documents: Execute full-text searches across your knowledge base using boolean operators.
Arguments:
- query (string, required): Search query with optional operators
- scope (object, required): Defines search boundaries
  - type (string): One of "global", "collection", or "document"
  - path (string, conditional): Required for collection and document scopes
Search Operators:
- term1 term2 - AND: Find documents containing both terms
- term1|term2 - OR: Find documents containing either term
- term1 -term2 - NOT: Exclude documents with term2
- "exact phrase" - Match exact phrase with quotes
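The operators above can be combined programmatically. The helper below is a hypothetical convenience for client code, not part of contextfs: it joins terms using the documented syntax and makes no claim about how the server resolves precedence when several operators are mixed.

```python
from typing import Optional, Sequence


def build_query(
    all_of: Sequence[str] = (),
    any_of: Sequence[str] = (),
    none_of: Sequence[str] = (),
    phrase: Optional[str] = None,
) -> str:
    """Compose a query string from the documented operators."""
    parts: list[str] = []
    if phrase:
        parts.append(f'"{phrase}"')  # Exact phrase in double quotes.
    parts.extend(all_of)  # Space-separated terms mean AND.
    if any_of:
        parts.append("|".join(any_of))  # Pipe-separated terms mean OR.
    parts.extend(f"-{term}" for term in none_of)  # Leading minus means NOT.
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query(any_of=["async", "await"], none_of=["deprecated"]))
```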
Returns:
- List of matches with document path, line numbers, and context
- Truncation indicator if results exceed maximum
Examples:
Global search:
{
  "query": "authentication jwt",
  "scope": {
    "type": "global"
  }
}
Collection-scoped search:
{
  "query": "async|await -deprecated",
  "scope": {
    "type": "collection",
    "path": "programming/python"
  }
}
Document-specific search:
{
  "query": "\"error handling\"",
  "scope": {
    "type": "document",
    "path": "guides/best-practices.md"
  }
}
search_multiple: Execute multiple search queries concurrently for improved performance.
Arguments:
- queries (array of strings, required): List of search queries
- scope (object, required): Same scope structure as search_documents
Returns:
- Object mapping each query to its search results
- Each result includes matches and truncation status
Example:
{
  "queries": ["authentication", "authorization", "session management"],
  "scope": {
    "type": "collection",
    "path": "security/docs"
  }
}
Note: Concurrent searches are limited by the limits.max_concurrent_searches configuration setting.
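One way to picture bounded parallel search is a worker pool whose size plays the role of limits.max_concurrent_searches. The sketch below uses a thread pool as an assumed mechanism; the server may implement concurrency differently, and both `search_all` and the `search` callable are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def search_all(
    queries: Sequence[str],
    search: Callable[[str], dict],
    max_concurrent: int = 4,
) -> dict[str, dict]:
    """Run one search per query with at most max_concurrent in flight.

    Returns a dict mapping each query to its result, in input order.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # pool.map preserves input order even if searches finish out of order.
        results = list(pool.map(search, queries))
    return dict(zip(queries, results))


if __name__ == "__main__":
    def fake_search(query: str) -> dict:
        # Stand-in for a real search call.
        return {"matches": [f"hit for {query}"], "truncated": False}

    print(search_all(["authentication", "authorization"], fake_search, 2))
```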
read_document: Read the complete contents of a document, with optional page selection for PDFs.
Arguments:
- path (string, required): Document path relative to knowledge root
- pages (array of numbers, optional): Specific pages to read (PDF only)
Returns:
- Document content as text
- Format metadata
Examples:
Read entire document:
{
  "path": "guides/user-manual.pdf"
}
Read specific pages:
{
  "path": "guides/user-manual.pdf",
  "pages": [1, 5, 10]
}
Note: Content length is limited by the limits.max_read_chars configuration setting.
get_document_info: Retrieve metadata and structural information about a document.
Arguments:
- path (string, required): Document path relative to knowledge root
Returns:
- File size and format
- Page count (for PDFs)
- Table of contents with page numbers (when available)
- Last modified timestamp
Example:
{
  "path": "reference/api-documentation.pdf"
}
Add this to your claude_desktop_config.json:
{
  "mcpServers": {
    "contextfs": {
      "command": "contextfs",
      "args": ["--root", "/path/to/your/documents"]
    }
  }
}
For more complex setups, use a configuration file:
{
  "mcpServers": {
    "contextfs": {
      "command": "contextfs",
      "args": ["--config", "/path/to/config.yaml"]
    }
  }
}
When developing or running from source:
{
  "mcpServers": {
    "contextfs": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/contextfs",
        "run",
        "contextfs",
        "--root",
        "/path/to/documents"
      ]
    }
  }
}
Configuration file location:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%/Claude/claude_desktop_config.json
- Linux: ~/.config/Claude/claude_desktop_config.json
Important: Restart Claude Desktop after modifying the configuration file.
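For setup scripts, the per-platform locations listed above can be resolved in code. This is a sketch under assumptions: `claude_config_path` is a hypothetical helper, the `system` values follow `platform.system()`, and any system other than macOS or Windows falls back to the Linux location.

```python
import os
import platform
from pathlib import Path
from typing import Optional

CONFIG_NAME = "claude_desktop_config.json"


def claude_config_path(
    system: Optional[str] = None,
    home: Optional[str] = None,
    appdata: Optional[str] = None,
) -> Path:
    """Return the Claude Desktop config path for the given platform."""
    system = system or platform.system()
    home_dir = Path(home) if home else Path.home()
    if system == "Darwin":  # macOS
        return home_dir / "Library" / "Application Support" / "Claude" / CONFIG_NAME
    if system == "Windows":
        # Raises KeyError if APPDATA is unset and no override is given.
        return Path(appdata or os.environ["APPDATA"]) / "Claude" / CONFIG_NAME
    return home_dir / ".config" / "Claude" / CONFIG_NAME


if __name__ == "__main__":
    print(claude_config_path())
```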
# Start the server
docker-compose up
# Build and start
docker-compose up --build
# Run in detached mode
docker-compose up -d
The included docker-compose.yaml provides:
- Read-only document mounting for security
- Resource limits (512MB memory, 1 CPU)
- Proper stdio configuration for MCP protocol
# Build image
docker build -t contextfs .
# Run with read-only mount
docker run -v /path/to/docs:/knowledge:ro contextfs
# Run with custom configuration
docker run \
  -v /path/to/docs:/knowledge:ro \
  -v /path/to/config.yaml:/config/config.yaml:ro \
  contextfs
The File Knowledge server operates on local documents only. Cloud synchronization is intentionally handled outside the MCP server for security and architectural clarity.
Option 1: Cloud Desktop Clients
- Google Drive Desktop, Dropbox, OneDrive, iCloud Drive
- Automatic background sync to local folder
- Point server to synced directory
Option 2: rclone mount
# Mount cloud storage as read-only local directory
rclone mount gdrive:Knowledge /data/knowledge --read-only --vfs-cache-mode full --daemon
Option 3: Scheduled sync
# Periodic sync via cron
*/30 * * * * rclone sync gdrive:Knowledge /data/knowledge
See docs/cloud-sync-guide.md for detailed setup instructions.
# Clone repository
git clone https://github.com/RomanShnurov/ContextFS
cd ContextFS
# Install with development dependencies (recommended)
uv sync --extra dev
# Alternative: pip
pip install -e ".[dev]"
# Run all tests
uv run pytest
# Run with coverage report
uv run pytest --cov
# Run specific test file
uv run pytest tests/test_search.py
# Run with verbose output
uv run pytest -v
# Format code
uv run ruff format .
# Lint code
uv run ruff check .
# Auto-fix linting issues
uv run ruff check . --fix
# Type checking
uv run mypy src
The File Knowledge server implements defense-in-depth security:
- Path validation: All file paths validated against knowledge root
- Traversal prevention: Blocks ../ and absolute path attacks
- Symlink policy: Configurable symlink following (default: disabled)
- Whitelist enforcement: Filter commands validated before execution
- Sandboxed execution: Shell commands run with timeout limits
- Read-only design: Server never modifies document collection
- No credential access: Server never touches cloud storage APIs
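Path validation against the knowledge root is commonly done by resolving the requested path and checking that it still lies under the root. The sketch below shows that general technique, assuming Python 3.9+ for `Path.is_relative_to`; it is not the server's actual validation code, and `resolve_inside_root` is a hypothetical name.

```python
from pathlib import Path


def resolve_inside_root(root: Path, relative: str) -> Path:
    """Resolve a requested path and reject anything outside the root.

    Catches "../" traversal and absolute paths. Because resolve() follows
    symlinks, links that point outside the root are rejected as well.
    """
    root = root.resolve()
    # Joining with an absolute path discards the root, so the check below
    # rejects inputs such as "/etc/passwd".
    candidate = (root / relative).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f"path escapes knowledge root: {relative!r}")
    return candidate


if __name__ == "__main__":
    base = Path("/path/to/documents")
    print(resolve_inside_root(base, "guides/best-practices.md"))
    try:
        resolve_inside_root(base, "../secrets.txt")
    except PermissionError as exc:
        print("rejected:", exc)
```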
security:
  enable_shell_filters: true
  filter_mode: whitelist # Recommended for production
  allowed_filter_commands:
    - "pdftotext - -"
  symlink_policy: disallow # Prevent symlink attacks
Since MCP servers run over stdio, debugging can be challenging. contextfs provides two options for interactive testing.
contextfs includes a Streamlit-based inspector UI for testing server tools, resources, and prompts:
# Install inspector dependencies
uv sync --extra inspector
# Run the inspector
streamlit run inspector/app.py
The built-in Inspector provides:
- Interactive tool testing with dynamic forms
- Resource browsing and content reading
- Prompt listing and inspection
- Real-time server logs with filtering
- No external dependencies (Node.js not required)
Alternatively, use the official MCP Inspector:
npx @modelcontextprotocol/inspector contextfs --root /path/to/documents
You can also use it with configuration files:
npx @modelcontextprotocol/inspector contextfs --config config.yaml
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Run code quality checks (ruff format, ruff check, mypy)
- Submit a pull request
See CONTRIBUTING.md for detailed guidelines.
This project is licensed under the MIT License - see the LICENSE file for details.
- Model Context Protocol - Official MCP documentation
- MCP Specification - Protocol specification
- ugrep - Ultra-fast grep with boolean search
- poppler-utils - PDF rendering utilities