A fast, lightweight local full-text search engine for indexing and searching through your files (Markdown, text, and code). Built entirely in Go with inverted index data structures and persistent storage using BoltDB.
- 🚀 Fast Indexing: Efficiently indexes files using inverted index data structures
- 🔍 Instant Search: Sub-second search across thousands of files
- 🎯 TF-IDF Ranking: Intelligent relevance scoring using Term Frequency-Inverse Document Frequency
- 🔤 Fuzzy Matching: Find results even with typos using Levenshtein distance
- 📊 Incremental Indexing: Only re-indexes modified files
- 💾 Persistent Storage: Indexes stored on disk using BoltDB
- 🖥️ Dual Interface: CLI and HTTP API server
- 📝 Multiple File Types: Supports Markdown, text, and code files (.md, .txt, .go, .py, .js, .ts, .java, .c, .cpp, .rs, etc.)
- 🎨 Colored CLI Output: Beautiful, colorized terminal output
- Go 1.18 or higher
git clone https://github.com/BaseMax/go-local-search.git
cd go-local-search
go build -o bin/search ./cmd/search
go install github.com/BaseMax/go-local-search/cmd/search@latest
Index all files in a directory:
search index /path/to/directory
Example:
search index ~/Documents
search index ~/projects
Basic search:
search search "your query"
Fuzzy search (tolerates typos):
search search "your query" --fuzzy
Fuzzy search with custom edit distance:
search search "your query" --fuzzy --distance 1
Examples:
# Search for exact matches
search search "golang programming"
# Fuzzy search (finds "python" even if you type "pythn")
search search "pythn" --fuzzy
# Search code with function names
search search "handleRequest" --fuzzy --distance 2
Start the HTTP API server:
search server [address]
Examples:
# Start on default port (localhost:8080)
search server
# Start on custom port
search server localhost:3000
Show index statistics:
search stats
Output:
=== Index Statistics ===
Documents: 1234
Terms: 5678
Files: 1234
Total Size: 45.67 MB
GET /search - Search indexed files
Parameters:
- q (required) - Search query
- fuzzy (optional) - Enable fuzzy matching (true/false)
- distance (optional) - Max edit distance for fuzzy search (default: 2)
Example:
curl "http://localhost:8080/search?q=golang&fuzzy=true"
Response:
{
  "query": "golang",
  "count": 2,
  "results": [
    {
      "path": "/path/to/file.md",
      "score": 1.386,
      "match_count": 1,
      "snippet": "Go is a programming language..."
    }
  ]
}
POST /index - Index a directory
Body:
{
  "path": "/path/to/directory"
}
Example:
curl -X POST http://localhost:8080/index \
  -H "Content-Type: application/json" \
  -d '{"path": "/home/user/documents"}'
Response:
{
  "success": true,
  "files_indexed": 42
}
GET /stats - Get index statistics
Example:
curl http://localhost:8080/stats
Response:
{
  "document_count": 1234,
  "term_count": 5678,
  "files_indexed": 1234,
  "total_size": 47890123
}
- Tokenizer (internal/tokenizer):
  - Text tokenization and normalization
  - Stop word filtering
  - Basic stemming
  - Levenshtein distance calculation for fuzzy matching
- Inverted Index (internal/index):
  - Efficient inverted index data structure
  - TF-IDF scoring for relevance ranking
  - Positional information tracking
  - Thread-safe operations
- Storage (internal/storage):
  - BoltDB integration for persistent storage
  - Index serialization/deserialization
  - Metadata storage
- Indexer (internal/indexer):
  - Recursive directory scanning
  - File type detection
  - Incremental indexing (detects file changes)
  - SHA-256 hashing for change detection
- Search Engine (internal/search):
  - Main search engine orchestration
  - Query processing
  - Result ranking
  - Fuzzy search implementation
- HTTP Server (internal/server):
  - RESTful API endpoints
  - JSON request/response handling
  - Web-based interface
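
To make the tokenizer and inverted index components more concrete, here is a minimal sketch of a term → (document, frequency, positions) structure. The names `Posting`, `InvertedIndex`, `tokenize`, and `Add` are hypothetical and are not taken from the project's `internal/index` package; stop-word filtering, stemming, and locking are left out for brevity.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Posting records one document's occurrences of a term (hypothetical type).
type Posting struct {
	DocID     string
	Frequency int
	Positions []int // token offsets within the document
}

// InvertedIndex maps a normalized term to its postings.
type InvertedIndex map[string][]Posting

// tokenize lowercases text and splits on anything that is not a letter or digit.
// Stop-word filtering and stemming are omitted in this sketch.
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// Add indexes one document. It assumes docID has not been added before.
func (idx InvertedIndex) Add(docID, text string) {
	positions := make(map[string][]int)
	for pos, term := range tokenize(text) {
		positions[term] = append(positions[term], pos)
	}
	for term, ps := range positions {
		idx[term] = append(idx[term], Posting{DocID: docID, Frequency: len(ps), Positions: ps})
	}
}

func main() {
	idx := InvertedIndex{}
	idx.Add("a.md", "Go is fast. Go is simple.")
	idx.Add("b.txt", "Python is simple too.")
	for _, p := range idx["go"] {
		fmt.Println(p.DocID, p.Frequency, p.Positions) // a.md 2 [0 3]
	}
}
```

Looking up a query term is then a single map access, which is why retrieval stays fast as the number of files grows.
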
- Indexing Phase:
  - Files are scanned recursively
  - Content is tokenized into terms
  - Terms are normalized (lowercase, stemming)
  - Inverted index is built: term → list of (document, frequency, positions)
  - Index is persisted to BoltDB
- Search Phase:
  - Query is tokenized and normalized
  - Relevant documents are retrieved from inverted index
  - TF-IDF scoring calculates relevance
  - Results are ranked by score and number of matching terms
  - For fuzzy search, similar terms are found using Levenshtein distance
- Incremental Indexing:
  - File modification times and hashes are tracked
  - Only changed files are re-indexed
  - Removed files are automatically cleaned from index
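
The incremental indexing step can be sketched as a comparison between stored and freshly computed SHA-256 hashes. This is an illustration only: `hashContent` and `diffHashes` are hypothetical names, file contents are passed in as byte slices to keep the example self-contained, and the modification-time check mentioned above is not modeled.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// hashContent returns the hex-encoded SHA-256 digest of a file's bytes.
func hashContent(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// diffHashes compares stored hashes (path -> hash) with the current scan.
// It returns the paths that need re-indexing and the paths to drop from the index.
func diffHashes(stored, current map[string]string) (reindex, removed []string) {
	for path, h := range current {
		if stored[path] != h { // new file or changed content
			reindex = append(reindex, path)
		}
	}
	for path := range stored {
		if _, ok := current[path]; !ok { // file no longer exists on disk
			removed = append(removed, path)
		}
	}
	sort.Strings(reindex) // sorted only to make the output deterministic
	sort.Strings(removed)
	return reindex, removed
}

func main() {
	stored := map[string]string{
		"a.md":  hashContent([]byte("old text")),
		"b.txt": hashContent([]byte("unchanged")),
		"c.go":  hashContent([]byte("deleted later")),
	}
	current := map[string]string{
		"a.md":  hashContent([]byte("new text")),
		"b.txt": hashContent([]byte("unchanged")),
		"d.py":  hashContent([]byte("brand new")),
	}
	reindex, removed := diffHashes(stored, current)
	fmt.Println("reindex:", reindex) // reindex: [a.md d.py]
	fmt.Println("removed:", removed) // removed: [c.go]
}
```

Unchanged files such as `b.txt` appear in neither list, so they are skipped entirely on the next run.
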
- Markdown: .md
- Text: .txt
- Go: .go
- Python: .py
- JavaScript/TypeScript: .js, .ts
- Java: .java
- C/C++: .c, .cpp, .h
- Rust: .rs
- Ruby: .rb
- PHP: .php
- Shell: .sh
- YAML: .yml, .yaml
- JSON: .json
- XML: .xml
- HTML: .html
- CSS: .css
- SQL: .sql
- README files (no extension)
The search engine uses TF-IDF (Term Frequency-Inverse Document Frequency) for ranking:
- TF (Term Frequency): Number of times a term appears in a document
- IDF (Inverse Document Frequency): log(total_documents / documents_containing_term)
- Score: TF × IDF
Documents with higher scores are more relevant to the query.
Fuzzy search uses Levenshtein distance to find similar terms:
- Default maximum edit distance: 2
- Finds terms within the specified edit distance
- Useful for handling typos and variations
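
The edit-distance filter described above could look roughly like this. It is a sketch of the standard dynamic-programming Levenshtein algorithm, not the project's own code; `levenshtein`, `min3`, and `fuzzyMatches` are hypothetical names, and a linear scan over all terms is assumed for simplicity.

```go
package main

import "fmt"

// min3 returns the smallest of three ints (kept explicit for Go 1.18 compatibility).
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the minimum number of single-rune insertions,
// deletions, and substitutions needed to turn a into b.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// fuzzyMatches returns the indexed terms within maxDistance edits of query.
func fuzzyMatches(query string, terms []string, maxDistance int) []string {
	var out []string
	for _, t := range terms {
		if levenshtein(query, t) <= maxDistance {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	terms := []string{"python", "pylon", "golang", "path"}
	fmt.Println(fuzzyMatches("pythn", terms, 1)) // [python]
	fmt.Println(fuzzyMatches("pythn", terms, 2)) // [python pylon path]
}
```

The second call shows why a smaller `--distance` can be useful: at distance 2, short queries start matching unrelated terms.
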
Configuration is stored in ~/.go-local-search/config.json:
{
  "storage_path": "/home/user/.go-local-search/index.db",
  "index_paths": [],
  "server_addr": "localhost:8080",
  "fuzzy_search": false,
  "max_distance": 2
}
- Indexing speed: ~1000 files/second (depends on file size and disk speed)
- Search speed: Sub-millisecond for most queries
- Memory usage: Efficient with lazy loading from BoltDB
- Disk usage: Index size is typically 10-20% of original file size
search index ~/projects
search index ~/Documents
search index ~/notes
# Find Go tutorials
search search "golang tutorial"
# Find function definitions (with fuzzy matching)
search search "handleRequest" --fuzzy
# Search for algorithms
search search "binary search algorithm"
# Find Python code
search search "python class definition"
# Start the server
search server localhost:8080
# Search via API
curl "http://localhost:8080/search?q=golang&fuzzy=true" | jq
# Index new directory via API
curl -X POST http://localhost:8080/index \
-H "Content-Type: application/json" \
-d '{"path": "/home/user/new-project"}' | jq
# Get statistics
curl http://localhost:8080/stats | jq
- BoltDB - Embedded key/value database for persistent storage
.
├── cmd/
│ └── search/ # Main CLI application
├── internal/
│ ├── tokenizer/ # Text tokenization and normalization
│ ├── index/ # Inverted index implementation
│ ├── storage/ # BoltDB storage layer
│ ├── indexer/ # File indexing logic
│ ├── search/ # Search engine
│ └── server/ # HTTP API server
├── pkg/
│ └── config/ # Configuration management
└── bin/ # Compiled binaries
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the GPL-3.0 License - see the LICENSE file for details.
- Max Base - @BaseMax
- Built with Go
- Uses BoltDB for efficient storage
- Inspired by modern search engines and information retrieval techniques