Go Local Search

A fast, lightweight local full-text search engine for indexing and searching through your files (Markdown, text, and code). Built entirely in Go with inverted index data structures and persistent storage using BoltDB.

Features

  • 🚀 Fast Indexing: Efficiently indexes files using inverted index data structures
  • 🔍 Instant Search: Sub-second search across thousands of files
  • 🎯 TF-IDF Ranking: Intelligent relevance scoring using Term Frequency-Inverse Document Frequency
  • 🔤 Fuzzy Matching: Find results even with typos using Levenshtein distance
  • 📊 Incremental Indexing: Only re-indexes modified files
  • 💾 Persistent Storage: Indexes stored on disk using BoltDB
  • 🖥️ Dual Interface: CLI and HTTP API server
  • 📝 Multiple File Types: Supports Markdown, text, and code files (.md, .txt, .go, .py, .js, .ts, .java, .c, .cpp, .rs, etc.)
  • 🎨 Colored CLI Output: Beautiful, colorized terminal output

Installation

Prerequisites

  • Go 1.18 or higher

Build from Source

git clone https://github.com/BaseMax/go-local-search.git
cd go-local-search
go build -o bin/search ./cmd/search

Install Globally

go install github.com/BaseMax/go-local-search/cmd/search@latest

Usage

CLI Commands

Index Files

Index all files in a directory:

search index /path/to/directory

Example:

search index ~/Documents
search index ~/projects

Search

Basic search:

search search "your query"

Fuzzy search (tolerates typos):

search search "your query" --fuzzy

Fuzzy search with custom edit distance:

search search "your query" --fuzzy --distance 1

Examples:

# Search for exact matches
search search "golang programming"

# Fuzzy search (finds "python" even if you type "pythn")
search search "pythn" --fuzzy

# Search code with function names
search search "handleRequest" --fuzzy --distance 2

Start HTTP Server

Start the HTTP API server:

search server [address]

Examples:

# Start on the default address (localhost:8080)
search server

# Start on a custom address and port
search server localhost:3000

View Statistics

Show index statistics:

search stats

Output:

=== Index Statistics ===
Documents:    1234
Terms:        5678
Files:        1234
Total Size:   45.67 MB

HTTP API

Endpoints

GET /search - Search indexed files

Parameters:

  • q (required) - Search query
  • fuzzy (optional) - Enable fuzzy matching (true/false)
  • distance (optional) - Max edit distance for fuzzy search (default: 2)

Example:

curl "http://localhost:8080/search?q=golang&fuzzy=true"

Response:

{
  "query": "golang",
  "count": 1,
  "results": [
    {
      "path": "/path/to/file.md",
      "score": 1.386,
      "match_count": 1,
      "snippet": "Go is a programming language..."
    }
  ]
}
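
For Go callers, a minimal client-side sketch follows. The type and function names (SearchResponse, SearchResult, buildSearchURL, parseSearchResponse) are hypothetical and not taken from the project; only the query parameters and JSON fields mirror the example above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strconv"
)

// SearchResult and SearchResponse are hypothetical client-side types whose
// JSON tags mirror the fields shown in the example response.
type SearchResult struct {
	Path       string  `json:"path"`
	Score      float64 `json:"score"`
	MatchCount int     `json:"match_count"`
	Snippet    string  `json:"snippet"`
}

type SearchResponse struct {
	Query   string         `json:"query"`
	Count   int            `json:"count"`
	Results []SearchResult `json:"results"`
}

// buildSearchURL assembles a GET /search URL from the documented parameters.
// The distance parameter is only sent when fuzzy matching is requested.
func buildSearchURL(base, query string, fuzzy bool, distance int) string {
	v := url.Values{}
	v.Set("q", query)
	if fuzzy {
		v.Set("fuzzy", "true")
		v.Set("distance", strconv.Itoa(distance))
	}
	return base + "/search?" + v.Encode()
}

// parseSearchResponse decodes a /search response body.
func parseSearchResponse(body []byte) (SearchResponse, error) {
	var r SearchResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

func main() {
	fmt.Println(buildSearchURL("http://localhost:8080", "golang", true, 2))

	// A literal body stands in for a live server response.
	body := []byte(`{"query":"golang","count":1,"results":[{"path":"/path/to/file.md","score":1.386,"match_count":1,"snippet":"Go is a programming language..."}]}`)
	resp, err := parseSearchResponse(body)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, r := range resp.Results {
		fmt.Printf("%s (score %.3f)\n", r.Path, r.Score)
	}
}
```

Building the query with url.Values keeps multi-word queries correctly escaped, which is easy to get wrong when concatenating strings by hand.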

POST /index - Index a directory

Body:

{
  "path": "/path/to/directory"
}

Example:

curl -X POST http://localhost:8080/index \
  -H "Content-Type: application/json" \
  -d '{"path": "/home/user/documents"}'

Response:

{
  "success": true,
  "files_indexed": 42
}

GET /stats - Get index statistics

Example:

curl http://localhost:8080/stats

Response:

{
  "document_count": 1234,
  "term_count": 5678,
  "files_indexed": 1234,
  "total_size": 47890123
}
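
The CLI statistics output reports Total Size as 45.67 MB, while the API returns a raw number. The two sample values agree if total_size is a byte count and a megabyte is taken as 1024 × 1024 bytes. That reading is inferred from the examples, not stated by the project. A small sketch under that assumption, using a hypothetical helper named formatSize:

```go
package main

import "fmt"

// formatSize renders a byte count as megabytes with two decimals.
// Dividing by 1024*1024 is an assumption inferred from the sample values.
func formatSize(bytes int64) string {
	return fmt.Sprintf("%.2f MB", float64(bytes)/(1024*1024))
}

func main() {
	fmt.Println(formatSize(47890123)) // prints "45.67 MB"
}
```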

Architecture

Components

  1. Tokenizer (internal/tokenizer):

    • Text tokenization and normalization
    • Stop word filtering
    • Basic stemming
    • Levenshtein distance calculation for fuzzy matching
  2. Inverted Index (internal/index):

    • Efficient inverted index data structure
    • TF-IDF scoring for relevance ranking
    • Positional information tracking
    • Thread-safe operations
  3. Storage (internal/storage):

    • BoltDB integration for persistent storage
    • Index serialization/deserialization
    • Metadata storage
  4. Indexer (internal/indexer):

    • Recursive directory scanning
    • File type detection
    • Incremental indexing (detects file changes)
    • SHA-256 hashing for change detection
  5. Search Engine (internal/search):

    • Main search engine orchestration
    • Query processing
    • Result ranking
    • Fuzzy search implementation
  6. HTTP Server (internal/server):

    • RESTful API endpoints
    • JSON request/response handling
    • Web-based interface
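
To make the first two components concrete, here is a minimal in-memory sketch of a tokenizer feeding an inverted index that tracks frequencies and positions. It is not the project's code: the names (posting, tokenize, invertedIndex) are hypothetical, and stop-word filtering, stemming, locking, and persistence are all left out.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// posting records how often and where a term occurs in one document.
type posting struct {
	Doc       string
	Frequency int
	Positions []int // token offsets within the document
}

// tokenize lowercases text and splits it on anything that is not a letter
// or digit. Stop-word filtering and stemming are omitted in this sketch.
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// invertedIndex maps term -> document path -> posting.
type invertedIndex map[string]map[string]*posting

// add tokenizes text and records every term occurrence for doc.
func (idx invertedIndex) add(doc, text string) {
	for pos, term := range tokenize(text) {
		docs, ok := idx[term]
		if !ok {
			docs = make(map[string]*posting)
			idx[term] = docs
		}
		p, ok := docs[doc]
		if !ok {
			p = &posting{Doc: doc}
			docs[doc] = p
		}
		p.Frequency++
		p.Positions = append(p.Positions, pos)
	}
}

func main() {
	idx := invertedIndex{}
	idx.add("a.md", "Go is fast. Go is simple.")
	idx.add("b.md", "Python is simple.")

	// Looking up a term returns every document that contains it.
	for doc, p := range idx["simple"] {
		fmt.Println(doc, p.Frequency, p.Positions)
	}
}
```

A lookup is a single map access per query term, which is why an inverted index answers queries without rescanning file contents.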

How It Works

  1. Indexing Phase:

    • Files are scanned recursively
    • Content is tokenized into terms
    • Terms are normalized (lowercase, stemming)
    • Inverted index is built: term → list of (document, frequency, positions)
    • Index is persisted to BoltDB
  2. Search Phase:

    • Query is tokenized and normalized
    • Relevant documents are retrieved from inverted index
    • TF-IDF scoring calculates relevance
    • Results are ranked by score and number of matching terms
    • For fuzzy search, similar terms are found using Levenshtein distance
  3. Incremental Indexing:

    • File modification times and hashes are tracked
    • Only changed files are re-indexed
    • Files that no longer exist are automatically removed from the index
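
Hash-based change detection for incremental indexing can be sketched as follows. This is an illustration under assumptions, not the project's implementation: hashBytes and changed are hypothetical names, the seen map stands in for metadata that would be persisted, and modification-time checks and cleanup of deleted files are omitted.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// hashBytes returns the hex-encoded SHA-256 digest of data.
func hashBytes(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// changed reports whether the content for path differs from the hash
// recorded in seen, and records the new hash when it does.
func changed(seen map[string]string, path string, data []byte) bool {
	h := hashBytes(data)
	if seen[path] == h {
		return false
	}
	seen[path] = h
	return true
}

func main() {
	dir, err := os.MkdirTemp("", "reindex-demo")
	if err != nil {
		fmt.Println("temp dir error:", err)
		return
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "note.md")
	seen := map[string]string{}

	// Write the file three times; only the first and third writes change it.
	for _, content := range []string{"first draft", "first draft", "second draft"} {
		if err := os.WriteFile(path, []byte(content), 0o644); err != nil {
			fmt.Println("write error:", err)
			return
		}
		data, err := os.ReadFile(path)
		if err != nil {
			fmt.Println("read error:", err)
			return
		}
		fmt.Printf("%q needs re-index: %v\n", content, changed(seen, path, data))
	}
}
```

Comparing modification times first and hashing only when they differ would avoid reading unchanged files, at the cost of trusting the file system's timestamps.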

Technical Details

Supported File Types

  • Markdown: .md
  • Text: .txt
  • Go: .go
  • Python: .py
  • JavaScript/TypeScript: .js, .ts
  • Java: .java
  • C/C++: .c, .cpp, .h
  • Rust: .rs
  • Ruby: .rb
  • PHP: .php
  • Shell: .sh
  • YAML: .yml, .yaml
  • JSON: .json
  • XML: .xml
  • HTML: .html
  • CSS: .css
  • SQL: .sql
  • README files (no extension)
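
One way to express this list in Go is an extension lookup with a special case for extensionless README files. The sketch below is hypothetical (supportedExt and isSupported are not project names), and the case-insensitive matching is an assumption.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// supportedExt holds the extensions from the list above.
var supportedExt = map[string]bool{
	".md": true, ".txt": true, ".go": true, ".py": true, ".js": true,
	".ts": true, ".java": true, ".c": true, ".cpp": true, ".h": true,
	".rs": true, ".rb": true, ".php": true, ".sh": true, ".yml": true,
	".yaml": true, ".json": true, ".xml": true, ".html": true,
	".css": true, ".sql": true,
}

// isSupported reports whether path should be indexed.
// Matching extensions case-insensitively is an assumption of this sketch.
func isSupported(path string) bool {
	base := filepath.Base(path)
	ext := strings.ToLower(filepath.Ext(base))
	if ext == "" {
		// Extensionless files are accepted only when named README.
		return strings.EqualFold(base, "README")
	}
	return supportedExt[ext]
}

func main() {
	for _, p := range []string{"notes.md", "src/main.go", "README", "image.png", "Makefile"} {
		fmt.Println(p, isSupported(p))
	}
}
```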

TF-IDF Scoring

The search engine uses TF-IDF (Term Frequency-Inverse Document Frequency) for ranking:

  • TF (Term Frequency): Number of times a term appears in a document
  • IDF (Inverse Document Frequency): log(total_documents / documents_containing_term)
  • Score: TF × IDF

Documents with higher scores are more relevant to the query.
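
The formula can be worked through in a few lines of Go. This sketch assumes the natural logarithm (the base is not specified above) and a raw occurrence count for TF; the function name tfidf is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// tfidf returns TF × IDF for one term in one document.
// tf is the raw count of the term in the document, totalDocs is the number
// of indexed documents, and docsWithTerm is how many of them contain the term.
// Using the natural logarithm is an assumption of this sketch.
func tfidf(tf, totalDocs, docsWithTerm int) float64 {
	if tf == 0 || totalDocs == 0 || docsWithTerm == 0 {
		return 0
	}
	idf := math.Log(float64(totalDocs) / float64(docsWithTerm))
	return float64(tf) * idf
}

func main() {
	// A term that appears once and occurs in 1 of 4 documents: ln(4).
	fmt.Printf("%.3f\n", tfidf(1, 4, 1)) // prints 1.386
	// A term present in every document carries no weight: 3 × ln(1).
	fmt.Printf("%.3f\n", tfidf(3, 4, 4)) // prints 0.000
}
```

The second case shows why IDF matters: a term found in every document scores zero no matter how often it appears, so common words do not dominate the ranking.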

Fuzzy Matching

Fuzzy search uses Levenshtein distance to find similar terms:

  • Default maximum edit distance: 2
  • Finds terms within the specified edit distance
  • Useful for handling typos and variations
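
For reference, Levenshtein distance can be computed with the standard two-row dynamic programming approach. The sketch below is a generic version of that algorithm, not the project's tokenizer code; fuzzyTerms is a hypothetical helper that scans every indexed term linearly.

```go
package main

import "fmt"

// levenshtein returns the minimum number of single-character insertions,
// deletions, and substitutions needed to turn a into b.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// fuzzyTerms returns the indexed terms within maxDist edits of query.
func fuzzyTerms(query string, terms []string, maxDist int) []string {
	var out []string
	for _, t := range terms {
		if levenshtein(query, t) <= maxDist {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	terms := []string{"python", "golang", "pattern"}
	fmt.Println(fuzzyTerms("pythn", terms, 2)) // prints [python]
}
```

A smaller maximum distance returns fewer, closer matches; a larger one tolerates more typos but admits more unrelated terms.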

Configuration

Configuration is stored in ~/.go-local-search/config.json:

{
  "storage_path": "/home/user/.go-local-search/index.db",
  "index_paths": [],
  "server_addr": "localhost:8080",
  "fuzzy_search": false,
  "max_distance": 2
}
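
A Go program that wants to read this file could map it onto a struct like the one below. The Config type and parseConfig function are hypothetical names chosen for this sketch; only the JSON keys and the file location come from the example above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// Config is a hypothetical struct whose JSON tags mirror config.json.
type Config struct {
	StoragePath string   `json:"storage_path"`
	IndexPaths  []string `json:"index_paths"`
	ServerAddr  string   `json:"server_addr"`
	FuzzySearch bool     `json:"fuzzy_search"`
	MaxDistance int      `json:"max_distance"`
}

// parseConfig decodes the contents of a config.json file.
func parseConfig(data []byte) (Config, error) {
	var c Config
	err := json.Unmarshal(data, &c)
	return c, err
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Println("cannot resolve home directory:", err)
		return
	}
	data, err := os.ReadFile(filepath.Join(home, ".go-local-search", "config.json"))
	if err != nil {
		fmt.Println("no config found:", err)
		return
	}
	cfg, err := parseConfig(data)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```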

Performance

  • Indexing speed: ~1000 files/second (depends on file size and disk speed)
  • Search speed: Sub-millisecond for most queries
  • Memory usage: Efficient with lazy loading from BoltDB
  • Disk usage: Index size is typically 10-20% of original file size

Examples

Index your projects

search index ~/projects
search index ~/Documents
search index ~/notes

Search examples

# Find Go tutorials
search search "golang tutorial"

# Find function definitions (with fuzzy matching)
search search "handleRequest" --fuzzy

# Search for algorithms
search search "binary search algorithm"

# Find Python code
search search "python class definition"

Using the HTTP API

# Start the server
search server localhost:8080

# Search via API
curl "http://localhost:8080/search?q=golang&fuzzy=true" | jq

# Index new directory via API
curl -X POST http://localhost:8080/index \
  -H "Content-Type: application/json" \
  -d '{"path": "/home/user/new-project"}' | jq

# Get statistics
curl http://localhost:8080/stats | jq

Dependencies

  • BoltDB - Embedded key/value database for persistent storage

Project Structure

.
├── cmd/
│   └── search/          # Main CLI application
├── internal/
│   ├── tokenizer/       # Text tokenization and normalization
│   ├── index/           # Inverted index implementation
│   ├── storage/         # BoltDB storage layer
│   ├── indexer/         # File indexing logic
│   ├── search/          # Search engine
│   └── server/          # HTTP API server
├── pkg/
│   └── config/          # Configuration management
└── bin/                 # Compiled binaries

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the GPL-3.0 License - see the LICENSE file for details.

Acknowledgments

  • Built with Go
  • Uses BoltDB for efficient storage
  • Inspired by modern search engines and information retrieval techniques
