This example was extracted from AGPA — my fully autonomous general-purpose agent (closed-source, ~150k LOC).
A local Retrieval-Augmented Generation (RAG) system for .NET that uses BERT embeddings and multiple search strategies for efficient semantic search and information retrieval.
LocalRAG provides a complete RAG implementation that runs entirely on your local machine, with no external API dependencies. It combines BERT-based embeddings with multiple search strategies to provide fast and accurate semantic search capabilities.
- BERT-based Text Embeddings: Uses ONNX Runtime for high-performance BERT inference
- Multiple Search Strategies:
  - Locality-Sensitive Hashing (LSH) for efficient similarity search
  - Full-Text Search (FTS5) integration via SQLite (see the sketch after this list)
  - Memory-based vector indexing for real-time queries
- SQLite Database: Persistent storage for embeddings and metadata
- Configurable Processing: Adjustable chunking, overlap, and threading parameters
- Asynchronous API: Non-blocking operations for better performance
- Windows Forms Demo: Example application demonstrating usage
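For reference, the FTS5 strategy ultimately relies on SQLite's MATCH operator. The standalone sketch below shows FTS5 matching from .NET using the Microsoft.Data.Sqlite package; the `docs` table and its columns are hypothetical illustrations and do not reflect LocalRAG's actual schema.

```csharp
using System;
using Microsoft.Data.Sqlite;

// Self-contained FTS5 demo against an in-memory database (not LocalRAG's schema).
using var connection = new SqliteConnection("Data Source=:memory:");
connection.Open();

var setup = connection.CreateCommand();
setup.CommandText = @"
    CREATE VIRTUAL TABLE docs USING fts5(request, response);
    INSERT INTO docs (request, response) VALUES
        ('What is machine learning?', 'Machine learning is a subset of artificial intelligence.'),
        ('How does LSH work?',        'LSH buckets similar vectors into the same hash bins.');";
setup.ExecuteNonQuery();

// MATCH performs the full-text query; bm25() ranks hits (lower is better).
var query = connection.CreateCommand();
query.CommandText = @"
    SELECT request, response
    FROM docs
    WHERE docs MATCH $q
    ORDER BY bm25(docs);";
query.Parameters.AddWithValue("$q", "machine learning");

using var reader = query.ExecuteReader();
while (reader.Read())
{
    Console.WriteLine($"{reader.GetString(0)} -> {reader.GetString(1)}");
}
```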
To build and run LocalRAG you need:

- .NET 10.0 SDK or later
- Windows, Linux, or macOS
- BERT ONNX model (see setup instructions below)
- BERT vocabulary file (vocab.txt)
To set up LocalRAG:

- Clone the repository:

  ```bash
  git clone https://github.com/johnbrodowski/LocalRAG.git
  cd LocalRAG
  ```

- Restore NuGet packages:

  ```bash
  dotnet restore
  ```

- Download a BERT model in ONNX format:
  - Visit Hugging Face ONNX Models
  - Models are under Apache 2.0; see Hugging Face for details
  - Download a BERT model (e.g., `bert-base-uncased` or `bert-large-uncased`)
  - Place the `.onnx` file in the `onnxBERT/` directory
  - Download the corresponding `vocab.txt` file
  - Place it in the `Vocabularies/` directory

- Build the project:

  ```bash
  dotnet build
  ```

Basic usage:

```csharp
using LocalRAG;

// Configure the RAG system
var config = new RAGConfiguration
{
ModelPath = "onnxBERT/model.onnx",
VocabularyPath = "Vocabularies/vocab.txt",
DatabasePath = "Database/embeddings.db"
};
// Initialize the database
using var database = new EmbeddingDatabaseNew(config);
// Add documents
await database.AddRequestToEmbeddingDatabaseAsync(
requestId: "doc1",
theRequest: "What is machine learning?",
embed: true
);
await database.UpdateTextResponse(
requestId: "doc1",
message: "Machine learning is a subset of artificial intelligence...",
embed: true
);
// Search for similar content
var results = await database.SearchEmbeddingsAsync(
searchText: "artificial intelligence",
topK: 5,
minimumSimilarity: 0.75f
);
foreach (var result in results)
{
Console.WriteLine($"Similarity: {result.Similarity:F3}");
Console.WriteLine($"Request: {result.Request}");
Console.WriteLine($"Response: {result.TextResponse}");
}
```

The RAGConfiguration class provides various settings:

```csharp
public class RAGConfiguration
{
// File paths
public string DatabasePath { get; set; } // SQLite database location
public string ModelPath { get; set; } // ONNX model file
public string VocabularyPath { get; set; } // BERT vocab file
// Embedding settings
public int MaxSequenceLength { get; set; } = 512;
public int WordsPerString { get; set; } = 40;
public double OverlapPercentage { get; set; } = 15;
// LSH settings
public int NumberOfHashFunctions { get; set; } = 8;
public int NumberOfHashTables { get; set; } = 10;
// Performance settings
public int InterOpNumThreads { get; set; } = 32;
public int IntraOpNumThreads { get; set; } = 2;
public int MaxCacheItems { get; set; } = 10000;
}
```
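To make the chunking settings concrete: with `WordsPerString = 40` and `OverlapPercentage = 15`, consecutive chunks share roughly 40 * 15% = 6 words. The sliding-window splitter below is a sketch of that idea under those assumptions, not LocalRAG's actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sliding-window chunker; LocalRAG's real splitter is not shown here.
static List<string> ChunkWords(string text, int wordsPerChunk = 40, double overlapPercentage = 15)
{
    var words = text.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    int overlap = (int)(wordsPerChunk * overlapPercentage / 100.0);  // 40 * 15 / 100 = 6 words
    int step = Math.Max(1, wordsPerChunk - overlap);                 // advance 34 words per chunk

    var chunks = new List<string>();
    for (int start = 0; start < words.Length; start += step)
    {
        int count = Math.Min(wordsPerChunk, words.Length - start);
        chunks.Add(string.Join(" ", words, start, count));
        if (start + count >= words.Length) break;                    // final (possibly shorter) chunk
    }
    return chunks;
}

// 100 words -> chunks of 40 words stepping by 34, so 3 chunks with 6-word overlaps.
var sample = string.Join(" ", Enumerable.Range(1, 100).Select(i => $"word{i}"));
Console.WriteLine(ChunkWords(sample).Count);  // prints 3
```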
The main components:

- EmbedderClassNew: Handles BERT embedding generation using ONNX Runtime
- EmbeddingDatabaseNew: Main database interface with SQLite storage
- MemoryHashIndex: In-memory hash-based indexing for fast lookups
- FeedbackDatabaseValues: Data model for stored documents and embeddings
The indexing and search pipeline:

- Text is preprocessed (tokenized, stop words removed)
- BERT generates embeddings via ONNX Runtime
- Embeddings are indexed using LSH for fast retrieval (see the sketch after this list)
- The LSH, full-text, and in-memory strategies are combined into a single candidate set
- Results are ranked by similarity score
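The LSH step can be pictured as random-hyperplane hashing: each hash function (NumberOfHashFunctions = 8 by default) takes the sign of a dot product with a random vector, so nearby embeddings tend to share the same 8-bit bucket key, and only that bucket is re-ranked by cosine similarity. The sketch below is a conceptual illustration of that idea, not LocalRAG's internals.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

const int Bits = 8;    // mirrors NumberOfHashFunctions = 8
const int Dim = 768;   // BERT-base embedding size
var rng = new Random(42);

// One random hyperplane per hash bit; the sign of the dot product yields that bit.
double[][] planes = Enumerable.Range(0, Bits)
    .Select(_ => Enumerable.Range(0, Dim).Select(__ => rng.NextDouble() - 0.5).ToArray())
    .ToArray();

int Signature(double[] v)
{
    int key = 0;
    for (int i = 0; i < Bits; i++)
        if (planes[i].Zip(v, (p, x) => p * x).Sum() >= 0) key |= 1 << i;
    return key;
}

static double Cosine(double[] a, double[] b) =>
    a.Zip(b, (x, y) => x * y).Sum()
    / (Math.Sqrt(a.Sum(x => x * x)) * Math.Sqrt(b.Sum(x => x * x)));

// Bucket index: signature -> embeddings that hashed there.
var buckets = new Dictionary<int, List<double[]>>();

void Add(double[] embedding)
{
    int key = Signature(embedding);
    if (!buckets.TryGetValue(key, out var list))
        buckets[key] = list = new List<double[]>();
    list.Add(embedding);
}

// Search scans only the query's bucket, then re-ranks candidates by cosine similarity.
IEnumerable<double[]> Search(double[] query, int topK) =>
    buckets.TryGetValue(Signature(query), out var candidates)
        ? candidates.OrderByDescending(c => Cosine(query, c)).Take(topK)
        : Enumerable.Empty<double[]>();

// Tiny demo: a stored vector always lands in its own bucket.
double[] doc = Enumerable.Range(0, Dim).Select(_ => rng.NextDouble() - 0.5).ToArray();
Add(doc);
Console.WriteLine(Search(doc, topK: 1).Any() ? "found in its own bucket" : "no candidates");
```

Production LSH indexes typically probe several hash tables (NumberOfHashTables = 10 in the default configuration) so that a near neighbour falling just outside the query's bucket in one table can still be recovered from another.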
The DemoApp project provides a Windows Forms application demonstrating LocalRAG usage:
```bash
cd DemoApp
dotnet run
```

The demo shows:
- Adding documents with embeddings
- Searching for similar content
- Retrieving conversation history
- Formatting search results
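As a concrete example of formatting search results, the properties shown in the Quick Start (Similarity, Request, TextResponse) can be folded into a single context block for a downstream prompt. A minimal sketch, reusing only the calls and properties shown earlier:

```csharp
using System;
using System.Text;
using LocalRAG;

// Same configuration and database setup as in the Quick Start example.
var config = new RAGConfiguration
{
    ModelPath = "onnxBERT/model.onnx",
    VocabularyPath = "Vocabularies/vocab.txt",
    DatabasePath = "Database/embeddings.db"
};
using var database = new EmbeddingDatabaseNew(config);

var results = await database.SearchEmbeddingsAsync(
    searchText: "artificial intelligence",
    topK: 5,
    minimumSimilarity: 0.75f
);

var context = new StringBuilder();
foreach (var result in results)
{
    // Label each retrieved snippet with its similarity score.
    context.AppendLine($"[score {result.Similarity:F2}] Q: {result.Request}");
    context.AppendLine($"A: {result.TextResponse}");
    context.AppendLine();
}

Console.WriteLine(context.ToString());
```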
Performance notes:

- First Run: Initial embedding generation may be slow
- Caching: Frequently accessed embeddings are cached in memory
- Threading: Adjust `InterOpNumThreads` and `IntraOpNumThreads` based on your CPU (see the sketch after this list)
- Database Size: SQLite performs well up to several million embeddings
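A hedged example of adjusting the documented threading and cache settings; the values below are illustrative starting points, not recommendations from the LocalRAG authors:

```csharp
using LocalRAG;

// Illustrative values only; tune for your own hardware.
var config = new RAGConfiguration
{
    ModelPath = "onnxBERT/model.onnx",
    VocabularyPath = "Vocabularies/vocab.txt",
    DatabasePath = "Database/embeddings.db",

    // Assumption: fewer inter-op threads and more intra-op threads
    // often suits a single large model on a typical desktop CPU.
    InterOpNumThreads = 4,
    IntraOpNumThreads = 8,

    // Cap the in-memory embedding cache if RAM is tight.
    MaxCacheItems = 5000
};
```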
LocalRAG includes comprehensive test coverage with both unit and integration tests.
- One-time setup: copy the example run settings to create your test configuration:

  ```bash
  # Windows PowerShell
  copy test.runsettings.example test.runsettings

  # Linux/Mac
  cp test.runsettings.example test.runsettings
  ```

- Edit `test.runsettings` and update these two paths:

  ```xml
  <BERT_MODEL_PATH>C:\path\to\your\model.onnx</BERT_MODEL_PATH>
  <BERT_VOCAB_PATH>C:\path\to\your\vocab.txt</BERT_VOCAB_PATH>
  ```

  Important:
  - Use absolute paths to the actual files (not directories)
  - Point to the `.onnx` file itself (e.g., `model2.onnx`)
  - Point to the `.txt` vocab file (e.g., `base_cased_large.txt`)

- Run the tests:

  ```bash
  dotnet test --settings test.runsettings
  ```
Successful test run (all 38 tests passing):

```text
Test summary: total: 38, failed: 0, succeeded: 38, skipped: 0, duration: 13.4s
Build succeeded in 15.3s
```
Unit tests are fast tests that don't require BERT models. Run them with:

```bash
dotnet test --filter "Category!=Integration"
```

These test core functionality:
- Text preprocessing and tokenization
- Database operations
- Search algorithms (LSH, FTS)
- Configuration handling
Integration tests require actual BERT models. Run them with:

```bash
dotnet test --settings test.runsettings --filter "Category=Integration"
```

These test:
- BERT embedding generation
- End-to-end search with real embeddings
- Model dimension validation (768 for base, 1024 for large)
- Semantic similarity calculations
- Mock data generation with embeddings
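The Category filters above assume each test carries a category trait. A minimal sketch of what that tagging typically looks like, assuming the xUnit framework; the attribute usage and test names are illustrative, not copied from LocalRAG.Tests:

```csharp
using Xunit;

public class ExampleTests
{
    [Fact]  // no Category trait, so it runs under --filter "Category!=Integration"
    public void Chunk_overlap_is_fifteen_percent_of_forty_words()
    {
        Assert.Equal(6, 40 * 15 / 100);
    }

    [Fact]
    [Trait("Category", "Integration")]  // selected by --filter "Category=Integration"
    public void Bert_base_embeddings_have_768_dimensions()
    {
        // A real integration test would load the model configured in test.runsettings
        // and assert the embedding length: 768 for BERT-base, 1024 for BERT-large.
    }
}
```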
In Visual Studio, the tests will appear in Test Explorer. To run integration tests:
- Right-click on the solution in Solution Explorer
- Select "Configure Run Settings" → "Select Solution Wide runsettings File"
- Choose your `test.runsettings` file
- Run tests normally from Test Explorer
If every test is skipped:

```text
Test summary: total: 38, failed: 0, succeeded: 0, skipped: 38
```

Solution: create `test.runsettings` as described above and run with `--settings test.runsettings`.
If tests report:

```text
LocalRAG.Tests.SkipException : BERT model not configured.
```

Solutions:

- Verify `test.runsettings` exists in the project root
- Check that paths point to files, not directories:
  - ❌ Wrong: `C:\...\onnxBERT\`
  - ✅ Correct: `C:\...\onnxBERT\model2.onnx`
- Verify the files exist at those paths:

  ```bash
  # Windows
  dir "C:\path\to\model.onnx"
  dir "C:\path\to\vocab.txt"

  # Linux/Mac
  ls -la /path/to/model.onnx
  ls -la /path/to/vocab.txt
  ```

- Make sure you're running with:

  ```bash
  dotnet test --settings test.runsettings
  ```
Ensure you've downloaded a BERT model:
- Visit Hugging Face ONNX Models
- Download a BERT model (e.g., `bert-base-uncased` or `bert-large-uncased`)
- Update `test.runsettings` with the actual file path
For continuous integration, set environment variables instead:
```bash
# Linux/Mac
export BERT_MODEL_PATH="/path/to/model.onnx"
export BERT_VOCAB_PATH="/path/to/vocab.txt"
dotnet test

# Windows
set BERT_MODEL_PATH=C:\path\to\model.onnx
set BERT_VOCAB_PATH=C:\path\to\vocab.txt
dotnet test
```

Or skip integration tests in CI:

```bash
dotnet test --filter "Category!=Integration"
```

If the model cannot be loaded at runtime, ensure the ONNX model file exists at the configured ModelPath. Download it from Hugging Face if needed.
If memory usage is too high, reduce `MaxCacheItems` or `MaxSequenceLength` in the configuration.
To improve performance further:

- Use a smaller BERT model (base vs. large)
- Increase thread count if you have more CPU cores
- Enable GPU support via ONNX Runtime GPU packages
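Enabling the GPU execution provider happens at the ONNX Runtime level; whether LocalRAG exposes this directly is not covered above, so the sketch below uses the Microsoft.ML.OnnxRuntime API itself. It assumes the Microsoft.ML.OnnxRuntime.Gpu package and a CUDA-capable GPU.

```csharp
using System;
using Microsoft.ML.OnnxRuntime;

// Session options mirroring the threading knobs exposed by RAGConfiguration.
using var options = new SessionOptions
{
    InterOpNumThreads = 4,
    IntraOpNumThreads = 8
};

// Route BERT inference to the CUDA execution provider on GPU 0;
// ONNX Runtime falls back to CPU for any unsupported operators.
options.AppendExecutionProvider_CUDA(0);

// The same model file referenced by RAGConfiguration.ModelPath.
using var session = new InferenceSession("onnxBERT/model.onnx", options);
Console.WriteLine($"Loaded model with {session.InputMetadata.Count} inputs.");
```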
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
This project is licensed under the Apache License 2.0 - see LICENSE.txt for details.
- Built with ONNX Runtime
- Uses FastBertTokenizer for tokenization
- BERT models from Hugging Face
Planned features:

- GPU acceleration support
- More embedding models (Sentence Transformers, etc.)
- Vector database integration options
- REST API interface
- Multi-language support
For questions and issues, please open an issue on GitHub.