Skip to content

space-rf-org/db25-sql-parser

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

10 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

DB25 SQL Parser

C++23 CMake SIMD License

A high-performance, SIMD-optimized SQL parser with comprehensive AST generation

Features β€’ Quick Start β€’ Documentation β€’ Performance β€’ Contributing

Features

  • πŸš€ Blazing Fast: SIMD-optimized tokenizer with 4.5x speedup
  • 🎯 Comprehensive: Supports modern SQL including CTEs, window functions, and more
  • πŸ›‘οΈ Secure: SQLite-inspired depth protection against stack overflow
  • πŸ”§ Extensible: Clean architecture for easy feature additions
  • πŸ“Š Visual AST: Professional AST viewer with colored tree output
  • πŸ’Ύ Zero-Copy: Efficient memory usage with string_view throughout
  • πŸ—οΈ Modern C++23: Leverages latest language features

Quick Start

Prerequisites

  • C++23 compatible compiler:
    • Apple Clang 16.0+ (for deducing this support)
    • GCC 13+
    • Clang 16+
    • MSVC 2022+ (v19.32+)
  • CMake 3.20 or higher
  • SIMD support (SSE4.2 minimum, AVX2 recommended)

Build and Install

# Clone the repository
git clone https://github.com/space-rf-org/db25-sql-parser.git
cd db25-sql-parser

# Build (clean build from scratch)
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j$(nproc)

# Run tests
make test

# Generate visual AST dumps
make generate_ast_dumps

# Install (optional)
sudo make install

Basic Usage

#include <db25/parser/parser.hpp>
#include <iostream>

int main() {
    db25::parser::Parser parser;
    
    auto result = parser.parse("SELECT * FROM users WHERE age > 18");
    
    if (result) {
        std::cout << "Parse successful!" << std::endl;
        // Work with AST...
    } else {
        std::cerr << "Parse error: " << result.error().message << std::endl;
    }
    
    return 0;
}

AST Viewer

Visualize your SQL queries as beautiful syntax trees:

# From command line
echo "SELECT * FROM users" | ./ast_viewer

# From file
./ast_viewer --file tests/showcase_queries.sqls --index 1

# With statistics
./ast_viewer --stats --file queries.sqls --all
Example AST Output (click to expand)
╔══════════════════════════════════════════════════════════════╗
β•‘          DB25 SQL Parser - AST Viewer v1.0                  β•‘
β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•

SQL: SELECT name, RANK() OVER (PARTITION BY dept ORDER BY salary DESC) FROM employees

Abstract Syntax Tree:
──────────────────────────────────────────────────────────────
└─ SELECT [#1, 2 children]
   β”œβ”€ SELECT LIST [#3, 2 children]
   β”‚  β”œβ”€ COLUMN: name [#2]
   β”‚  └─ WINDOW FUNC: RANK [#4, 1 children]
   β”‚     └─ WINDOW [#5, 2 children]
   β”‚        β”œβ”€ PARTITION BY [#6, 1 children]
   β”‚        β”‚  └─ COLUMN: dept [#7]
   β”‚        └─ ORDER BY [#9, 1 children]
   β”‚           └─ COLUMN: salary [#8, DESC]
   └─ FROM [#11, 1 children]
      └─ TABLE: employees [#10]
──────────────────────────────────────────────────────────────

Supported SQL Features

βœ… Data Query Language (DQL)

  • SELECT with all standard clauses
  • JOINs (INNER, LEFT, RIGHT, FULL, CROSS, LATERAL)
  • Subqueries (scalar, IN, EXISTS, correlated)
  • CTEs including recursive
  • Window functions with frames
  • Set operations (UNION, INTERSECT, EXCEPT)
  • GROUPING SETS, CUBE, ROLLUP

βœ… Data Manipulation Language (DML)

  • INSERT (VALUES, SELECT, DEFAULT VALUES)
  • UPDATE with complex expressions
  • DELETE with USING clause
  • RETURNING clause

βœ… Data Definition Language (DDL)

  • CREATE TABLE/INDEX/VIEW
  • ALTER TABLE
  • DROP with CASCADE

βœ… Advanced Features

  • CASE expressions
  • CAST operations
  • Complex expressions with precedence
  • String, date, and math functions
  • Aggregate functions

Performance

Component Performance Notes
Tokenizer 4.5x faster SIMD-optimized
Parser ~100K queries/sec O(1) dispatch
Memory Zero fragmentation Arena allocator
Depth 1001 levels Stack protection

Documentation

Project Structure

db25-sql-parser/
β”œβ”€β”€ include/db25/         # Public headers
β”‚   β”œβ”€β”€ ast/             # AST node definitions
β”‚   β”œβ”€β”€ parser/          # Parser interface
β”‚   └── memory/          # Memory management
β”œβ”€β”€ src/                 # Implementation
β”‚   β”œβ”€β”€ parser/          # Parser implementation
β”‚   β”œβ”€β”€ ast/             # AST utilities
β”‚   └── memory/          # Arena allocator
β”œβ”€β”€ external/            # Submodules
β”‚   └── tokenizer/       # SIMD tokenizer (protected)
β”œβ”€β”€ tests/               # Test suites & test data
β”‚   β”œβ”€β”€ showcase_queries.sqls  # Example queries
β”‚   β”œβ”€β”€ test_*.cpp       # Unit tests
β”‚   └── test_all_queries.sh    # Batch testing script
β”œβ”€β”€ tools/               # Tool source code
β”‚   └── ast_viewer.cpp   # AST visualization tool
β”œβ”€β”€ bin/                 # Compiled executables
β”‚   └── ast_viewer       # Ready-to-use tools
β”œβ”€β”€ scripts/             # Utility scripts
β”‚   └── test_all_queries.sh  # Batch testing (copy)
β”œβ”€β”€ docs/                # Documentation
└── build/               # Build output (git-ignored)

Building and Testing

Clean Build from Scratch

# Remove any existing build artifacts
rm -rf build

# Create fresh build directory
mkdir build && cd build

# Configure with all features
cmake -DCMAKE_BUILD_TYPE=Release \
      -DBUILD_TESTS=ON \
      -DBUILD_TOOLS=ON \
      ..

# Build everything
make -j$(nproc)

# Run all tests
make test

# Or run tests individually
./test_parser_basic
./test_join_comprehensive
./test_window_functions
./test_cte

# Generate AST dumps for visualization
make generate_ast_dumps

# View generated dumps
ls /tmp/db25_ast_dumps/

Debug Build

cmake -DCMAKE_BUILD_TYPE=Debug -DBUILD_TESTS=ON ..
make -j$(nproc)

Contributing

We welcome contributions! Please see our Developer Guide for details on:

  • Architecture overview
  • Adding new features
  • Testing guidelines
  • Code style
  • Performance optimization

Quick Contribution Guide

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Write tests first (TDD approach)
  4. Implement your feature
  5. Ensure all tests pass (make test)
  6. Update documentation
  7. Submit a Pull Request

Performance Benchmarks

Benchmark Results (click to expand)
System: Apple M1 Pro, 32GB RAM
Compiler: Apple Clang 17.0.0
Build: Release with -O3 -march=native

Tokenization Performance:
- Scalar:    220,000 tokens/sec
- SSE4.2:    550,000 tokens/sec
- AVX2:      880,000 tokens/sec
- AVX512:    990,000 tokens/sec

Parse Performance:
- Simple SELECT:     1.2 ΞΌs
- Complex JOIN:      8.5 ΞΌs
- Recursive CTE:     15.3 ΞΌs
- Window Functions:  12.7 ΞΌs

Memory Usage:
- Node Size:         64 bytes
- Arena Block:       4 MB
- Fragmentation:     0%

Architecture Highlights

  • Zero-Copy Design: String views throughout for minimal allocations
  • SIMD Tokenizer: Runtime CPU detection for optimal performance
  • Arena Allocation: Fast, cache-friendly memory management
  • Depth Protection: Safe recursive descent with stack guards
  • Modern C++23: Concepts, ranges, and advanced template features

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Tokenizer SIMD optimizations inspired by simdjson
  • Parser architecture influenced by PostgreSQL and DuckDB
  • Arena allocator based on game engine techniques
  • Depth guard pattern from SQLite

Support


Built with ❀️ using modern C++23

Back to Top

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published