Migrate from CozoDB to SurrealDB by CamonZ · Pull Request #12 · CamonZ/code_search

CamonZ · 2025-12-27T23:25:08Z

Summary

Complete migration from CozoDB to SurrealDB as the database backend. This was done in two phases:

Add SurrealDB backend - Implemented SurrealDB alongside CozoDB with feature flags
Remove CozoDB - Standardized on SurrealDB, removing ~12,000 lines of duplicate code

Key Changes

Database Backend

SurrealDB with RocksDB storage (file: surrealdb.rocksdb)
6 node tables: modules, functions, types, structs, files, specs
4 relationship tables: calls, accepts, returns, type_references
Graph traversal using SurrealQL's -> and <- operators
Natural key IDs (e.g., functions:[$module, $name, $arity])

Architecture Simplification

Removed db/src/backend/cozo.rs and cozo_schema.rs (473 lines)
Removed all #[cfg(feature = "backend-cozo")] conditional blocks
Removed CozoDB dependencies from Cargo.toml
Simplified Database trait to single implementation
Default database filename changed from cozo.sqlite to surrealdb.rocksdb

Query Modules

All 31 query modules use SurrealDB:

accepts, calls, calls_from, calls_to, clusters, complexity, cycles
depended_by, dependencies, depends_on, duplicates, file, function
hotspots, import, large_functions, location, many_clauses, path
returns, reverse_trace, schema, search, specs, struct_usage
structs, trace, types, unused

Testing

576 database tests - Comprehensive SurrealDB coverage
484 CLI tests - Command execution and output formatting
18 acceptance tests - End-to-end CLI testing with assert_cmd
Complex test fixture with realistic call graph data including cycles

Documentation Updated

README.md, CLAUDE.md, docs/*.md
Templates and examples
Git hooks

Statistics

55 commits
~12,000 lines removed (CozoDB code)
72 files changed in final cleanup

Test Plan

cargo test -p db - 576 tests pass
cargo test -p code_search - 484 tests pass
cargo build - Builds without warnings
All acceptance tests pass

Known Issues

The depends_on and function commands currently have no CLI execute tests, because the only ones they had were CozoDB-specific
Database-layer tests for these commands remain intact

Create trait-based abstraction to support multiple database backends (CozoDB and SurrealDB). Defines core Database, QueryResult, Row, and Value traits with backend-agnostic parameter handling via QueryParams.
Includes:
- Database trait with execute_query methods
- QueryResult, Row, Value traits for result handling
- QueryParams struct with ValueType enum for parameters
- Feature flags backend-cozo and backend-surrealdb
- Factory functions with feature-gated compilation
All traits are Send + Sync for thread safety. Factory implementations are stubbed with todo!() pending concrete backend implementations.
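A minimal sketch of what such an abstraction could look like, using only the standard library. The method signature, the `with`/`get` helpers, and `EchoBackend` are assumptions made for illustration; the PR's actual QueryResult, Row, and Value traits are richer than the plain vectors used here.

```rust
use std::collections::HashMap;

/// Parameter values a query can carry. The variant names are assumptions
/// modelled on the Str/Int/Float/Bool conversion described in later commits.
#[derive(Debug, Clone, PartialEq)]
pub enum ValueType {
    Str(String),
    Int(i64),
    Float(f64),
    Bool(bool),
}

/// Backend-agnostic named parameters.
#[derive(Debug, Default, Clone)]
pub struct QueryParams {
    values: HashMap<String, ValueType>,
}

impl QueryParams {
    pub fn new() -> Self {
        Self::default()
    }

    /// Builder-style insert so call sites can chain parameters.
    pub fn with(mut self, name: &str, value: ValueType) -> Self {
        self.values.insert(name.to_string(), value);
        self
    }

    pub fn get(&self, name: &str) -> Option<&ValueType> {
        self.values.get(name)
    }
}

/// Simplified stand-in for the Database trait: one query method, Send + Sync.
pub trait Database: Send + Sync {
    fn execute_query(
        &self,
        query: &str,
        params: &QueryParams,
    ) -> Result<Vec<Vec<ValueType>>, String>;
}

/// Hypothetical backend: echoes the `name` parameter back as a single row.
pub struct EchoBackend;

impl Database for EchoBackend {
    fn execute_query(
        &self,
        _query: &str,
        params: &QueryParams,
    ) -> Result<Vec<Vec<ValueType>>, String> {
        match params.get("name") {
            Some(value) => Ok(vec![vec![value.clone()]]),
            None => Err("missing parameter: name".to_string()),
        }
    }
}

/// Callers depend only on the trait object, never on a concrete backend.
pub fn run(db: &dyn Database, name: &str) -> Result<Vec<Vec<ValueType>>, String> {
    let params = QueryParams::new().with("name", ValueType::Str(name.to_string()));
    db.execute_query("SELECT * FROM functions WHERE name = $name", &params)
}
```

Because `run` takes `&dyn Database`, swapping the backend means changing only the value passed in, which is the property the later CLI commits rely on.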

Wrap existing DbInstance functionality in the new trait interface to maintain compatibility while using the abstraction layer. Includes: - CozoDatabase wrapping DbInstance with open() and open_mem() - CozoQueryResult wrapping NamedRows - CozoRow wrapping row data - Value trait implementation for DataValue - Parameter conversion from QueryParams to CozoDB format - Factory functions open_database() and open_mem_database() - 7 comprehensive tests validating all trait implementations All existing CozoDB functionality works unchanged. This is a thin wrapper with no logic changes, focused on correct trait implementation.

Create stub implementation of the SurrealDB backend that compiles but returns unimplemented!() for all operations. This allows the project to compile with --features backend-surrealdb while the actual implementation is developed in Phase 2. Includes: - SurrealDatabase struct with open() and open_mem() methods - Database trait implementation (returns unimplemented!()) - Clear TODO comments guiding future implementation - Updated factory functions to call SurrealDB stub - Helpful error messages for users attempting to use the stub Compiles successfully with backend-surrealdb feature flag. The stub provides a foundation for future SurrealDB integration without blocking current development work.

Updated all CLI commands and test infrastructure to use the Database trait instead of concrete DbInstance type, completing the database abstraction layer implementation. Changes: - Update Execute and CommandRunner traits to accept &dyn Database - Migrate all 27 command implementations to use trait interface - Update test macros and fixtures to work with Box<dyn Database> - Refactor all 30 query modules to accept &dyn Database - Fix test utilities and database helpers for trait objects - Update row access patterns for trait object compatibility Impact: - CLI layer is now 100% backend-agnostic - All 593 tests passing (516 CLI + 77 DB) - Clean production build with no errors - 96 files changed (+1,139/-854 lines) The CLI can now support any database backend that implements the Database trait without requiring code changes.

Implemented compile-time backend selection through Cargo features, allowing users to choose between CozoDB and SurrealDB backends.
Changes:
- Make cozo, surrealdb, and tokio optional dependencies in db crate
- Add feature propagation in CLI to control backend selection
- Remove incorrect feature gates from backend-agnostic functions
- Make CozoDB-specific exports conditional in lib.rs
Backend selection:
- default: Uses CozoDB (backward compatible)
- --no-default-features --features backend-surrealdb: Uses SurrealDB
- --no-default-features: Compile error (no backend selected)
Impact:
- All 596 tests passing (516 CLI + 77 DB + 3 doc)
- CozoDB build: only cozo in dependency tree
- SurrealDB build: only surrealdb in dependency tree
- Smaller binaries (only selected backend compiled)
- True multi-backend support enabled
Verification:
- cargo build (CozoDB): ✓
- cargo build --no-default-features --features backend-surrealdb: ✓
- cargo build --no-default-features: ✗ (expected compile error)
- cargo test: ✓ (all tests pass)

Completely rewrote db/src/lib.rs to provide clear, well-documented public API exports that reflect the backend abstraction layer. Changes: - Add comprehensive module documentation with architecture overview - Add working usage example (tested as doc test) - Organize exports into 7 logical sections with inline docs - Add missing exports: escape_string, escape_string_single, ValueType - Deprecate DbInstance export (backward compatible) - Document all 35+ public exports Documentation improvements: - Backend selection guide with examples - Trait-based architecture explanation - Complete, runnable usage example - Inline docs for every export (shows in IDE autocomplete) - Organized into sections: Backend Abstraction, Database Operations, Value Extraction, Call Graph, Query Building, Domain Types, etc. Backward compatibility: - DbInstance still exported but deprecated - Compiler warns users to migrate to Box<dyn Database> - Old code continues to work Impact: - All 597 tests passing (516 CLI + 77 DB + 4 doc tests) - Documentation builds cleanly (cargo doc) - Professional-quality API surface - Better developer experience (clear docs, working examples) - No breaking changes Before: 45 lines, minimal docs, scattered exports After: 217 lines, comprehensive docs, organized exports

Refactor database schemas into backend-specific modules to support both CozoDB (relational) and SurrealDB (graph) architectures. Changes: - Create cozo_schema.rs with 7 CozoDB table definitions - Create surrealdb_schema.rs with 9 graph tables (5 nodes + 4 relationships) - Add conditional module exports in backend/mod.rs - Include helper functions for schema lookup and table lists - Add unit tests validating all schemas SurrealDB schema uses true graph model with TYPE RELATION for edges, SCHEMAFULL mode for strict validation, and UNIQUE indexes on natural keys. Note: schema.rs still contains old constants - will be removed in a later commit when create_schema() is updated with backend-conditional logic.

Completes the SurrealDatabase backend implementation that wraps the async SurrealDB API with a synchronous interface using tokio::Runtime. Key changes: - Add SurrealDatabase struct with Surreal<Db> and Runtime fields - Implement open() method for RocksDB persistence backend - Implement open_mem() method for in-memory testing backend - Add execute_query() with parameter conversion (Str, Int, Float, Bool) - Bridge async operations via runtime.block_on() - Enable kv-mem feature flag for in-memory backend support - Update conditional compilation for multi-backend support The execute_query() implementation returns unimplemented!() for result wrapping, which will be completed in a future commit.

This commit completes two related changes for multi-backend support:
1. SurrealDB type wrappers (db/src/backend/surrealdb.rs):
- Add SurrealQueryResult struct implementing QueryResult trait
- Add SurrealRow struct implementing Row trait
- Implement Value trait for surrealdb::sql::Value
- Complete execute_query() with object-to-row conversion
- Extract headers from first result object
- Extract values in consistent order based on headers
- Handle empty results and missing fields gracefully
- Support all 4 value types (str, i64, f64, bool)
- Add test_value_extraction test
2. Backend-specific test scoping (db/src/queries/):
- Scope hotspots, import, and search tests to CozoDB backend
- Use #[cfg(all(test, feature = "backend-cozo"))]
- Prevents failures when building with backend-surrealdb
- Allows independent test suites per backend
Both backends now compile and test independently.
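The object-to-row conversion described above can be sketched roughly as follows. `Cell`, `Object`, and `objects_to_rows` are hypothetical names, and filling absent fields with a null cell is an assumption about what "gracefully" means here.

```rust
use std::collections::BTreeMap;

/// Simplified cell type for this sketch (the real code wraps surrealdb values).
#[derive(Debug, Clone, PartialEq)]
pub enum Cell {
    Str(String),
    Int(i64),
    Float(f64),
    Bool(bool),
    Null,
}

/// One result object: field name -> value. A BTreeMap iterates in key order,
/// so headers derived from it come out alphabetically.
pub type Object = BTreeMap<String, Cell>;

/// Convert result objects into (headers, rows).
/// Headers come from the first object; every row is then read in header
/// order, and a field absent from an object becomes `Cell::Null`.
pub fn objects_to_rows(objects: &[Object]) -> (Vec<String>, Vec<Vec<Cell>>) {
    let headers: Vec<String> = match objects.first() {
        Some(first) => first.keys().cloned().collect(),
        // Empty result set: no headers, no rows.
        None => return (Vec::new(), Vec::new()),
    };
    let rows = objects
        .iter()
        .map(|object| {
            headers
                .iter()
                .map(|header| object.get(header).cloned().unwrap_or(Cell::Null))
                .collect()
        })
        .collect();
    (headers, rows)
}
```

Reading every row through the same header list is what keeps column positions stable even when individual objects omit a field.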

Refactors schema creation to support backend-conditional logic with comprehensive test coverage for both CozoDB and SurrealDB backends. Schema Refactoring: - Refactor create_schema() to support backend-conditional logic - CozoDB: Single-pass creation of 7 relations (unchanged behavior) - SurrealDB: Two-phase creation (5 node tables, then 4 relationship tables) - Move schema constants from schema.rs to backend-specific modules - Update relation_names() to return correct list per backend - Relationship tables require node tables to exist first in SurrealDB Comprehensive Test Coverage: - Add 15 new tests for schema creation at db layer - CozoDB tests (6 tests): relation count, names, idempotency, DDL validity - SurrealDB tests (9 tests): table count, names, two-phase order, idempotency - Test coverage: ~90% for schema.rs on both backends - All new tests passing (85 CozoDB + 45 SurrealDB) Bug Fixes Discovered During Testing: - Fix SurrealDB execute_query() to handle DDL statements gracefully - DDL statements (DEFINE TABLE) return None instead of result rows - Add deserialization error detection and empty result fallback - Enhance try_create_relation() to detect SurrealDB "already exists" errors - Add "already exists" pattern matching for SurrealDB - Maintain existing CozoDB error detection Files Modified: - db/src/queries/schema.rs: Two-phase creation + 15 tests (~440 lines) - db/src/backend/surrealdb.rs: DDL handling fix (~10 lines) - db/src/db.rs: SurrealDB error detection (~5 lines)

Update try_create_relation() to properly detect backend-specific "already exists" errors, enabling idempotent schema creation for both CozoDB and SurrealDB backends.
Changes:
- Add feature-gated error detection (#[cfg(feature = "backend-X")])
- CozoDB: Detect "AlreadyExists" and "stored_relation_conflict"
- SurrealDB: Detect "already exists" (matches Db::TbAlreadyExists)
- Remove unused "already defined" pattern for SurrealDB
- Add 4 comprehensive unit tests (>90% line coverage)
- Improve function documentation with backend-specific patterns
Tests verify:
- Successful relation creation returns Ok(true)
- Duplicate creation attempts return Ok(false)
- Backend error patterns are correctly detected
- Genuine errors are propagated
Implements Ticket 05: SurrealDB error detection
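A rough sketch of the three outcomes listed above. The closure-based signature and the String error type are assumptions; the real function is feature-gated per backend, whereas this sketch checks both pattern lists at once so it compiles without Cargo features.

```rust
/// Error-text patterns quoted in the commit message.
const COZO_PATTERNS: [&str; 2] = ["AlreadyExists", "stored_relation_conflict"];
const SURREAL_PATTERNS: [&str; 1] = ["already exists"];

/// True when the error text looks like a duplicate-definition error.
fn is_already_exists(message: &str) -> bool {
    COZO_PATTERNS
        .iter()
        .chain(SURREAL_PATTERNS.iter())
        .any(|pattern| message.contains(*pattern))
}

/// Run a creation step and classify the outcome:
/// Ok(true)  = created,
/// Ok(false) = already existed (treated as success, so reruns are idempotent),
/// Err(_)    = genuine failure, passed through unchanged.
pub fn try_create_relation<F>(create: F) -> Result<bool, String>
where
    F: FnOnce() -> Result<(), String>,
{
    match create() {
        Ok(()) => Ok(true),
        Err(message) if is_already_exists(&message) => Ok(false),
        Err(message) => Err(message),
    }
}
```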

Implement complete test coverage for the SurrealDB backend including unit tests, integration tests, and trait implementation verification. Unit Tests (18 tests in surrealdb.rs): - Database connection tests (open_mem, open, trait implementation) - Parameter conversion tests (Str, Int, Float, Bool, Multiple) - Value extraction tests (as_str, as_i64, as_f64, as_bool) - Query execution tests (DDL, multiple statements, parameterized) - Trait implementation tests (QueryResult, Row) Integration Tests (12 tests in backend_integration.rs): - Schema creation and validation (all 9 tables) - Two-phase creation order verification - Idempotent schema creation (multiple runs) - Database persistence and isolation tests - Query execution with parameters Test Results: - SurrealDB: 76 tests passing (60 unit + 12 integration + 4 doc) - CozoDB: 93 tests passing (no regressions) - Coverage: 74-95% across SurrealDB modules New Tests Added Beyond Ticket Requirements: - test_open_persistent_database: Tests SurrealDatabase::open() - test_query_result_trait: Tests QueryResult trait (headers, rows, into_rows) - test_row_trait: Tests Row trait (get, len, is_empty) All acceptance criteria met. Implements Ticket 06.

Added 189 tests across 27 query modules that previously had zero test coverage, bringing all 30 query modules under test. Tests cover happy paths, empty results, edge cases, error handling, and filter combinations. Coverage improved significantly: - Region coverage: 44% → 83% (+39pp) - Line coverage: 47% → 87% (+40pp) - Total tests: 89 → 278 (+189 tests) Modules with 100% coverage (4): - depends_on, depended_by, calls_from, calls_to Modules with >90% coverage (9): - reverse_trace, clusters, location, structs, specs, struct_usage, file, accepts, trace All tests use existing fixtures (call_graph_db, type_signatures_db, structs_db) and follow established patterns from hotspots, import, and search modules. This provides strong regression protection for the CozoDB backend and documents expected query behavior for future SurrealDB migration.

Complete TICKET_00: Test Infrastructure Setup Fixtures (db/src/test_utils.rs): - Add 9 low-level insert primitives for nodes and relationships - Add 3 fixture builders (call_graph, type_signatures, structs) - Add 10 comprehensive tests validating data integrity Critical Fix (db/src/backend/surrealdb.rs): - Fix execute_query response deserialization - Handle Array (SELECT), Object (INFO), and None (DDL) responses - Convert via JSON to work around private wrapper fields Feature Flags (db/src/backend/mod.rs, db/src/queries/schema.rs): - Make backend-cozo and backend-surrealdb mutually exclusive - Add compile error when both features enabled Test Results: - 10/10 SurrealDB fixture tests passing - Data can be created and retrieved successfully - All fixtures use direct inserts (no import dependency) Refs: TICKET_00, PHASE_2_PLAN.md section 7

Add SurrealDB implementation for search_modules() and search_functions() with 46 comprehensive test cases achieving 90.18% line coverage.
Implementation:
- Add search_modules() with SurrealQL regex and exact match support
- Add search_functions() with SurrealQL regex and exact match support
- Feature-gated behind #[cfg(feature = "backend-surrealdb")]
- Handle SurrealDB quirks: alphabetical column ordering, ORDER BY ignored with regex
- Use <regex> type casting for pattern matching (v3.0 syntax)
Tests:
- 46 test cases covering all functionality and edge cases
- Strong assertions validating exact values against fixture data
- Coverage: 90.18% lines, 94.44% functions, 94.72% regions
- Test invalid regex, zero/large limits, empty patterns, sorting, etc.
SurrealDB quirks discovered:
- Returns columns in alphabetical order (not SELECT order)
- Ignores ORDER BY when using regex WHERE clauses
- Regex operator changed from ~ to <regex> type casting in v3.0
Workarounds applied:
- Access columns by alphabetically-sorted field name positions
- Sort results in Rust after query execution
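The two workarounds could look roughly like this. `column_index`, `FunctionRow`, and `sort_rows` are hypothetical helpers, and sorting by (module, name, arity) is an assumed order, not necessarily the one the PR uses.

```rust
/// Position of `field` in a result whose columns come back sorted by name
/// rather than in SELECT order.
pub fn column_index(selected: &[&str], field: &str) -> Option<usize> {
    let mut sorted: Vec<&str> = selected.to_vec();
    sorted.sort_unstable();
    sorted.iter().position(|name| *name == field)
}

/// Row shape assumed for this sketch. The derived `Ord` compares fields in
/// declaration order: module, then name, then arity.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct FunctionRow {
    pub module: String,
    pub name: String,
    pub arity: u32,
}

/// Client-side replacement for an ORDER BY that the database did not apply.
pub fn sort_rows(mut rows: Vec<FunctionRow>) -> Vec<FunctionRow> {
    rows.sort();
    rows
}
```

For a query selecting `name`, `module`, `arity`, the sketch resolves `module` to position 1, since the alphabetical order is `arity`, `module`, `name`.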

…ests Migrated the find_functions() query from CozoScript to SurrealQL as part of TICKET_02. This migration follows the pattern established in TICKET_01 (search.rs) and achieves excellent coverage metrics. Implementation: - Added SurrealQL implementation of find_functions() behind feature flag - Supports module pattern filtering (exact or regex) - Supports function pattern filtering (exact or regex) - Supports optional arity filtering - Handles result limits correctly - Uses <regex> type casting for SurrealDB v3.0 compatibility Testing: - Added 28 comprehensive tests covering all code paths - Achieved 94.56% line coverage (exceeds 85% target) - Achieved 94.44% function coverage - All tests validate exact result values against known fixture data - Covers validation, basic functionality, limits, patterns, sorting, edge cases Tests organized by category: - Validation tests (4): regex validation, invalid patterns - Basic functionality (5): exact match, empty results - Limit tests (3): boundary conditions - Pattern matching (4): regex, alternation, character classes - Result structure (3): field validation - Sorting tests (2): ordering verification - Case sensitivity (2): case-sensitive matching - Edge cases (5): empty patterns, zero arity, field presence Changes: - db/src/queries/function.rs (+604 lines): SurrealQL implementation and tests

…ests Migrated the find_locations() query from CozoDB to SurrealQL as part of TICKET_03. This migration follows the pattern established in TICKET_01 and TICKET_02, achieving excellent coverage metrics. Implementation: - Added SurrealQL implementation of find_locations() behind feature flag - Supports module pattern filtering (optional, exact or regex) - Supports function pattern filtering (required, exact or regex) - Supports optional arity filtering - Handles result limits correctly - Uses <regex> type casting for SurrealDB v3.0 compatibility - Queries clause table (not function) to get line numbers Testing: - Added 29 comprehensive tests covering all code paths - Achieved 91.75% line coverage (exceeds 85% target) - Achieved 97.50% function coverage - All tests validate exact result values against known fixture data - Covers validation, functionality, patterns, sorting, edge cases Tests organized by category: - Validation tests (4): regex validation, error handling - Basic functionality (5): core query operations - Module pattern tests (2): filter by module - Limit tests (3): LIMIT clause handling - Pattern matching (4): regex patterns (dot-star, alternation, anchors, character classes) - Result structure (3): field validation, data integrity - Sorting tests (2): ordering verification - Case sensitivity (2): case-sensitive matching - Edge cases (4): empty patterns, parameter combinations, multiple clauses - Line preservation (1): line number accuracy Changes: - db/src/queries/location.rs (+606 lines): SurrealQL implementation and tests

Migrated the find_functions_in_module() query from CozoDB to SurrealQL as part of TICKET_04. This migration follows the pattern established in TICKET_01, TICKET_02, and TICKET_03, achieving excellent coverage. Implementation: - Added SurrealQL implementation of find_functions_in_module() behind feature flag - Supports module pattern filtering (exact or regex) - Queries clause table for file and line number information - Uses <regex> type casting for SurrealDB v3.0 compatibility - Implements client-side sorting to work around SurrealDB ORDER BY limitation - Handles result limits correctly Testing: - Added 15 comprehensive tests covering all code paths - Achieved 86.90% line coverage (exceeds 85% target) - Achieved 85.71% function coverage - Achieved 92.79% branch coverage - All tests validate exact result values against known fixture data - Covers validation, functionality, patterns, sorting, edge cases Tests organized by category: - Validation tests (2): regex validation, non-regex mode - Basic functionality (4): exact match, returns results, limit handling - Module filtering (2): specific module, nonexistent module - Pattern matching (3): regex patterns (dot-star, alternation, character classes) - Result structure (2): correct fields, field validation - Sorting tests (1): consistent ordering verification - Case sensitivity (1): case-sensitive matching - Edge cases (2): empty pattern exact mode, large limits Changes: - db/src/queries/file.rs (+481 lines): SurrealQL implementation and tests

Enhanced test infrastructure with surreal_call_graph_db_complex() function that mirrors the complexity of call_graph.json CozoDB fixture. This provides more realistic test data for comprehensive testing of query implementations. Complex fixture contains: - 5 modules: MyApp.Controller, MyApp.Accounts, MyApp.Service, MyApp.Repo, MyApp.Notifier - 15 functions with various arities (0-2) and visibility levels - Multiple clauses per function demonstrating pattern matching - 11 call edges forming a realistic multi-layer architecture - Realistic line numbers, complexity, and depth metrics Architecture modeled: - Controller layer (public API endpoints: index, show, create) - Business logic layer (Accounts, Service with validation and processing) - Data access layer (Repo with get, all, insert, query operations) - External services (Notifier with email functionality) Benefits: - Tests edge cases like multi-arity functions (get_user/1 and get_user/2) - Validates cross-module call chains (Controller -> Accounts -> Repo) - Tests both local and remote call types - Provides realistic data for trace, hotspot, and dependency analysis - Complements simple fixture for comprehensive test coverage Testing: - Added 6 comprehensive tests validating the complex fixture - Verifies exact counts: 5 modules, 15 functions, 11 calls - Tests multi-arity function handling - Validates realistic call chain queries - All tests passing (100%) This enhancement prepares the test infrastructure for more rigorous testing of remaining query modules (TICKET_05+).

Migrated the find_struct_fields() query from CozoDB to SurrealQL as part of TICKET_05. This migration handles the table rename (struct_fields → field) and achieves excellent coverage metrics. Implementation: - Added SurrealQL implementation of find_struct_fields() behind feature flag - Handles table rename from struct_fields to field - Supports module pattern filtering (exact or regex) - Special-cases empty patterns to match all records - Implements client-side sorting for regex queries - Uses <regex> type casting for SurrealDB v3.0 compatibility - Correctly maps SurrealDB's alphabetical column ordering - Maintains group_fields_into_structs() aggregation logic Testing: - Added 15 comprehensive tests covering all code paths - Achieved 93.56% line coverage (exceeds 85% target) - Achieved 90.00% function coverage - Achieved 95.42% branch coverage - All tests validate exact result values against known fixture data - Covers basic functionality, patterns, edge cases, and aggregation Tests organized by category: - Basic functionality (6): results retrieval, empty results, exact filtering, limits - Pattern matching (2): regex patterns, invalid regex rejection - Edge cases (3): empty patterns, result structure, field verification - Aggregation logic (4): field grouping, empty input, single fields, multiple projects Changes: - db/src/queries/structs.rs (+272 lines): SurrealQL implementation and tests

Migrated the find_types() query from CozoDB to SurrealQL as part of TICKET_06. Added new test fixture for type definitions and achieved excellent coverage metrics. Implementation: - Added SurrealQL implementation of find_types() behind feature flag - Supports module pattern filtering (exact or regex) - Supports type name filtering (exact or regex) - Supports optional kind filtering (struct, enum, etc.) - Handles empty module patterns correctly (converts to .* or 1=1) - Uses <regex> type casting for SurrealDB v3.0 compatibility - Implements client-side sorting for regex queries - Correctly maps SurrealDB's alphabetical column ordering Testing: - Added 22 comprehensive tests covering all code paths - Created new surreal_type_db() test fixture with 3 types - Achieved 92.13% line coverage (exceeds 85% target by 7.13%) - Achieved 96.43% function coverage - Achieved 91.15% region coverage - All tests validate exact result values against known fixture data Tests organized by category: - Validation tests (4): regex validation, invalid patterns, non-regex mode - Basic functionality (5): exact match, empty results, nonexistent modules - Kind filtering (3): kind filter, wrong kind, combined filters - Pattern matching (4): name patterns, module patterns, combined filters - Result structure (3): valid structure, field population, project field - Sorting tests (1): alphabetical ordering verification - Edge cases (2): empty patterns, non-existent projects Changes: - db/src/queries/types.rs (+530 lines): SurrealQL implementation and tests - db/src/test_utils.rs (+50 lines): New surreal_type_db() fixture

TICKET_07: Corrected implementation that was using inefficient client-side filtering due to incorrect assumption about SurrealDB WHERE clause limitations.
Research findings:
- SurrealDB DOES support in.field/out.field in WHERE clauses
- Documentation confirms: "You can use graph relations in any CRUD operation either by using the arrow syntax <- -> or dot notation using in and out"
- Example: DELETE order WHERE <-person.name ?= "Leoma Santiago"
Changes:
- Replaced client-side filtering (~80 lines) with proper WHERE clauses
- Uses in.field/out.field dot notation for caller/callee filtering
- Changed from LIMIT 10000 with Rust filtering to proper LIMIT $limit
- Added ORDER BY for consistent results
- Added 14 comprehensive tests (4 in calls.rs, 5 each in wrappers)
Performance impact:
- BEFORE: Fetched up to 10,000 records, filtered in Rust
- AFTER: Only fetches matching records via database WHERE clause
Test coverage: All 14 tests passing (100% for wrapper modules)
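A hedged sketch of building such a query string. `build_calls_query`, the selected fields, the aliases, and the ORDER BY expression are assumptions and have not been checked against SurrealDB; only the `in.`/`out.` dot notation and `LIMIT $limit` come from the commit message.

```rust
/// Build a SurrealQL string that filters call edges in the database instead
/// of in Rust. Filter values are expected to be bound as parameters
/// ($caller_module, $callee_module, $limit), never interpolated, so the
/// Option arguments only decide which conditions are emitted.
pub fn build_calls_query(caller_module: Option<&str>, callee_module: Option<&str>) -> String {
    let mut conditions: Vec<&str> = Vec::new();
    if caller_module.is_some() {
        conditions.push("in.module_name = $caller_module");
    }
    if callee_module.is_some() {
        conditions.push("out.module_name = $callee_module");
    }

    let mut query = String::from(
        "SELECT in.module_name AS caller_module, out.module_name AS callee_module FROM calls",
    );
    if !conditions.is_empty() {
        query.push_str(" WHERE ");
        query.push_str(&conditions.join(" AND "));
    }
    query.push_str(" ORDER BY caller_module, callee_module LIMIT $limit");
    query
}
```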

- Add as_array() and as_thing_id() methods to Value trait for extracting nested SurrealDB structures (Thing with compound array IDs)
- Implement SurrealDB trace_calls() using graph path traversal syntax: {1..max_depth+path+inclusive}->calls->`function`
- Add extract_function_ref() to deserialize Thing values into FunctionRef
- Handle path deduplication for overlapping graph traversal results
- Use string::matches() for regex pattern matching (not ~ operator)
- Add test_trace_calls_broad_regex_many_paths test with actual regex patterns (MyApp\\..*) to verify string::matches() works correctly
- Clean up Cozo-only imports behind feature flag
All 17 SurrealDB trace tests pass.
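Path traversal returns overlapping paths (for example a->b and a->b->c both contain the a->b step), so the steps need deduplicating. A minimal sketch, assuming a step is identified by its depth, caller, and callee; the field layout of `FunctionRef` and the `CallStep` type are illustrative, not taken from the PR.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct FunctionRef {
    pub module: String,
    pub name: String,
    pub arity: u32,
}

/// One caller -> callee hop at a given depth (1 = first hop from the root).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct CallStep {
    pub depth: usize,
    pub caller: FunctionRef,
    pub callee: FunctionRef,
}

/// Flatten traversal paths into steps, keeping the first occurrence of each
/// (depth, caller, callee) triple and preserving encounter order.
pub fn dedupe_steps(paths: &[Vec<FunctionRef>]) -> Vec<CallStep> {
    let mut seen = HashSet::new();
    let mut steps = Vec::new();
    for path in paths {
        // windows(2) yields each adjacent pair of nodes along the path.
        for (index, pair) in path.windows(2).enumerate() {
            let step = CallStep {
                depth: index + 1,
                caller: pair[0].clone(),
                callee: pair[1].clone(),
            };
            // insert() returns false when the step was already recorded.
            if seen.insert(step.clone()) {
                steps.push(step);
            }
        }
    }
    steps
}
```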

…aversal
Extends trace.rs to support both forward and reverse tracing via a TraceDirection enum, avoiding code duplication between trace and reverse_trace queries.
Changes:
- Add TraceDirection enum (Forward, Reverse) to trace.rs
- Extend trace_calls() to generate ->calls-> or <-calls<- based on direction
- Swap caller/callee extraction order for reverse direction
- Create reverse_trace_calls() wrapper that calls trace_calls() with Reverse
- Add 11 comprehensive SurrealDB tests for reverse trace
- Fix compilation warnings in location.rs and test_utils.rs
Test coverage: 11/11 tests passing for reverse_trace SurrealDB impl
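One way the direction switch could be expressed. The enum variants and the traversal syntax are taken from the commit messages; `arrow`, `traversal_fragment`, and `orient` are hypothetical helpers, and the table name is a parameter because the node tables are renamed in a later commit.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TraceDirection {
    Forward,
    Reverse,
}

impl TraceDirection {
    /// Graph arrow used on both sides of the edge table name.
    fn arrow(self) -> &'static str {
        match self {
            TraceDirection::Forward => "->",
            TraceDirection::Reverse => "<-",
        }
    }
}

/// Build the recursive traversal fragment for the given direction.
/// Doubled braces in the format string emit literal `{` and `}`.
pub fn traversal_fragment(direction: TraceDirection, max_depth: u32, table: &str) -> String {
    let arrow = direction.arrow();
    format!("{{1..{max_depth}+path+inclusive}}{arrow}calls{arrow}`{table}`")
}

/// Map a traversed hop to (caller, callee). Walking edges backwards visits
/// the callee first, so the pair is swapped for Reverse.
pub fn orient<T>(direction: TraceDirection, from: T, to: T) -> (T, T) {
    match direction {
        TraceDirection::Forward => (from, to),
        TraceDirection::Reverse => (to, from),
    }
}
```

Keeping the direction in one enum means the forward and reverse queries share everything except the arrow and the extraction order.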

Add find_paths() implementation using SurrealDB's native graph traversal with {..+shortest=} operator. Requires arity for both source and target to enable direct record ID construction. Changes: - Add SurrealDB find_paths() with shortest path algorithm - Add helper functions for path conversion from graph results - Add 7 comprehensive tests with strong value assertions - Fix cfg-gated imports per backend - Enhance complex fixture with alternate path for shortest path testing (Controller.create/2 -> Notifier.send_email/2 direct call) - Update trace tests for new fixture (12 calls instead of 11)

Add find_dependencies() implementation for both outgoing and incoming module dependencies. Queries the calls edge table directly and extracts function info from SurrealDB record references (Thing IDs). Changes: - Add SurrealDB find_dependencies() with direction-aware filtering - Add extract_function_ref_from_value() helper for parsing Thing IDs - Support both exact matching and regex patterns via string::matches() - Filter self-references at database level (in.module_name != out.module_name) - Add 16 comprehensive tests in dependencies.rs - Add 7 SurrealDB tests to depends_on.rs wrapper - Add 7 SurrealDB tests to depended_by.rs wrapper - Fix cfg-gated imports to avoid warnings in either backend

Fix SurrealDB-specific query syntax issues discovered during testing:
- Use count() instead of COUNT(column) for row counting
- GROUP BY with graph traversals returns single row; fixed by fetching call pairs and aggregating distinct modules in Rust
Rewrite all SurrealDB hotspots tests with strong assertions against known fixture data (5 modules, 15 functions, 12 calls):
- get_function_counts: verify exact counts per module
- get_module_connectivity: verify exact incoming/outgoing per module
- Cross-function consistency checks
Test results: 331 SurrealDB tests passing, 280 CozoDB tests passing
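The Rust-side aggregation of call pairs might look like the following sketch. `Connectivity` and `module_connectivity` are illustrative names, and the sketch applies no special handling to same-module pairs, which may differ from the real query.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Distinct-module counts for one module.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Connectivity {
    /// Number of distinct modules this module calls.
    pub outgoing: usize,
    /// Number of distinct modules that call this module.
    pub incoming: usize,
}

/// Aggregate (caller_module, callee_module) pairs into per-module counts.
/// Sets collapse repeated pairs, so each neighbour is counted once.
pub fn module_connectivity(pairs: &[(&str, &str)]) -> BTreeMap<String, Connectivity> {
    let mut outgoing: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
    let mut incoming: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
    for &(caller, callee) in pairs {
        outgoing.entry(caller).or_default().insert(callee);
        incoming.entry(callee).or_default().insert(caller);
    }

    let mut result: BTreeMap<String, Connectivity> = BTreeMap::new();
    for (module, callees) in &outgoing {
        result.entry(module.to_string()).or_default().outgoing = callees.len();
    }
    for (module, callers) in &incoming {
        result.entry(module.to_string()).or_default().incoming = callers.len();
    }
    result
}
```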

Schema changes: - Rename file to source_file in clause table for consistency - Remove return_type from function table (unused in SurrealDB) - Remove inferred_type from field table (unused in SurrealDB) - Update table counts in schema tests (10 tables, 6 node tables) Test fixture changes: - Create comprehensive complex fixture with realistic web app structure - 5 modules: Controller, Accounts, Service, Repo, Notifier - 15 functions with various arities - 22 clauses with realistic line numbers - 12 call relationships forming a realistic call graph Test migrations: - Update all SurrealDB tests to use surreal_call_graph_db_complex() - Replace old simple fixture references (module_a, module_b, foo, bar) - Update assertions to match complex fixture's call graph structure - Fix trace/reverse_trace tests for recursive traversal behavior - All 331 SurrealDB tests now pass

Rename all non-relation tables to use plural form: - module -> modules - function -> functions - clause -> clauses - spec -> specs - type -> types - field -> fields Relationship tables remain unchanged (defines, has_clause, calls, has_field) as they represent actions rather than entities. Updated all queries, fixtures, and tests to use the new table names.

Add SurrealDB implementation of find_complexity_metrics with proper handling of SurrealDB-specific behaviors:
- Use subquery + WHERE instead of HAVING (not supported in SurrealDB)
- Avoid aliases on GROUP BY columns (breaks aggregation in SurrealDB)
- Use math::sum(complexity) to aggregate clause complexity values
- Extract columns in alphabetical order (BTreeMap behavior)
Query aggregates clauses by function to calculate:
- complexity: sum of complexity values across all clauses
- max_nesting_depth: max across all clauses
- line ranges: min(start_line), max(end_line)
Includes 19 comprehensive tests covering:
- Basic functionality and exact counts
- Complexity and depth threshold filtering
- Module pattern filtering (exact and regex)
- Limit and ordering verification
- Field validation and edge cases
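To make the aggregation semantics concrete, here is the same computation written in plain Rust rather than SurrealQL. The struct layouts and the `aggregate` function are assumptions; the final `retain` plays the role of the outer WHERE that stands in for HAVING.

```rust
use std::collections::BTreeMap;

/// One function clause, with the fields this sketch needs.
#[derive(Debug, Clone)]
pub struct Clause {
    pub module: String,
    pub function: String,
    pub arity: u32,
    pub complexity: u32,
    pub depth: u32,
    pub start_line: u32,
    pub end_line: u32,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FunctionMetrics {
    pub complexity: u32,
    pub max_nesting_depth: u32,
    pub start_line: u32,
    pub end_line: u32,
}

/// Grouping key: (module, function, arity).
pub type FunctionKey = (String, String, u32);

/// Group clauses by function, then keep functions at or above the threshold.
pub fn aggregate(clauses: &[Clause], min_complexity: u32) -> BTreeMap<FunctionKey, FunctionMetrics> {
    let mut grouped: BTreeMap<FunctionKey, FunctionMetrics> = BTreeMap::new();
    for clause in clauses {
        let key = (clause.module.clone(), clause.function.clone(), clause.arity);
        grouped
            .entry(key)
            .and_modify(|metrics| {
                metrics.complexity += clause.complexity; // sum
                metrics.max_nesting_depth = metrics.max_nesting_depth.max(clause.depth); // max
                metrics.start_line = metrics.start_line.min(clause.start_line); // min
                metrics.end_line = metrics.end_line.max(clause.end_line); // max
            })
            .or_insert(FunctionMetrics {
                complexity: clause.complexity,
                max_nesting_depth: clause.depth,
                start_line: clause.start_line,
                end_line: clause.end_line,
            });
    }
    // Filter on the aggregated value, after grouping.
    grouped.retain(|_, metrics| metrics.complexity >= min_complexity);
    grouped
}
```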

Add feature-gated SurrealDB implementation for find_large_functions() that queries the clauses table to find functions by line count. Key implementation details: - Queries clauses table with correct column names (function_name, source_file) - Calculates lines as end_line - start_line + 1 - Supports module pattern filtering (exact and regex via string::matches) - Supports generated_by filtering to exclude generated functions - Handles SurrealDB alphabetical column ordering Includes 19 comprehensive tests covering: - Basic functionality and line calculations - Min lines threshold filtering - Module pattern matching (exact and regex) - Generated function filtering - Limit and ordering validation - Data integrity checks Test coverage: 88.13% (exceeds 85% target)

Add feature-gated SurrealDB implementation for find_many_clauses() that aggregates clauses to count functions with multiple clause heads. Key implementation details: - Queries clauses table with GROUP BY on (module_name, function_name, arity) - Uses count() for clause count, math::min/max for line ranges - Subquery pattern with WHERE filter on aggregated clause count - Supports module pattern filtering (exact and regex via string::matches) - Supports generated_by filtering to exclude generated functions - Handles SurrealDB alphabetical column ordering Includes 21 comprehensive tests covering: - Basic functionality and clause counting - Min clauses threshold filtering - Module pattern matching (exact and regex) - Generated function filtering - Limit and ordering validation - Data integrity and line range checks Test coverage: 90.75% (exceeds 85% target)

Add feature-gated SurrealDB implementation for get_module_calls() that traverses the calls relation to find inter-module calls.

Key implementation details:
- Uses relation traversal via in.module_name and out.module_name
- calls table is RELATION FROM functions TO functions
- Filters self-calls with WHERE in.module_name != out.module_name
- Handles SurrealDB alphabetical column ordering

Includes 14 comprehensive tests covering:
- Basic functionality and module validation
- Exact count verification (8 inter-module calls)
- Self-call filtering confirmation
- Module presence verification
- Specific call path testing between modules

Test coverage: 89.94% line, 100% function

Add SurrealDB implementation of find_unused_functions() with:
- NOT IN subquery to find functions not called (id NOT IN calls.out)
- Graph traversal via has_clause relation for kind/file/line
- array::first() for kind and file, math::min() for earliest line
- Kind filtering in WHERE clause (defp/defmacrop for private)
- string::matches() for regex module pattern filtering
- exclude_generated filtering in Rust using GENERATED_PATTERNS

Add __struct__/0 to complex fixture for testing exclude_generated.

Rewrite all 33 SurrealDB tests with strong assertions:
- Exact counts (7 unused, 2 private, 5 public, 6 non-generated)
- Specific function names, modules, arities, kinds, files, lines
- Per-module assertions (Controller: 3, Accounts: 2, Repo: 1, etc.)
- Combined filter tests (private+exclude, public+exclude, module+kind)

Enhance the complex SurrealDB test fixture with 3 call cycles for testing cycle detection functionality:
- Cycle A (3 nodes): Service → Logger → Repo → Service
- Cycle B (4 nodes): Controller → Events → Cache → Accounts → Controller
- Cycle C (5 nodes): Notifier → Metrics → Logger → Events → Cache → Notifier

Fixture now contains:
- 9 modules (added Logger, Events, Cache, Metrics)
- 31 functions (added 16 new functions)
- 38 clauses (added 16 new clauses)
- 24 call edges (12 original + 12 cycle edges)

Updated test expectations across all query modules to reflect the expanded fixture data, including function counts, module counts, dependency counts, and trace traversal results.

Add SurrealDB implementation for find_cycle_edges() that detects circular dependencies between modules in the call graph.

Implementation details:
- Query module-to-module dependencies from calls edge table
- Use in/out fields correctly (in=caller, out=callee in RELATE syntax)
- Compute reachability to find edges that form part of cycles
- An edge A→B is a cycle edge if B can reach A (completing the cycle)
- Deduplicate and sort results for consistent output

Tests validate against fixture data with strong assertions:
- Exact count: 17 unique module-level cycle edges
- All 17 specific edges verified by (from, to) pairs
- Cycle A, B, C edges individually verified
- Module pattern filtering with exact counts
- 9 modules in cycles verified by name
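The reachability rule stated above (an edge A→B is a cycle edge if B can reach A) can be sketched over an in-memory edge list. This is an illustrative sketch only: the signature below and the `reaches` helper are assumptions, and the PR's function reads its edges from the calls table rather than from a slice. Under this rule a self-edge A→A would also count as a cycle edge.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Return the deduplicated, sorted module-level edges that lie on a cycle.
/// Input is a list of (from, to) module pairs.
pub fn find_cycle_edges(edges: &[(&str, &str)]) -> Vec<(String, String)> {
    // Adjacency list: module -> set of modules it calls.
    let mut adj: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
    for &(from, to) in edges {
        adj.entry(from).or_default().insert(to);
    }

    // A BTreeSet deduplicates and keeps the output sorted.
    let mut result = BTreeSet::new();
    for &(from, to) in edges {
        // The edge closes a cycle if `to` can get back to `from`.
        if reaches(&adj, to, from) {
            result.insert((from.to_string(), to.to_string()));
        }
    }
    result.into_iter().collect()
}

/// Depth-first search: can `target` be reached starting at `start`?
fn reaches(adj: &BTreeMap<&str, BTreeSet<&str>>, start: &str, target: &str) -> bool {
    let mut seen = BTreeSet::new();
    let mut stack = vec![start];
    while let Some(node) = stack.pop() {
        if node == target {
            return true;
        }
        if !seen.insert(node) {
            continue; // already expanded
        }
        if let Some(next) = adj.get(node) {
            stack.extend(next.iter().copied());
        }
    }
    false
}
```

Running it on Cycle A from the fixture plus one extra edge Controller → Service would keep the three Cycle A edges and drop the extra one, since Service cannot reach Controller in that reduced graph.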

Add find_duplicates() implementation for SurrealDB that finds functions with identical implementations via matching hash values (ast_sha or source_sha).

Key implementation details:
- Fixed SurrealDB column order issue (columns returned alphabetically by header name, not in SELECT order) by using headers().position()
- Fixed ORDER BY to use column aliases for correct sorting
- Module filtering applied after finding duplicates to preserve pairs

Fixture updates for duplicate testing:
- Added 6 new test functions across Accounts, Controller, Service, Repo
- Added __generated__ to GENERATED_PATTERNS for exclude_generated filter
- Total functions: 31 → 37, Total clauses: 38 → 44

Updated test expectations in 10 files to reflect new fixture data.

Add find_accepts() implementation for SurrealDB that searches specs table for functions accepting specified type patterns.

Key changes:
- Update insert_spec helper to accept input_strings and return_strings arrays
- Create surreal_accepts_db() fixture with 9 specs across 3 modules
- Implement array-based type matching using array::join() and string::contains()
- Support regex pattern matching with /pattern/ literals
- Add module filtering support
- 12 tests with strong assertions (92.13% coverage)

Data structure difference handled: CozoDB uses comma-joined string, SurrealDB uses native array<string> which is joined for output.
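The join-then-substring matching described above can be illustrated in plain Rust. This sketch covers only the substring mode (not the regex mode), and the `Spec` struct, the `", "` separator, and the function signature are assumptions chosen for the example rather than the PR's actual definitions.

```rust
/// Hypothetical in-memory view of a specs row.
#[derive(Debug, Clone)]
pub struct Spec {
    pub module_name: String,
    pub function_name: String,
    pub arity: u32,
    pub input_strings: Vec<String>,
    pub return_strings: Vec<String>,
}

/// Substring-mode matching: join the input type array into one string,
/// then test whether it contains the pattern. An optional exact module
/// filter narrows the search.
pub fn find_accepts<'a>(
    specs: &'a [Spec],
    pattern: &str,
    module: Option<&str>,
) -> Vec<&'a Spec> {
    specs
        .iter()
        .filter(|s| module.map_or(true, |m| s.module_name == m))
        .filter(|s| s.input_strings.join(", ").contains(pattern))
        .collect()
}
```

One consequence of matching on the joined string is that a pattern containing the separator can match across two adjacent array elements, which keeps behavior close to a comma-joined string column.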

Add find_returns() implementation for SurrealDB that searches specs table for functions returning specified type patterns.

Key changes:
- Query return_strings array field using array::join() for substring matching
- Use array::filter() with regex for pattern-based filtering
- Support module filtering and limit enforcement
- 15 tests with strong assertions (90.67% coverage)

Follows same pattern as accepts.rs: both query the specs table with array-based type fields converted to joined strings for output.

@callback

Add find_specs() implementation for SurrealDB with proper handling of array-based type storage. SurrealDB stores input_strings and return_strings as arrays which are joined on retrieval to match CozoDB format.
- Add SurrealDB find_specs() with module, function, and kind filtering
- Handle SurrealDB's alphabetical column ordering in result parsing
- Add surreal_specs_db() fixture with 12 specs (9 @spec + 3 @callback)
- 18 tests with strong assertions validating against fixture data

Add find_struct_usage() implementation for SurrealDB. This query finds specs where a type pattern appears in EITHER inputs OR returns (OR logic).
- Query specs table with array::filter() for pattern matching
- Support both substring and regex matching modes
- Handle SurrealDB's alphabetical column ordering
- 15 tests with strong assertions (89% coverage)

Add complete SurrealDB implementation for JSON import functionality including all node types, relationships, and array field support.

Changes:
- Add SurrealDB implementations for all import functions:
  - import_modules, import_functions, import_function_locations
  - import_specs (with native array support), import_types, import_structs
  - import_calls (with caller_clause_id lookup via line ranges)
  - clear_project_data
- Add relationship creation functions:
  - create_defines_relationships (modules → functions/types/specs)
  - create_has_clause_relationships (functions → clauses)
  - create_has_field_relationships (modules → fields)
- Add parse_function_ref helper for "name/arity" fixture format
- Add StrArray variant to ValueType for native array parameter support
- Add with_str_array method to QueryParams
- Fix feature-gated imports across calls.rs, types.rs, import.rs
- Fix misc warnings (unused mut, doc comments, unused imports)

12 new tests for SurrealDB import functionality.
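A helper for the "name/arity" format mentioned above might look like the following. The PR does not show the real signature, so the return type here (`Option<(&str, u32)>`) and the choice to split on the last slash are assumptions. Splitting on the last slash lets operator-style names that themselves contain a slash still parse.

```rust
/// Parse a "name/arity" reference such as "get_user/1".
/// Returns None when the slash is missing, the name is empty,
/// or the arity is not a non-negative integer.
pub fn parse_function_ref(s: &str) -> Option<(&str, u32)> {
    // Split on the LAST '/', so a function literally named "/" with
    // arity 2 (written "//2") still yields ("/", 2).
    let (name, arity) = s.rsplit_once('/')?;
    if name.is_empty() {
        return None;
    }
    let arity = arity.parse::<u32>().ok()?;
    Some((name, arity))
}
```

For example, `parse_function_ref("__struct__/0")` yields the name `__struct__` with arity 0, while a string without a slash yields `None`.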

Add find_hotspots function for SurrealDB backend with:
- Incoming/outgoing call count aggregation
- Module pattern filtering (exact and regex)
- All HotspotKind sorting variants (Incoming, Outgoing, Total, Ratio)
- Leaf node exclusion via require_outgoing flag

Includes 15 comprehensive tests with assertions against fixture data.

Align find_paths function signature between CozoDB and SurrealDB backends by making from_arity and to_arity required parameters (i64 instead of Option<i64>).

Changes:
- Update CozoDB find_paths to require i64 arity parameters
- Update CLI to require --from-arity and --to-arity arguments
- Update all tests to pass concrete arity values

Extract SurrealDB trace logic to trace_calls_impl(direction) and create a public trace_calls wrapper that matches CozoDB's 8-parameter signature. reverse_trace_calls now calls trace_calls_impl with Reverse direction. This completes the SurrealDB backend alignment - both backends now compile successfully.

Import functions from function_locations instead of specs. The graph traversal queries for trace/reverse_trace/path require function nodes to have module_name, name, and arity fields set, which are only available when importing from function_locations.

Changes:
- Fix import_functions_surrealdb to import from function_locations
- Fix get_function_counts to use subquery with arity grouping
- Fix get_module_loc to calculate actual LOC from clause line ranges
- Add SurrealDB execute tests for god_modules, location, depended_by, unused, struct_usage commands
- Update type_signatures fixture to include function_locations
- Update import test to use function_locations

All 1,057 SurrealDB tests now pass.

Set up end-to-end acceptance tests for the CLI that exercise the full workflow: setup -> import -> query commands. Tests run against a temporary database using the actual binary.

Changes:
- Add assert_cmd and predicates dev-dependencies
- Create TestProject harness with temp dir and db management
- Add 18 acceptance tests covering setup, import, search, location, calls-from, calls-to, hotspots, unused, depends-on, depended-by, browse-module, trace, and JSON output format

Run with: cargo test --features backend-surrealdb --no-default-features --test acceptance

The previous implementation had two bugs:
1. Used wrong in/out semantics (in=caller, out=callee in graph edges)
2. GROUP BY with field traversal (out.module_name) returns only 1 row due to SurrealDB bug: surrealdb/surrealdb#2695

Fix by using GROUP BY on the whole record (out/in), then extracting module and function from the Thing ID array [module, name, arity]. This delegates aggregation to the database instead of fetching all calls and counting in Rust.

Also adds test_find_hotspots_verifies_fixture_values to assert actual counts from fixture data, catching regressions where all values are 0.

- Replace manual string escaping with parameterized queries ($param)
- Use string::matches() for regex mode instead of <regex> type casting
- Use type::string(column) = $param for exact matching (not substring)
- Update tests to use regex mode or exact module names
- Apply consistent formatting across query modules

Files: specs.rs, file.rs, structs.rs, types.rs

Optimize the unused query and other function lookups by denormalizing frequently-accessed data onto the functions table, eliminating expensive graph traversals and subqueries.

Schema changes (functions table):
- Add kind, file, start_line fields (from first clause)
- Add incoming_call_count, outgoing_call_count fields (computed)
- Add indexes: kind, module+kind, incoming, outgoing, module+incoming, module+outgoing

Import changes:
- import_functions now populates kind, file, start_line from locations
- Add update_call_counts() to compute call counts after importing calls
- import_graph calls update_call_counts after all calls are imported

Query optimization (unused.rs):
- Use incoming_call_count = 0 instead of NOT IN subquery
- Use denormalized kind/file/start_line instead of graph traversals

Test coverage:
- test_import_graph_updates_call_counts: integration test for import flow
- test_update_call_counts_sets_incoming_counts: basic functionality
- test_update_call_counts_multiple_calls: multiple callers/callees
- test_update_call_counts_no_calls: empty calls table edge case
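The denormalization idea above (compute per-function call counts once after import, then treat "unused" as incoming_call_count = 0) can be sketched in memory as follows. The types and the free-function form are assumptions for illustration; in the PR, update_call_counts() writes the counts onto rows of the functions table.

```rust
use std::collections::BTreeMap;

/// Natural key in the style of functions:[$module, $name, $arity].
pub type FunctionId = (String, String, u32);

/// Hypothetical holder for the two denormalized counters.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct CallCounts {
    pub incoming: u64,
    pub outgoing: u64,
}

/// One pass over the call edges, run after all calls are imported.
/// Each edge is (caller, callee).
pub fn update_call_counts(
    functions: &[FunctionId],
    calls: &[(FunctionId, FunctionId)],
) -> BTreeMap<FunctionId, CallCounts> {
    // Start every known function at zero so uncalled ones still appear.
    let mut counts: BTreeMap<FunctionId, CallCounts> = functions
        .iter()
        .cloned()
        .map(|f| (f, CallCounts::default()))
        .collect();
    for (caller, callee) in calls {
        if let Some(c) = counts.get_mut(caller) {
            c.outgoing += 1;
        }
        if let Some(c) = counts.get_mut(callee) {
            c.incoming += 1;
        }
    }
    counts
}

/// "Unused" becomes a plain equality test on the stored counter.
pub fn unused_functions(counts: &BTreeMap<FunctionId, CallCounts>) -> Vec<&FunctionId> {
    counts
        .iter()
        .filter(|(_, c)| c.incoming == 0)
        .map(|(id, _)| id)
        .collect()
}
```

The trade-off is the usual one for denormalized data: queries get cheaper, but the counters are only correct if the update step runs after every import of calls.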

The query was only selecting a subset of fields from the clauses table, leaving kind, start_line, end_line, pattern, and guard as empty/zero.

Changes:
- Select all clause fields: kind, start_line, end_line, pattern, guard
- Sort by start_line instead of line for proper function ordering
- Update tests to verify fields are populated correctly

This fixes the browse-module command to display complete function information including start/end lines and pattern matching details.

The calls-from command was showing (0:0) for all function start/end lines because the SurrealDB query was hardcoding zeros instead of using the caller_clause_id record reference.

Changes:
- Update calls query to traverse caller_clause_id for start_line/end_line
- Fix import order: clauses must be imported before calls so the caller_clause_id lookup can find matching clauses
- Build FunctionRef with definition info when location data is available
- Enhance test to verify caller_clause_id traversal returns correct values

Note: SurrealDB requires aliases when selecting multiple fields from a record reference, otherwise it collapses them into a single object.
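Why the import order matters can be seen from a small sketch of a line-range lookup: a call can only be attached to a caller clause if the clauses already exist. The `ClauseRange` struct, the `find_caller_clause` name, and the first-match rule are all assumptions; the sketch also assumes the slice has already been narrowed to the clauses of the calling function.

```rust
/// Hypothetical minimal view of an imported clause.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ClauseRange {
    pub id: String,
    pub start_line: u32,
    pub end_line: u32,
}

/// Find the clause whose inclusive line range contains the call site.
/// Returns None when no clause has been imported for that line, which is
/// what would happen if calls were imported before clauses.
pub fn find_caller_clause(clauses: &[ClauseRange], call_line: u32) -> Option<&ClauseRange> {
    clauses
        .iter()
        .find(|c| c.start_line <= call_line && call_line <= c.end_line)
}
```

With an empty clause list the lookup always returns `None`, so every call would lose its start/end line information.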

- Add Value::get() trait method for extracting fields from objects
- Implement get() for SurrealDB Object values
- Refactor trace query to use functions.* to get full function records
- Add edge lookup query to fetch call line and clause info from calls table
- Sort trace output children by line number for better readability

The trace query now correctly fetches the line number where each call occurs, and the output is sorted by line number at each depth level.

SurrealDB returns query result columns in alphabetical order by header name, not in SELECT clause order. This caused incorrect value extraction when code assumed positional indexing (e.g., SELECT line, file returns columns as [file, line] alphabetically).

Changes:
- Add lookup_call_edge function to path.rs for fetching call line/file
- Use header-based column access: find indices by name before accessing
- Apply same fix to trace.rs edge lookup code
- Add tests verifying correct line number extraction
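Header-based column access, as described above, amounts to resolving each column's index by name before reading the row. The sketch below models headers and rows as string slices, which is a simplification: the project's actual result type is not shown here, and `line_and_file` is a hypothetical helper name.

```rust
/// Extract (line, file) from one result row by looking up each column
/// by header name instead of assuming SELECT order.
pub fn line_and_file(headers: &[&str], row: &[&str]) -> Option<(u32, String)> {
    // Resolve positions by name; alphabetical ordering no longer matters.
    let line_idx = headers.iter().position(|h| *h == "line")?;
    let file_idx = headers.iter().position(|h| *h == "file")?;

    let line = row.get(line_idx)?.parse::<u32>().ok()?;
    let file = (*row.get(file_idx)?).to_string();
    Some((line, file))
}
```

With headers `["file", "line"]` and a row `["lib/accounts.ex", "42"]`, positional access at index 0 would have read the file path as the line number, while the lookup by name returns line 42 and the file path in the right slots.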

Add conditional DB_FILENAME constant that selects the appropriate database filename based on the backend feature flag:
- backend-surrealdb: surrealdb.rocksdb
- backend-cozo (default): cozo.sqlite

This makes the default database path more intuitive for each backend.
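A feature-gated constant of this kind is typically written with a pair of `cfg` attributes. The sketch below is one plausible form, not the code from the commit: it assumes the CozoDB filename applies whenever the `backend-surrealdb` feature is off, which matches the stated default but may differ from how the original gated the second branch.

```rust
// Selected when the crate is built with --features backend-surrealdb.
#[cfg(feature = "backend-surrealdb")]
pub const DB_FILENAME: &str = "surrealdb.rocksdb";

// Fallback for every other build (assumed to correspond to the
// default backend-cozo configuration).
#[cfg(not(feature = "backend-surrealdb"))]
pub const DB_FILENAME: &str = "cozo.sqlite";
```

Because exactly one of the two attributes is active in any build, the constant is always defined once and callers never need their own `cfg` checks.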

Complete removal of CozoDB as a database backend option, leaving SurrealDB as the sole backend. This simplifies the codebase significantly by removing ~12,000 lines of duplicate code and conditional compilation.

Changes:
- Delete db/src/backend/cozo.rs and cozo_schema.rs (473 lines)
- Remove CozoDB dependencies from Cargo.toml files
- Remove all #[cfg(feature = "backend-cozo")] conditional blocks
- Remove CozoDB test modules from 31 query files
- Rename tests_surrealdb modules to tests
- Update default database filename to surrealdb.rocksdb
- Remove execute_empty_db_test! macro (CozoDB-specific)
- Update all documentation references

Note: depends_on and function commands now have empty CLI execute tests (they only had CozoDB tests). Database-layer tests remain intact.

Test results: 1060 tests passing (576 db + 484 cli)

CamonZ added 30 commits December 23, 2025 22:46

CamonZ added 28 commits December 27, 2025 06:43

Remove unnecessary documentation files

52d63f4

Remove debug print statement

d15209f

CamonZ changed the title ~~Implement SurrealDB as alternative database backend~~ Migrate from CozoDB to SurrealDB Dec 30, 2025

CamonZ merged commit cf8c850 into master Dec 30, 2025
1 check passed

CamonZ merged 58 commits into master from refactor-generic-db-layer
