sanikolaev (Collaborator) commented Dec 12, 2025

Custom API URL Support for Embeddings

Related PR in the daemon: manticoresoftware/manticoresearch#4034

Related issues:

  • manticoresoftware/manticoresearch#3771
  • manticoresoftware/manticoresearch#3869

PR Overview

This PR adds support for custom API URLs and configurable API timeouts to the embeddings system. Both have been implemented and tested. In summary:

  • Custom endpoints: Users can now specify custom endpoints for OpenAI, Voyage, and Jina API requests. The default URLs are still used when no custom URL is given, so existing setups remain backward compatible.
  • Configurable timeouts: The API timeout was previously hardcoded to 10 seconds, which led to timeouts in some cases. It can now be adjusted.
  • Real API key validation: The system now validates API keys by making actual API requests instead of format-based checks (prefix matching). This gives authoritative validation that works with any key format and with custom endpoints.
  • Error handling: Error messages now include HTTP status codes (e.g., "HTTP error from remote model: status code 403") for easier debugging and troubleshooting.
  • Version string: The embeddings library version string now includes the git commit hash and timestamp (matching the format used by other Manticore components), which improves traceability in searchd -v output.
  • SHOW CREATE TABLE: The output now properly displays the embedding configuration parameters api_url, from, and api_timeout. Note: api_key is intentionally excluded from SHOW CREATE TABLE output for security reasons.

Problem Statement

Previously, the embeddings system had several limitations:

  1. Hardcoded API URLs: The system used fixed API endpoints for remote models:

    • OpenAI: https://api.openai.com/v1/embeddings
    • Voyage: https://api.voyageai.com/v1/embeddings
    • Jina: https://api.jina.ai/v1/embeddings

    This prevented users from using custom API endpoints, such as:

    • Self-hosted OpenAI-compatible APIs
    • Proxy servers
    • Alternative API gateways
    • Testing/staging environments
  2. Fixed API Timeout: The HTTP timeout was hardcoded to 10 seconds and couldn't be configured. This caused issues in some scenarios:

    • Slow network connections would timeout even with valid API keys
    • Users couldn't adjust timeout for unreliable networks
    • Fast local servers couldn't use shorter timeouts for faster failure detection
  3. Heuristic-Based API Key Validation: API keys were validated using format-based checks (prefix matching):

    • OpenAI: Required "sk-" prefix
    • Voyage: Required "pa-" prefix
    • Jina: Required "jina_" prefix

    This approach had several limitations:

    • Rejected valid keys that didn't match the expected format (e.g., custom API endpoints with different key formats)
    • Couldn't detect expired or revoked keys
    • Didn't validate actual API connectivity
    • Failed for self-hosted or proxy servers that might use different key formats
  4. Incomplete SHOW CREATE TABLE Output: The SHOW CREATE TABLE statement didn't display important embedding configuration parameters (api_url, from, api_timeout), making it difficult to see the complete table configuration and reproduce table definitions. Note: api_key is intentionally excluded from the output for security reasons.

Implementation

Architecture Changes

The implementation follows the existing data flow:

C++ ModelSettings_t → Rust FFI load_model() → Rust ModelOptions → Model Constructors → Model.predict()

Changes Made

1. C++ Interface (knn/knn.h)

  • Added m_sAPIUrl: std::string field to ModelSettings_t struct
  • This field is optional and defaults to empty string
  • Added m_iAPITimeout: int field to ModelSettings_t struct
  • This field defaults to 0 (which means use default timeout of 10 seconds)
  • Positive values specify timeout in seconds

2. C++ Implementation (knn/embeddings.cpp)

  • Updated ToKey() function to include API URL and timeout in cache key (ensures different URLs/timeouts create separate cached models)
    • Uses reserve() and append operations for efficiency when building the key from multiple components
  • Updated load_model() call to pass API URL and timeout parameters to Rust FFI
  • Updated SUPPORTED_EMBEDDINGS_LIB_VER from 2 to 3 (ABI-breaking change requires version bump)
  • API Key Validation: Added call to validate_api_key() after model creation in TextToEmbeddings_c::Initialize()
    • Makes a real API request to validate the key before allowing table creation/alter
    • Returns error if validation fails, preventing invalid configurations
    • Validation uses minimal test request ("test" string) to reduce API costs
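
The cache-key idea from the ToKey() bullet can be sketched as follows. This is only an illustration in Rust, not the C++ code from knn/embeddings.cpp: the struct, the field names, the set of fields that take part in the key, and the separator character are all assumptions made for the example.

```rust
/// Hypothetical stand-in for the settings that take part in the cache key.
#[derive(Clone)]
pub struct ModelSettings {
    pub model_name: String,
    pub cache_path: String,
    pub api_key: String,
    pub api_url: String,
    pub api_timeout: i32,
    pub use_gpu: bool,
}

/// Builds a cache key from every field that changes how the model behaves,
/// so two settings that differ only in URL or timeout get different keys.
pub fn to_key(s: &ModelSettings) -> String {
    let timeout = s.api_timeout.to_string();
    // Reserve once so the appends below do not reallocate:
    // five strings, five one-byte separators, one byte for the GPU flag.
    let mut key = String::with_capacity(
        s.model_name.len()
            + s.cache_path.len()
            + s.api_key.len()
            + s.api_url.len()
            + timeout.len()
            + 5
            + 1,
    );
    for part in [&s.model_name, &s.cache_path, &s.api_key, &s.api_url, &timeout] {
        key.push_str(part);
        // Assumed separator (ASCII unit separator) keeps adjacent fields from merging.
        key.push('\u{1f}');
    }
    key.push(if s.use_gpu { '1' } else { '0' });
    key
}
```

With this sketch, two settings that differ only in api_url or api_timeout produce different keys, which is the property the bullet above relies on.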

3. Rust FFI Layer (embeddings/src/model/text_model_wrapper.rs)

  • Extended load_model() function signature to accept api_url_ptr, api_url_len, and api_timeout parameters
  • Extracts API URL string from C pointer and converts to Rust Option<String>
  • Extracts API timeout from C integer (0 means use default, positive value is timeout in seconds)
  • Passes API URL and timeout to ModelOptions struct
  • Added validate_api_key() FFI function that calls the model's validate_api_key() method
  • Added free_string() FFI function to free error strings returned by validation
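
As a rough sketch of the pointer-to-Option<String> conversion described above, the helper below turns a (pointer, length) pair into an optional URL. The function name and the decision to treat invalid UTF-8 like a missing URL are assumptions for illustration; the actual load_model() code may handle these cases differently.

```rust
use std::os::raw::c_char;

/// Converts a (pointer, length) pair received over FFI into an optional URL.
/// Hypothetical helper, not the function used in text_model_wrapper.rs.
///
/// # Safety
/// `ptr` must be null or point to at least `len` readable bytes.
pub unsafe fn api_url_from_raw(ptr: *const c_char, len: usize) -> Option<String> {
    if ptr.is_null() || len == 0 {
        // An empty URL means "use the provider's default endpoint".
        return None;
    }
    // The caller guarantees that `ptr` is valid for `len` bytes.
    let bytes = unsafe { std::slice::from_raw_parts(ptr as *const u8, len) };
    // Assumption: invalid UTF-8 is treated like a missing URL.
    std::str::from_utf8(bytes).ok().map(str::to_owned)
}
```

Copying the bytes into an owned String means the Rust side does not depend on the C++ buffer staying alive after the call returns.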

4. Rust Model Options (embeddings/src/model/mod.rs)

  • Added api_url: Option<String> field to ModelOptions struct
  • Added api_timeout: Option<u64> field to ModelOptions struct (None means use default 10 seconds)
  • Updated create_model() function to pass api_url and api_timeout to all remote model constructors (OpenAI, Voyage, Jina)
  • Added validate_api_key() method to TextModel trait
  • Implemented validate_api_key() for Model enum to delegate to specific model implementations
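
The trait method and the enum delegation could look roughly like the sketch below. The trait here is reduced to a single method, the error type is a plain String, and the remote check is a placeholder instead of an HTTP request, so all of these details are assumptions and not the code in mod.rs.

```rust
/// Minimal stand-in for the trait; the real TextModel has more methods.
pub trait TextModel {
    fn validate_api_key(&self) -> Result<(), String>;
}

pub struct LocalModel;

/// Hypothetical remote model that only carries a key.
pub struct RemoteModel {
    pub api_key: String,
}

impl TextModel for LocalModel {
    fn validate_api_key(&self) -> Result<(), String> {
        // Local models need no API key, so validation succeeds immediately.
        Ok(())
    }
}

impl TextModel for RemoteModel {
    fn validate_api_key(&self) -> Result<(), String> {
        // Placeholder for the real HTTP round trip: this sketch rejects only
        // the key "wrong", mirroring the rule used by the PR's mock server.
        if self.api_key == "wrong" {
            Err("HTTP error from remote model: status code 401".to_string())
        } else {
            Ok(())
        }
    }
}

pub enum Model {
    Local(LocalModel),
    Remote(RemoteModel),
}

impl TextModel for Model {
    fn validate_api_key(&self) -> Result<(), String> {
        // The enum forwards the call to whichever model it wraps.
        match self {
            Model::Local(m) => m.validate_api_key(),
            Model::Remote(m) => m.validate_api_key(),
        }
    }
}
```

Delegating through the enum lets the FFI layer call validate_api_key() without knowing which provider is behind the handle.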

5. Model Implementations

OpenAI Model (embeddings/src/model/openai.rs):

  • Added api_url: Option<String> field to OpenAIModel struct
  • Updated new() constructor to accept optional api_url and api_timeout parameters
  • Modified predict() method to use custom URL if provided, otherwise defaults to https://api.openai.com/v1/embeddings
  • API Key Validation: Removed format-based validation (sk- prefix check). Now uses real API validation via validate_api_key() method that makes an actual API request
  • HTTP client timeout is configurable via api_timeout parameter (defaults to 10 seconds if not specified)
  • validate_api_key() method makes a minimal test request with "test" string to validate the key
  • HTTP Status Code Handling: Improved error handling to check HTTP status codes before parsing JSON responses. Returns RemoteHttpError { status } with the actual HTTP status code (e.g., 403, 404, 429) without interpretation, allowing users to determine the meaning based on the status code

Voyage Model (embeddings/src/model/voyage.rs):

  • Added api_url: Option<String> field to VoyageModel struct
  • Updated new() constructor to accept optional api_url and api_timeout parameters
  • Modified predict() method to use custom URL if provided, otherwise defaults to https://api.voyageai.com/v1/embeddings
  • API Key Validation: Removed format-based validation (pa- prefix check). Now uses real API validation via validate_api_key() method
  • HTTP client timeout is configurable via api_timeout parameter (defaults to 10 seconds if not specified)
  • validate_api_key() method makes a minimal test request with "test" string to validate the key
  • HTTP Status Code Handling: Improved error handling to check HTTP status codes before parsing JSON responses. Returns RemoteHttpError { status } with the actual HTTP status code (e.g., 403, 404, 429) without interpretation, allowing users to determine the meaning based on the status code

Jina Model (embeddings/src/model/jina.rs):

  • Added api_url: Option<String> field to JinaModel struct
  • Updated new() constructor to accept optional api_url and api_timeout parameters
  • Modified predict() method to use custom URL if provided, otherwise defaults to https://api.jina.ai/v1/embeddings
  • API Key Validation: Removed format-based validation (jina_ prefix check). Now uses real API validation via validate_api_key() method
  • HTTP client timeout is configurable via api_timeout parameter (defaults to 10 seconds if not specified)
  • validate_api_key() method makes a minimal test request with "test" string to validate the key
  • HTTP Status Code Handling: Improved error handling to check HTTP status codes before parsing JSON responses. Returns RemoteHttpError { status } with the actual HTTP status code (e.g., 403, 404, 429) without interpretation, allowing users to determine the meaning based on the status code

Local Model (embeddings/src/model/local.rs):

  • Added validate_api_key() method that returns Ok(()) immediately (no API key required for local models)
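
The "check the status code before parsing JSON" behavior shared by the three remote models can be sketched like this. Only the RemoteHttpError { status } variant and the message text come from the description; the u16 status type and the check_status() helper are assumptions for the example.

```rust
use std::fmt;

/// Reduced error type with only the variant discussed above.
#[derive(Debug, PartialEq)]
pub enum LibError {
    RemoteHttpError { status: u16 },
}

impl fmt::Display for LibError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // The status code is reported as is, without interpretation.
            LibError::RemoteHttpError { status } => {
                write!(f, "HTTP error from remote model: status code {}", status)
            }
        }
    }
}

/// Hypothetical helper: run this before the response body is parsed as JSON,
/// so that an error page never surfaces as a generic parse failure.
pub fn check_status(status: u16) -> Result<(), LibError> {
    if status == 200 {
        Ok(())
    } else {
        Err(LibError::RemoteHttpError { status })
    }
}
```

For example, check_status(403) yields an error whose message is "HTTP error from remote model: status code 403".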

6. FFI & Versioning (embeddings/src/ffi.rs)

  • Updated LoadModelFn type signature to include the additional API URL parameters (pointer and length) and the API timeout parameter
  • Bumped library version from 2 to 3 (required for ABI compatibility)
  • Updated version string from "1.0.1" to "1.1.0"
  • Enhanced version string to include git commit hash and timestamp (matches format used by other Manticore libraries)
    • Format: "VERSION commit@timestamp" (e.g., "1.1.0 38f499e@25112313")
    • Version string is generated at compile time by build.rs and embedded in the library
  • Added ValidateApiKeyFn and FreeStringFn function pointer types to EmbedLib struct
  • Exposed validate_api_key() and free_string() functions through FFI for C++ integration

7. Build System & Version Generation (embeddings/build.rs and cmake/build_embeddings.cmake)

  • Version String Generation: Enhanced build.rs to generate version strings with git commit and timestamp
    • Extracts git commit hash (short format) from GIT_COMMIT_ID environment variable or git command
    • Extracts git commit timestamp (YYMMDDHH format) from GIT_TIMESTAMP_ID environment variable or git command
    • Formats version string as "VERSION commit@timestamp" to match other Manticore libraries
    • Passes version string to Rust compiler via cargo:rustc-env for compile-time embedding
  • CMake Integration: Updated build_embeddings.cmake to pass git commit and timestamp as environment variables
    • Sets GIT_COMMIT_ID and GIT_TIMESTAMP_ID environment variables before cargo build
    • These variables are populated by cmake/rev.cmake (included in main CMakeLists.txt)
    • Ensures consistent version information across all Manticore components

8. Generated C Header (embeddings/manticoresearch_text_embeddings.h)

  • Automatically regenerated via build.rs during cargo build
  • New function signature includes API URL parameters

9. Manticore Search SQL Interface (manticore_github/src/)

  • SQL Parser (ddl.l): Added TOK_API_URL and TOK_API_TIMEOUT token recognition
  • SQL Parser (ddl.y):
    • Added parsing rule for API_URL='<URL>' parameter
    • Added parsing rule for API_TIMEOUT='<seconds>' parameter
    • Added ALTER TABLE ... MODIFY COLUMN ... API_TIMEOUT='<seconds>' support
  • DDL Parser (searchdddl.cpp):
    • Added m_sAPIUrl field to ItemOptions_t struct
    • Added m_iAPITimeout field to ItemOptions_t struct (defaults to 0 = use default)
    • Added AddItemOptionAPIUrl() method to parse API_URL from SQL
    • Added AddItemOptionAPITimeout() method to parse API_TIMEOUT from SQL (validates positive integer)
    • Updated ToKNNModel() to map m_sAPIUrl and m_iAPITimeout to ModelSettings_t
  • Schema Handling (schema/schema.cpp):
    • Fixed assignment from NamedKNNSettings_t to ModelSettings_t using explicit static_cast to avoid slicing issues with multiple inheritance
    • The cast correctly extracts the ModelSettings_t subobject from NamedKNNSettings_t (which uses multiple inheritance: IndexSettings_t + ModelSettings_t)
  • Settings Conversion (indexsettings.cpp):
    • Uses direct casts to copy ModelSettings_t fields when converting from CreateTableAttr_t to NamedKNNSettings_t
  • Output Formatting (knnmisc.cpp):
    • Updated AddKNNSettings() to include from, api_url, and api_timeout (if non-default) in SHOW CREATE TABLE output
    • Note: api_key is intentionally excluded from SHOW CREATE TABLE output for security reasons
    • Updated JSON serialization/deserialization to include api_timeout
    • Updated ParseKNNConfigStr() to parse api_timeout from configuration strings
  • ALTER TABLE Support (sphinxrt.cpp, searchd.cpp, searchdsql.h):
    • Added STMT_ALTER_EMBEDDINGS_API_TIMEOUT statement type
    • Added Alter_e::ApiTimeout enum value
    • Implemented AlterApiTimeout() method in sphinxrt.cpp to modify timeout via ALTER TABLE
    • Added virtual method AlterApiTimeout() to sphinx.h base class
  • Documentation (manual/english/Searching/KNN.md and manual/english/Creating_a_table/Data_types.md):
    • Updated to document API_URL and API_TIMEOUT parameters for OpenAI, Voyage, and Jina models
    • Added usage examples with custom API URLs and timeouts

Usage Example

C++ API Usage

knn::ModelSettings_t settings;
settings.m_sModelName = "openai/text-embedding-3-small";
settings.m_sAPIKey = "sk-...";
settings.m_sAPIUrl = "https://custom-api.example.com/v1/embeddings";  // Optional
settings.m_sCachePath = "";
settings.m_bUseGPU = false;

std::string error;
auto* embeddings_lib = knn::LoadEmbeddingsLib(lib_path, error);
auto* text_to_embeddings = embeddings_lib->CreateTextToEmbeddings(settings, error);

If m_sAPIUrl is empty or not set, the system uses the default URLs for each provider.
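
On the Rust side, the fallback to a default URL amounts to something like the sketch below. The default URLs are the ones listed in the problem statement; the Provider enum and the endpoint() helper are hypothetical names used only for this example.

```rust
/// Hypothetical provider tag used only for this sketch.
#[derive(Clone, Copy)]
pub enum Provider {
    OpenAI,
    Voyage,
    Jina,
}

impl Provider {
    /// Default endpoints as listed in the problem statement.
    pub fn default_url(self) -> &'static str {
        match self {
            Provider::OpenAI => "https://api.openai.com/v1/embeddings",
            Provider::Voyage => "https://api.voyageai.com/v1/embeddings",
            Provider::Jina => "https://api.jina.ai/v1/embeddings",
        }
    }
}

/// Returns the custom URL when one is set, otherwise the provider default.
pub fn endpoint<'a>(provider: Provider, api_url: Option<&'a str>) -> &'a str {
    match api_url {
        Some(url) if !url.is_empty() => url,
        // None and the empty string both mean "not set".
        _ => provider.default_url(),
    }
}
```

Treating the empty string the same as None matches the C++ side, where m_sAPIUrl defaults to an empty string.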

SQL Usage

Users can now specify a custom API URL directly in SQL CREATE TABLE statements:

CREATE TABLE products_openai_custom (
    title TEXT,
    description TEXT, 
    embedding_vector FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='openai/text-embedding-ada-002' FROM='title,description' 
    API_KEY='sk-...' 
    API_URL='https://custom-api.example.com/v1/embeddings'
    API_TIMEOUT='30'
);

The API_URL and API_TIMEOUT parameters are optional:

  • If API_URL is not specified, the system uses the default URLs for each provider
  • If API_TIMEOUT is not specified or set to 0, the system uses the default timeout of 10 seconds
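
The "0 means default" rule can be expressed in two small steps, sketched below: first map the raw integer to an Option, then resolve the Option to a duration. The function and constant names are assumptions; only the 10-second default and the meaning of 0 come from the description.

```rust
use std::time::Duration;

/// Default timeout in seconds, as stated in the description.
pub const DEFAULT_API_TIMEOUT_SECS: u64 = 10;

/// Maps the raw integer setting to an option.
/// Zero means "not set"; this sketch treats negative values the same way.
pub fn timeout_option(raw: i32) -> Option<u64> {
    if raw > 0 {
        Some(raw as u64)
    } else {
        None
    }
}

/// Resolves the optional setting to the timeout actually used for requests.
pub fn effective_timeout(api_timeout: Option<u64>) -> Duration {
    Duration::from_secs(api_timeout.unwrap_or(DEFAULT_API_TIMEOUT_SECS))
}
```

Under these assumptions, API_TIMEOUT='30' resolves to 30 seconds, while an omitted parameter or API_TIMEOUT='0' resolves to 10 seconds.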

You can also modify the timeout via ALTER TABLE:

ALTER TABLE products_openai_custom MODIFY COLUMN embedding_vector API_TIMEOUT='60';

Real API Key Validation

Overview

Changed from format-based to real API validation: The system now validates API keys by making actual API requests instead of format-based checks (prefix matching). This fundamental change ensures that:

  • Invalid API keys are caught immediately at table creation/alter time
  • The validation is authoritative (comes from the actual API provider)
  • Works with any API key format (no assumptions about prefixes like "sk-", "pa-", "jina_")
  • Validates actual API connectivity and authentication
  • Supports custom API endpoints with any key format

Implementation

  1. Validation Timing: API key validation occurs:

    • During CREATE TABLE when embeddings model is specified
    • During ALTER TABLE ... MODIFY COLUMN ... API_KEY=...
    • During ALTER TABLE ... MODIFY COLUMN ... API_URL=...
  2. Validation Method:

    • Makes a minimal test API request with the string "test"
    • Uses the same HTTP client and endpoint as regular embedding requests
    • Timeout is configurable via API_TIMEOUT parameter (defaults to 10 seconds if not specified)
    • Returns clear error messages if validation fails
    • HTTP Status Code Handling: Error messages now include HTTP status codes for better debugging:
      • All non-200 HTTP status codes return RemoteHttpError { status } with the specific status code
      • Error message format: "HTTP error from remote model: status code <status>" (e.g., "HTTP error from remote model: status code 403")
      • The status code is returned to the user without interpretation, allowing them to determine the meaning
      • This provides clearer error messages than generic "Failed to parse response" errors
  3. Removed Format Validation:

    • Previously checked for format prefixes: "sk-" (OpenAI), "pa-" (Voyage), "jina_" (Jina)
    • Format validation has been completely removed
    • Only basic checks remain: non-empty, no leading/trailing whitespace
  4. Local Models:

    • Skip API validation (no API key required)
    • validate_api_key() returns Ok(()) immediately
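
To make the difference between the removed prefix heuristic and the remaining basic check concrete, here is a small sketch. The prefixes and the two basic rules (non-empty, no leading or trailing whitespace) come from the list above; the function signatures and error strings are assumptions and not the library's actual code.

```rust
/// The removed heuristic: accepts only keys that carry the provider's prefix.
pub fn has_expected_prefix(provider: &str, api_key: &str) -> bool {
    match provider {
        "openai" => api_key.starts_with("sk-"),
        "voyage" => api_key.starts_with("pa-"),
        "jina" => api_key.starts_with("jina_"),
        _ => false,
    }
}

/// The remaining basic check: non-empty and no surrounding whitespace.
/// Whether the key is actually accepted is decided by the real API request.
pub fn validate_api_key_basic(api_key: &str) -> Result<(), &'static str> {
    if api_key.is_empty() {
        return Err("API key is empty");
    }
    if api_key.trim() != api_key {
        return Err("API key has leading or trailing whitespace");
    }
    Ok(())
}
```

A key such as "test-key" for a self-hosted endpoint fails the old prefix check but passes the basic check, which is why the prefix rules blocked custom endpoints.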

Benefits

  • Authoritative: Validation comes from the actual API, not heuristics or format assumptions
  • Flexible: Works with any API key format (no format assumptions like "sk-" prefix)
  • Supports Custom Endpoints: Works with self-hosted APIs, proxy servers, and alternative gateways that may use different key formats
  • Detects Real Issues: Catches expired keys, revoked keys, and network connectivity problems
  • Early Detection: Invalid keys are caught at table creation, not during first INSERT
  • Clear Errors: API-provided error messages are more informative than format checks
  • HTTP Status Code Reporting: Error messages now include HTTP status codes (e.g., "HTTP error from remote model: status code 403") for better debugging and troubleshooting. The status code is returned without interpretation, allowing users to determine the meaning based on standard HTTP status code semantics

Considerations

  • Network Dependency: Table creation requires network connectivity to API endpoint
  • API Costs: Each validation makes one API request (minimal cost with "test" string)
  • Timeout Handling: The default 10-second timeout may be too short for slow networks; it can be raised via API_TIMEOUT
  • Offline Scenarios: Cannot create tables with remote models when offline (by design)

Backward Compatibility

  • Fully backward compatible: Empty/None API URL values use default URLs
  • Version protection: Version bump (2→3) ensures old C++ code won't accidentally use new library
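  • No breaking changes at the SQL level: Existing statements continue to work without modification. The library ABI did change, which is what the version bump above guards against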
  • API Key Validation: Real API validation replaces format-based checks, providing more reliable validation

Testing

  • Test with custom URLs for each provider (OpenAI, Voyage, Jina) - IMPLEMENTED
  • Test with empty/default URLs (backward compatibility) - VERIFIED
  • Verify cache key includes URL (different URLs = different cache entries) - VERIFIED
  • Verify version compatibility check works (old library rejected, new library accepted) - VERIFIED
  • Test real API key validation at table creation - IMPLEMENTED
  • Test API key validation failure scenarios (invalid key, network error, timeout) - IMPLEMENTED
  • Test SHOW CREATE TABLE displays api_url parameter - IMPLEMENTED
  • Test with mock servers for local testing - IMPLEMENTED
  • Test that format-based validation (sk-, pa-, jina_ prefixes) has been removed - VERIFIED

Mock Server for Testing

A PHP mock server (test/clt-tests/mcl/mock-embeddings-server.php) has been created to enable testing of the API_URL parameter without making real API calls:

  • Purpose: Simulates OpenAI, Voyage, and Jina API endpoints locally
  • Functionality: Accepts POST requests to /v1/embeddings and returns deterministic random embeddings seeded by input text
  • Deterministic Embeddings: Same input text always produces the same embedding vector (seeded by SHA-256 hash of text)
  • Configurable Delay:
    • Supports --delay SECONDS command line parameter to set default delay
    • Supports delay=N substring in input text to override delay per request (e.g., "test delay=2.5")
    • Useful for testing timeout behavior without restarting the server
  • Provider Setting:
    • Supports --provider PROVIDER command line parameter to set default provider
    • Supports provider=NAME substring in input text to override provider per request
    • When provider is specified, validates that the model matches the provider (OpenAI models only with OpenAI provider, etc.)
  • Input Text Parameters: The delay=N, provider=..., and response-type=... substrings are automatically removed from input text before generating embeddings, ensuring they don't affect embedding values
  • Response Type Override: Supports response-type=TYPE substring in input text to override response format per request:
    • response-type=html - Returns HTML instead of JSON (for testing parsing errors)
    • response-type=invalid-json - Returns malformed JSON (for testing JSON parsing errors)
    • response-type=wrong-structure - Returns valid JSON but wrong structure (for testing response format validation)
    • response-type=default or not specified - Returns normal embeddings JSON
  • API Key Validation: Validates API keys from Authorization header - all keys are valid except "wrong" (returns 401 error)
  • Help Page: Shows usage information when run with --help, -h, or without any parameters
  • Usage: Can run multiple instances on different ports for parallel testing
  • Response Format: Returns proper JSON with {"data": [{"embedding": [0.379464, -0.250134, ...]}]} structure
  • Realistic Values: Embeddings are in typical range (approximately [-0.5, 0.5]) for realistic testing
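
The per-request overrides embedded in the input text can be illustrated with the sketch below. The mock server itself is written in PHP; this Rust version only shows the parsing idea and assumes that each override is a whitespace-separated token, which may not match how the PHP script finds the substrings. The struct and function names are hypothetical.

```rust
/// Result of separating control tokens from the text to embed.
pub struct Controls {
    pub text: String,
    pub delay: Option<f64>,
    pub provider: Option<String>,
    pub response_type: Option<String>,
}

/// Extracts delay=N, provider=NAME and response-type=TYPE tokens and returns
/// the remaining text, so the tokens do not influence the embedding values.
pub fn split_controls(input: &str) -> Controls {
    let mut out = Controls {
        text: String::new(),
        delay: None,
        provider: None,
        response_type: None,
    };
    let mut kept: Vec<&str> = Vec::new();
    for word in input.split_whitespace() {
        if let Some(value) = word.strip_prefix("delay=") {
            if let Ok(secs) = value.parse::<f64>() {
                out.delay = Some(secs);
                continue;
            }
            // An unparsable delay falls through and is kept as ordinary text.
        } else if let Some(value) = word.strip_prefix("provider=") {
            out.provider = Some(value.to_string());
            continue;
        } else if let Some(value) = word.strip_prefix("response-type=") {
            out.response_type = Some(value.to_string());
            continue;
        }
        kept.push(word);
    }
    out.text = kept.join(" ");
    out
}
```

With this sketch, the input "test delay=2.5" yields the text "test" and a delay of 2.5 seconds, so the same embedding is produced regardless of the delay override.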

Example Usage:

# Start mock server for OpenAI testing (no delay)
php test/clt-tests/mcl/mock-embeddings-server.php --port 8080 --provider openai

# Start mock server with 2 second delay (for timeout testing)
php test/clt-tests/mcl/mock-embeddings-server.php --port 8080 --provider openai --delay 2.0

# Show help
php test/clt-tests/mcl/mock-embeddings-server.php --help

# Test API key validation (all keys valid except "wrong")
# Valid key: API_KEY='test-key' or any other key
# Invalid key: API_KEY='wrong' (returns 401 error)

# Then use in SQL:
CREATE TABLE test_custom (
    title TEXT,
    embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='openai/text-embedding-ada-002' 
    FROM='title' 
    API_KEY='test-key' 
    API_URL='http://localhost:8080/v1/embeddings'
    API_TIMEOUT='5'  -- Use shorter timeout to test delay behavior
);

# Insert with delay override in text (overrides command line delay)
INSERT INTO test_custom (id, title) VALUES (1, 'test delay=6.0');  -- Will delay 6 seconds

# Insert with provider override in text
INSERT INTO test_custom (id, title) VALUES (2, 'test provider=openai');  -- Will validate as OpenAI

Rust Unit Test Files Updated

The following Rust unit test files in embeddings/src/model/ have been updated to match the new implementation:

  • openai_test.rs:

    • Removed validate_api_key function import (now a trait method that makes real API requests)
    • Updated all OpenAIModel::new() calls to include api_url and api_timeout parameters
    • Removed obsolete API key format validation tests (format-based validation replaced with real API validation)
    • Added comment explaining that API key validation is now done via real API requests
  • voyage_test.rs:

    • Removed validate_api_key function import
    • Updated all VoyageModel::new() calls to include api_url and api_timeout parameters
    • Removed obsolete API key format validation tests
    • Fixed duplicate test attributes
  • jina_test.rs:

    • Removed validate_api_key function import
    • Updated all JinaModel::new() calls to include api_url and api_timeout parameters
    • Removed obsolete API key format validation tests
    • Restored #[test] attributes on test functions
  • ffi_test.rs:

    • Already had the correct load_model signature with new api_url and api_timeout parameters
  • local_test.rs:

    • No changes required (local models don't use API URLs or timeouts)

Test Results: All 132 unit tests pass successfully, validating the embeddings library functionality including the new API URL and timeout features.

Additional Implementation Notes

  • Real API Key Validation: API keys are now validated by making actual API requests instead of format checks. This provides authoritative validation and works with any key format.

  • SHOW CREATE TABLE Output: The SHOW CREATE TABLE statement now displays api_url, from, and api_timeout (if non-default) parameters in the output, allowing users to see the complete configuration. Note: api_key is intentionally excluded from the output for security reasons.

  • Cache Key: Different API_URL values create separate cached model instances, ensuring that models with different endpoints are kept separate.

  • Configurable API Timeout: The API_TIMEOUT parameter allows users to specify a custom HTTP timeout for API requests:

    • Default: 10 seconds (if not specified or set to 0)
    • Format: Positive integer representing timeout in seconds
    • Usage: Can be set during table creation or modified via ALTER TABLE ... MODIFY COLUMN ... API_TIMEOUT='<seconds>'
    • Scope: Applies to all HTTP requests (validation during table creation/alter, and embedding generation during INSERT)
    • Cache Key: Different timeout values create separate cached model instances (timeout is included in cache key)
    • Display: Timeout is shown in SHOW CREATE TABLE output only if set to a non-default value
  • Test Requests: Validation uses minimal test string ("test") to reduce API costs while ensuring the key works.

Version String Enhancement

Overview

The embeddings library version string has been enhanced to include git commit hash and timestamp, matching the format used by other Manticore components (columnar, secondary, knn). This provides better traceability and debugging capabilities.

Implementation Details

Version String Format: VERSION commit@timestamp

  • Example: 1.1.0 38f499e@25112313
  • VERSION: Semantic version from Cargo.toml (currently "1.1.0")
  • commit: Short git commit hash (7 characters)
  • timestamp: Git commit timestamp in YYMMDDHH format (8 digits)

Build Process:

  1. CMake (cmake/rev.cmake) extracts git commit and timestamp from the repository
  2. build_embeddings.cmake passes these values as environment variables (GIT_COMMIT_ID, GIT_TIMESTAMP_ID) to cargo build
  3. build.rs reads these environment variables (or falls back to git commands) and generates the version string
  4. The version string is embedded at compile time via cargo:rustc-env and read in ffi.rs using env!() macro
  5. The version string is exposed through the EmbedLib.version_str field, which is displayed by searchd -v

Display Output:
When running searchd -v, the embeddings library version is now displayed as:

Manticore 0.0.0 fed8cd101@25112702 (columnar 8.1.0 e1522a2@25100213) (secondary 8.1.0 e1522a2@25100213) (knn 0.0.0 38f499e@25112313) (embeddings 1.1.0 38f499e@25112313)

Backward Compatibility:

  • The version string format change is backward compatible
  • If git information is not available, fallback values are used ("unknown" for commit, "00000000" for timestamp)
  • Standalone cargo builds (without CMake) will still generate version strings using git commands directly
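
The formatting step, including the fallback values, can be sketched as a small pure function. The function name and the rule that an empty value counts as missing are assumptions; in build.rs the result would then be handed to the compiler through a cargo:rustc-env line under a variable name that is not shown here.

```rust
/// Formats "VERSION commit@timestamp", using the documented fallbacks
/// when git information is missing.
pub fn version_string(version: &str, commit: Option<&str>, timestamp: Option<&str>) -> String {
    // Assumption: an empty value is treated the same as a missing one.
    let commit = commit.filter(|c| !c.is_empty()).unwrap_or("unknown");
    let timestamp = timestamp.filter(|t| !t.is_empty()).unwrap_or("00000000");
    format!("{} {}@{}", version, commit, timestamp)
}
```

For the example values given above, version_string("1.1.0", Some("38f499e"), Some("25112313")) returns "1.1.0 38f499e@25112313".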

Files Modified

Columnar Repository (Embeddings Library)

  1. knn/knn.h - Added m_sAPIUrl and m_iAPITimeout fields to ModelSettings_t
  2. knn/embeddings.cpp - Updated cache key and FFI call, version bump, API key validation
  3. embeddings/src/model/text_model_wrapper.rs - Extended FFI signature
  4. embeddings/src/model/mod.rs - Added api_url and api_timeout fields and propagation, added validate_api_key() to the TextModel trait
  5. embeddings/src/model/openai.rs - Added URL and timeout support, replaced format-based API key validation with a real API request
  6. embeddings/src/model/voyage.rs - Added URL and timeout support, replaced format-based API key validation with a real API request
  7. embeddings/src/model/jina.rs - Added URL and timeout support, replaced format-based API key validation with a real API request
  8. embeddings/src/model/openai_test.rs - Updated tests for new validate_api_key signature, added custom URL tests
  9. embeddings/src/model/voyage_test.rs - Updated tests for new validate_api_key signature, added custom URL tests
  10. embeddings/src/model/jina_test.rs - Updated tests for new validate_api_key signature, added custom URL tests
  11. embeddings/src/ffi.rs - Updated function signature, version bump, and version string generation
  12. embeddings/build.rs - Added git commit and timestamp extraction for version string generation
  13. embeddings/Cargo.toml - Updated version from "0.1.0" to "1.1.0"
  14. cmake/build_embeddings.cmake - Added environment variable passing for git commit and timestamp
  15. embeddings/manticoresearch_text_embeddings.h - Auto-regenerated header

Manticore Search Repository (SQL Interface)

  1. src/ddl.l - Added TOK_API_URL and TOK_API_TIMEOUT tokens
  2. src/ddl.y - Added API_URL and API_TIMEOUT parsing rules
  3. src/searchdddl.cpp - Added m_sAPIUrl and m_iAPITimeout fields, AddItemOptionAPIUrl() and AddItemOptionAPITimeout() methods, and updated ToKNNModel()
  4. src/schema/schema.cpp - Fixed ModelSettings_t assignment with explicit static_cast to avoid slicing
  5. src/indexsettings.cpp - Uses direct casts to ensure proper ModelSettings_t copying without object slicing
  6. src/knnmisc.cpp - Updated AddKNNSettings() to display from, api_url, and api_timeout (if non-default) in SHOW CREATE TABLE output
  7. src/schema/columninfo.h - NamedKNNSettings_t struct definition (uses multiple inheritance)
  8. manual/english/Searching/KNN.md - Updated documentation with API_URL examples
  9. manual/english/Creating_a_table/Data_types.md - Updated documentation with API_URL examples

Testing Infrastructure

  1. test/clt-tests/mcl/mock-embeddings-server.php - PHP mock server for testing API_URL functionality
    • Generates deterministic random embeddings seeded by input text
    • Supports configurable delay to simulate slow API responses
    • Returns embeddings with correct dimensions for all supported models
  2. test/clt-tests/mcl/auto-embeddings-openai-remote.rec - Added actual test cases for API_URL functionality
  3. test/clt-tests/mcl/auto-embeddings-voyage-remote.rec - Added actual test cases for API_URL functionality

github-actions bot commented Dec 12, 2025

Linux debug test results

  8 files    8 suites   12m 52s ⏱️
501 tests 480 ✅ 21 💤 0 ❌
515 runs  494 ✅ 21 💤 0 ❌

Results for commit 977e9c0.

♻️ This comment has been updated with latest results.

github-actions bot commented Dec 12, 2025

Windows test results

  5 files    5 suites   18m 0s ⏱️
482 tests 461 ✅ 14 💤 7 ❌
490 runs  469 ✅ 14 💤 7 ❌

For more details on these failures, see this check.

Results for commit 977e9c0.

♻️ This comment has been updated with latest results.

github-actions bot commented Dec 12, 2025

Linux release test results

  8 files    8 suites   6m 59s ⏱️
501 tests 485 ✅ 14 💤 2 ❌
515 runs  499 ✅ 14 💤 2 ❌

For more details on these failures, see this check.

Results for commit 977e9c0.

♻️ This comment has been updated with latest results.

Related issues:
- manticoresoftware/manticoresearch#3771
- manticoresoftware/manticoresearch#3869

The EmbedLib struct layout changed which is a breaking change in the embeddings library.

Additional changes:
- Added git commit ID and timestamp to the build process for consistent versioning across libraries.
- Moved API key validation to CREATE TABLE.
- Improved error handling for remote model requests, including detailed HTTP error reporting.
- Updated tests to reflect changes in API key validation and model initialization.
validate_api_key(api_key).map_err(|_| LibError::RemoteInvalidAPIKey)?;
validate_model(&model).map_err(|_| LibError::RemoteUnsupportedModel { status: None })?;
// Only validate basic requirements (non-empty, no whitespace)
// Real validation happens via actual API request in validate_api_key()
Contributor

If the real validation happens in validate_api_key, what is the point of having this one, which is not real? Creating multiple validation functions serves no purpose. It is better to stick to one validation flow instead of several overly complex functions that split validation for no reason.

std::string m_sCachePath;
std::string m_sAPIKey;
std::string m_sAPIUrl;
int m_iAPITimeout = 0; // 0 means use default (10 seconds), positive value is timeout in seconds
Contributor

This is confusing. Why do we set 0 and say that 0 means the default of 10 seconds? If the default is 10, we should set 10; 0 is the common convention for no limit at all.

Collaborator Author

From the daemon's point of view, the default value is 0. It is set to 10 only in the Rust part. That said, it's fine to change it if you think it's better to control it here.
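
The convention discussed in this thread (the daemon passes 0, and the Rust side turns it into 10 seconds) could be sketched as follows. This is only an illustration under stated assumptions: `effective_timeout` and `DEFAULT_API_TIMEOUT_SECS` are hypothetical names, not taken from the PR, and treating negative values like 0 is an assumption as well.

```rust
use std::time::Duration;

/// Assumed default, matching the 10 seconds mentioned in the thread.
const DEFAULT_API_TIMEOUT_SECS: u64 = 10;

/// Hypothetical helper: maps the raw value received from the daemon
/// to the timeout actually used for HTTP requests.
/// 0 (and, by assumption, any negative value) selects the default.
pub fn effective_timeout(raw_secs: i32) -> Duration {
    if raw_secs > 0 {
        Duration::from_secs(raw_secs as u64)
    } else {
        Duration::from_secs(DEFAULT_API_TIMEOUT_SECS)
    }
}
```

Keeping the sentinel resolution in one small function at least makes the "0 means default" rule visible in a single place, whichever side ends up owning the default.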


impl JinaModel {
pub fn new(model_id: &str, api_key: &str) -> Result<Self, Box<dyn std::error::Error>> {
pub fn new(
Contributor

Since all the api_* arguments belong to the same configuration, I think that instead of adding multiple arguments it is more convenient to introduce a structure like ApiConfig: Option<...> that contains the key, URL, timeout, and whatever else is required. This way, the method signature stays simpler with respect to its input variables.
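
A minimal sketch of what such a grouped configuration might look like, assuming optional URL and timeout fields that fall back to defaults. `ApiConfig`'s fields, `JinaModelSketch`, and the constants are illustrative names rather than the PR's actual types; the default URL is the Jina endpoint listed in the PR description.

```rust
use std::time::Duration;

/// Hypothetical grouping of the api_* settings into one value.
#[derive(Debug, Clone, PartialEq)]
pub struct ApiConfig {
    pub api_key: String,
    /// None means "use the provider's default endpoint".
    pub api_url: Option<String>,
    /// None means "use the default timeout".
    pub timeout: Option<Duration>,
}

const DEFAULT_JINA_URL: &str = "https://api.jina.ai/v1/embeddings";
const DEFAULT_TIMEOUT: Duration = Duration::from_secs(10);

/// Stand-in for the real model type; it only stores the resolved settings.
#[derive(Debug)]
pub struct JinaModelSketch {
    pub model_id: String,
    pub api_key: String,
    pub url: String,
    pub timeout: Duration,
}

impl JinaModelSketch {
    /// One config argument instead of separate key, URL, and timeout arguments.
    pub fn new(model_id: &str, config: &ApiConfig) -> Result<Self, String> {
        if config.api_key.trim().is_empty() {
            return Err("API key must not be empty".to_string());
        }
        Ok(Self {
            model_id: model_id.to_string(),
            api_key: config.api_key.clone(),
            url: config
                .api_url
                .clone()
                .unwrap_or_else(|| DEFAULT_JINA_URL.to_string()),
            timeout: config.timeout.unwrap_or(DEFAULT_TIMEOUT),
        })
    }
}
```

With this shape, adding another setting later changes the struct but not every constructor signature.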

validate_model(&model).map_err(|_| LibError::RemoteUnsupportedModel { status: None })?;
// Only validate basic requirements (non-empty, no whitespace)
// Real validation happens via actual API request in validate_api_key()
validate_api_key_basic(api_key)
Contributor

This still sounds weird. We should stick with a single validate_api_key and call it in one place, rather than having two separate spaghetti methods that do the same thing in different ways. Code should be clean.

free_vec_result: FreeVecResultFn,
get_hidden_size: GetLenFn,
get_max_input_size: GetLenFn,
validate_api_key: ValidateApiKeyFn,
Contributor

So we introduce an external method into the FFI for validating the API key, which will require updating the C++ code to run validation when an API key is present and to react to the result. That is still acceptable, but it would be better to encapsulate it in model creation and do all validation there to keep the client simpler.
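
One way the suggestion above could look, as a rough sketch: model creation runs the only validation flow, and the caller never invokes a separate validation entry point. Everything here is hypothetical (`create_model`, `CreateError`, `RemoteModel`, and the `probe` closure standing in for the real API request); the error text follows the format quoted in the PR description.

```rust
use std::fmt;

/// Hypothetical error type for model creation.
#[derive(Debug, PartialEq)]
pub enum CreateError {
    EmptyApiKey,
    RemoteHttp { status: u16 },
}

impl fmt::Display for CreateError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CreateError::EmptyApiKey => write!(f, "API key must not be empty"),
            CreateError::RemoteHttp { status } => {
                write!(f, "HTTP error from remote model: status code {}", status)
            }
        }
    }
}

/// Stand-in for a successfully created remote model.
#[derive(Debug)]
pub struct RemoteModel {
    pub model_id: String,
}

/// Creates the model and validates the key in one place.
/// `probe` stands in for a real request to the provider and returns
/// the HTTP status code on failure.
pub fn create_model<F>(model_id: &str, api_key: &str, probe: F) -> Result<RemoteModel, CreateError>
where
    F: Fn(&str) -> Result<(), u16>,
{
    // Cheap local check first, so obviously bad input never hits the network.
    if api_key.trim().is_empty() {
        return Err(CreateError::EmptyApiKey);
    }
    // Authoritative check: the provider decides whether the key is valid.
    probe(api_key).map_err(|status| CreateError::RemoteHttp { status })?;
    Ok(RemoteModel {
        model_id: model_id.to_string(),
    })
}
```

Under this arrangement the FFI surface would not need a dedicated validation function, since a failed creation already carries the reason.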

@github-actions
clt

❌ CLT tests in test/clt-tests/mcl/
✅ OK: 13
❌ Failed: 8
⏳ Duration: 489s
👉 Check Action Results for commit e5dff86

Failed tests:

test/clt-tests/mcl/auto-embeddings-backup-restore.rec
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd $SEARCHD_FLAGS > /dev/null; if timeout 10 grep -qm1 '\[BUDDY\] started' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Buddy started!'; else echo 'Timeout or failed!'; cat /var/log/manticore/searchd.log;fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_backup (
    title TEXT,
    content TEXT,
    status INTEGER,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='title, content'
) engine='columnar'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_backup (id, title, content, status) VALUES
    (1, 'machine learning', 'neural networks', 1),
    (2, 'deep learning', 'transformers', 1),
    (3, 'computer vision', 'image processing', 2)"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_backup; OPTIMIZE TABLE test_backup OPTION sync=1, cutoff=1"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_backup WHERE KNN(vec, 2, 'artificial intelligence')"
––– output –––
+------+
| id   |
+------+
|    1 |
|    3 |
- |    2 |
+ +------+
- +------+
––– input –––
mysql -h0 -P9306 -E -e "SELECT id, title, content, KNN_DIST() as distance FROM test_backup WHERE KNN(vec, 3, 'artificial intelligence') ORDER BY distance"
––– output –––
OK
––– input –––
manticore-backup --version | grep -c "Manticore Backup"
––– output –––
OK
––– input –––
mkdir -p /tmp/backup && chmod 777 /tmp/backup; echo $?
––– output –––
OK
––– input –––
manticore-backup --backup-dir=/tmp/backup --tables=test_backup 2>&1 | grep -c "Backing up table"
––– output –––
OK
––– input –––
ls -d /tmp/backup/backup-* | wc -l
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FREEZE test_backup"
––– output –––
+-----------------------------------------------------+-----------------------------------------------------+
| file                                                | normalized                                          |
+-----------------------------------------------------+-----------------------------------------------------+
| /var/lib/manticore/test_backup/test_backup.0.spc    | /var/lib/manticore/test_backup/test_backup.0.spc    |
| /var/lib/manticore/test_backup/test_backup.0.spd    | /var/lib/manticore/test_backup/test_backup.0.spd    |
| /var/lib/manticore/test_backup/test_backup.0.spds   | /var/lib/manticore/test_backup/test_backup.0.spds   |
| /var/lib/manticore/test_backup/test_backup.0.spe    | /var/lib/manticore/test_backup/test_backup.0.spe    |
| /var/lib/manticore/test_backup/test_backup.0.sph    | /var/lib/manticore/test_backup/test_backup.0.sph    |
| /var/lib/manticore/test_backup/test_backup.0.sphi   | /var/lib/manticore/test_backup/test_backup.0.sphi   |
| /var/lib/manticore/test_backup/test_backup.0.spi    | /var/lib/manticore/test_backup/test_backup.0.spi    |
- | /var/lib/manticore/test_backup/test_backup.0.spidx  | /var/lib/manticore/test_backup/test_backup.0.spidx  |
+ | /var/lib/manticore/test_backup/test_backup.0.spknn  | /var/lib/manticore/test_backup/test_backup.0.spknn  |
- | /var/lib/manticore/test_backup/test_backup.0.spknn  | /var/lib/manticore/test_backup/test_backup.0.spknn  |
+ | /var/lib/manticore/test_backup/test_backup.0.spm    | /var/lib/manticore/test_backup/test_backup.0.spm    |
- | /var/lib/manticore/test_backup/test_backup.0.spm    | /var/lib/manticore/test_backup/test_backup.0.spm    |
+ | /var/lib/manticore/test_backup/test_backup.0.spp    | /var/lib/manticore/test_backup/test_backup.0.spp    |
- | /var/lib/manticore/test_backup/test_backup.0.spp    | /var/lib/manticore/test_backup/test_backup.0.spp    |
+ | /var/lib/manticore/test_backup/test_backup.0.spt    | /var/lib/manticore/test_backup/test_backup.0.spt    |
- | /var/lib/manticore/test_backup/test_backup.0.spt    | /var/lib/manticore/test_backup/test_backup.0.spt    |
+ | /var/lib/manticore/test_backup/test_backup.meta     | /var/lib/manticore/test_backup/test_backup.meta     |
- | /var/lib/manticore/test_backup/test_backup.meta     | /var/lib/manticore/test_backup/test_backup.meta     |
+ | /var/lib/manticore/test_backup/test_backup.settings | /var/lib/manticore/test_backup/test_backup.settings |
- | /var/lib/manticore/test_backup/test_backup.settings | /var/lib/manticore/test_backup/test_backup.settings |
+ +-----------------------------------------------------+-----------------------------------------------------+
- +-----------------------------------------------------+-----------------------------------------------------+
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_backup"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_backup (id, title, content, status) VALUES (4, 'frozen insert', 'test data', 3)"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "UNFREEZE test_backup"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_backup"
––– output –––
OK
––– input –––
mysqldump -h0 -P9306 manticore test_backup > /tmp/logical_backup.sql 2>/dev/null; echo $?
––– output –––
OK
––– input –––
grep -c "INSERT INTO" /tmp/logical_backup.sql
––– output –––
OK
––– input –––
searchd --stopwait > /dev/null 2>&1; echo $?
––– output –––
OK
––– input –––
rm -f /etc/manticoresearch/manticore.conf; rm -rf /var/lib/manticore/*; echo "Cleaned for restore"
––– output –––
OK
––– input –––
manticore-backup --backup-dir=/tmp/backup --restore 2>&1 | grep -c "backup-"
––– output –––
OK
––– input –––
BACKUP_NAME=$(manticore-backup --backup-dir=/tmp/backup --restore 2>&1 | grep backup- | awk '{print $1}' | head -1)
manticore-backup --backup-dir=/tmp/backup --restore=$BACKUP_NAME 2>&1 | grep -c "Starting to restore"
––– output –––
- 1
+ 0
––– input –––
searchd > /dev/null 2>&1; echo $?
––– output –––
- 0
+ 1
––– input –––
echo "Waiting for searchd to start"; sleep 3
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_backup"
––– output –––
- +----------+
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
- | count(*) |
- +----------+
- |        3 |
- +----------+
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_backup; OPTIMIZE TABLE test_backup OPTION sync=1, cutoff=1"; echo $?
––– output –––
- 0
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
+ 1
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_backup WHERE KNN(vec, 2, 'artificial intelligence')"
––– output –––
- +------+
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
- | id   |
- +------+
- |    1 |
- |    3 |
- |    2 |
- +------+
––– input –––
mysql -h0 -P9306 -e "ALTER TABLE test_backup ADD COLUMN new_field INTEGER"; echo $?
––– output –––
- 0
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
+ 1
––– input –––
mysql -h0 -P9306 -e "DESC test_backup" | grep "new_field"
––– output –––
- | new_field | uint         | columnar fast_fetch     |
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_copy (
    title TEXT,
    content TEXT,
    status INTEGER,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='title, content'
) engine='columnar'"; echo $?
––– output –––
- 0
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
+ 1
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_copy (id, title, content, status) VALUES
    (1, 'machine learning', 'neural networks', 1),
    (2, 'deep learning', 'transformers', 1),
    (3, 'computer vision', 'image processing', 2),
    (4, 'frozen insert', 'test data', 3)"; echo $?
––– output –––
- 0
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
+ 1
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_copy"
––– output –––
- +----------+
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
- | count(*) |
- +----------+
- |        4 |
- +----------+
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_copy WHERE KNN(vec, 2, 'artificial intelligence')"
––– output –––
- +------+
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
- | id   |
- +------+
- |    1 |
- |    3 |
- |    2 |
- |    4 |
- +------+
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_copy; OPTIMIZE TABLE test_copy OPTION sync=1, cutoff=1"; echo $?
––– output –––
- 0
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
+ 1
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_copy WHERE KNN(vec, 2, 'artificial intelligence')"
––– output –––
- +------+
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
- | id   |
- +------+
- |    1 |
- |    3 |
- |    2 |
- |    4 |
- +------+
test/clt-tests/mcl/auto-embeddings-hnsw-configs.rec
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd $SEARCHD_FLAGS > /dev/null; if timeout 10 grep -qm1 '\[BUDDY\] started' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Buddy started!'; else echo 'Timeout or failed!'; cat /var/log/manticore/searchd.log;fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_l2 (
    content TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='content'
) engine='columnar'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_cosine (
    content TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='cosine'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='content'
) engine='columnar'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_ip (
    content TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='ip'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='content'
) engine='columnar'"; echo $?
––– output –––
OK
––– input –––
for table in test_l2 test_cosine test_ip; do
    mysql -h0 -P9306 -e "INSERT INTO $table (id, content) VALUES
        (1, 'machine learning'),
        (2, 'deep learning'),
        (3, 'cooking recipes')" 2>/dev/null
done
echo "Data inserted into all tables"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_l2; OPTIMIZE TABLE test_l2 OPTION sync=1, cutoff=1"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_cosine; OPTIMIZE TABLE test_cosine OPTION sync=1, cutoff=1"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_ip; OPTIMIZE TABLE test_ip OPTION sync=1, cutoff=1"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -E -e "SELECT id FROM test_l2 WHERE KNN(vec, 2, 'artificial intelligence')"
––– output –––
*************************** 1. row ***************************
id: 1
*************************** 2. row ***************************
id: 2
- *************************** 3. row ***************************
- id: 3
––– input –––
mysql -h0 -P9306 -E -e "SELECT id FROM test_cosine WHERE KNN(vec, 2, 'artificial intelligence')"
––– output –––
*************************** 1. row ***************************
id: 1
*************************** 2. row ***************************
id: 2
- *************************** 3. row ***************************
- id: 3
––– input –––
mysql -h0 -P9306 -E -e "SELECT id FROM test_ip WHERE KNN(vec, 2, 'artificial intelligence')"
––– output –––
*************************** 1. row ***************************
id: 1
*************************** 2. row ***************************
id: 2
- *************************** 3. row ***************************
- id: 3
––– input –––
echo "L2 (Euclidean) distances:"
mysql -h0 -P9306 -E -e "SELECT id, content, KNN_DIST() as distance FROM test_l2 WHERE KNN(vec, 3, 'neural networks') ORDER BY distance"
––– output –––
OK
––– input –––
echo "Cosine similarity distances (smaller = more similar):"
mysql -h0 -P9306 -E -e "SELECT id, content, KNN_DIST() as distance FROM test_cosine WHERE KNN(vec, 3, 'neural networks') ORDER BY distance"
––– output –––
OK
––– input –––
echo "Inner product distances (smaller = more similar):"
mysql -h0 -P9306 -E -e "SELECT id, content, KNN_DIST() as distance FROM test_ip WHERE KNN(vec, 3, 'neural networks') ORDER BY distance"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_hnsw_m4 (
    content TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    HNSW_M='4'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='content'
) engine='columnar'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_hnsw_m32 (
    content TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    HNSW_M='32'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='content'
) engine='columnar'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_hnsw_ef (
    content TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    HNSW_M='16'
    HNSW_EF_CONSTRUCTION='500'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='content'
) engine='columnar'"; echo $?
––– output –––
OK
––– input –––
for table in test_hnsw_m4 test_hnsw_m32 test_hnsw_ef; do
    mysql -h0 -P9306 -e "INSERT INTO $table (id, content) VALUES (1, 'test document')" 2>/dev/null
done
echo "HNSW configurations test completed"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -E -e "SELECT COUNT(*) FROM test_hnsw_m4"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -E -e "SELECT COUNT(*) FROM test_hnsw_m32"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -E -e "SELECT COUNT(*) FROM test_hnsw_ef"
––– output –––
OK
––– input –––
for i in {1..10}; do
    mysql -h0 -P9306 -e "INSERT INTO test_hnsw_m4 (content) VALUES ('document number $i with various content')" 2>/dev/null
    mysql -h0 -P9306 -e "INSERT INTO test_hnsw_m32 (content) VALUES ('document number $i with various content')" 2>/dev/null
done
echo "Additional documents inserted for performance comparison"
––– output –––
OK
––– input –––
echo "Search with M=4 (faster, less connections):"
mysql -h0 -P9306 -E -e "SELECT id FROM test_hnsw_m4 WHERE KNN(vec, 3, 'document content') LIMIT 3"
––– output –––
OK
––– input –––
echo "Search with M=32 (slower, more connections, potentially more accurate):"
mysql -h0 -P9306 -E -e "SELECT id FROM test_hnsw_m32 WHERE KNN(vec, 3, 'document content') LIMIT 3"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_invalid_metric (
    content TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='invalid_metric'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='content'
) engine='columnar'" 2>&1
––– output –––
OK
test/clt-tests/mcl/auto-embeddings-endpoints.rec
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd --stopwait > /dev/null; stdbuf -oL searchd ${SEARCHD_ARGS:-} > /dev/null
––– output –––
OK
––– input –––
if timeout 10 grep -qm1 'accepting connections' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Accepting connections!'; else echo 'Timeout or failed!'; fi
––– output –––
OK
––– input –––
apt-get install jq -y > /dev/null; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE emb_test (
    id BIGINT,
    title TEXT,
    content TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='title, content'
)"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO emb_test (id, title, content) VALUES
    (1, 'machine learning', 'neural networks and deep learning'),
    (2, 'computer vision', 'image recognition and processing'),
    (3, 'natural language', 'text analysis and understanding')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK emb_test; OPTIMIZE TABLE emb_test OPTION sync=1, cutoff=1"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM emb_test WHERE KNN(vec, 2, 'artificial intelligence')"
––– output –––
+----------+
| count(*) |
+----------+
- |        3 |
+ |        2 |
+----------+
––– input –––
curl -s "http://localhost:9308/cli?select%20id,%20title%20from%20emb_test%20where%20knn(vec,%202,%20'artificial%20intelligence')" | grep -v 'rows in set'
––– output –––
+----+------------------+
| id | title            |
+----+------------------+
| 1  | machine learning |
| 2  | computer vision  |
- | 3  | natural language |
+ +----+------------------+
- +----+------------------+
––– input –––
curl -s "http://localhost:9308/cli_json?select%20id,%20title,%20@knn_dist%20from%20emb_test%20where%20knn(vec,%201,%20'learning')" | jq -r '.[0].data[0] | "ID: \(.id)\nTitle: \(.title)\nDistance: \(.["@knn_dist"] | tostring)"'
––– output –––
OK
––– input –––
curl -s -X POST "http://localhost:9308/sql?mode=raw" -d "select count(*) from emb_test where knn(vec, 2, 'neural networks')" | jq -r '.[0].data[0]."count(*)"'
––– output –––
- 3
+ 2
––– input –––
curl -s -X POST http://localhost:9308/insert -d '{"index":"emb_test","id":10,"doc":{"title":"quantum computing","content":"quantum algorithms"}}' | jq -r '.created'
––– output –––
OK
––– input –––
curl -s -X POST http://localhost:9308/search -d '{"index":"emb_test","knn":{"field":"vec","query":"quantum","k":1}}' | jq -r '.hits.hits[0]._source.title'
––– output –––
OK
––– input –––
curl -s -X POST http://localhost:9308/search -d '{"index":"emb_test","knn":{"field":"vec","query_text":"quantum","k":1}}'
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE chunk_test (
    id BIGINT,
    title TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='title'
) engine='columnar'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO chunk_test (id, title) VALUES
    (1, 'machine learning'),
    (2, 'deep learning'),
    (3, 'reinforcement learning')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM chunk_test WHERE KNN(vec, 1, 'learning')"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK chunk_test; OPTIMIZE TABLE chunk_test OPTION sync=1, cutoff=1"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM chunk_test WHERE KNN(vec, 1, 'learning')"
––– output –––
+----------+
| count(*) |
+----------+
- |        3 |
+ |        1 |
+----------+
test/clt-tests/mcl/auto-embeddings-json-api.rec
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd $SEARCHD_FLAGS > /dev/null; if timeout 10 grep -qm1 '\[BUDDY\] started' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Buddy started!'; else echo 'Timeout or failed!'; cat /var/log/manticore/searchd.log;fi
––– output –––
OK
––– input –––
apt-get install jq -y > /dev/null; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_json_columnar (
    title TEXT,
    content TEXT,
    embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='title, content'
) engine='columnar'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SHOW CREATE TABLE test_json_columnar" | grep -o "knn_dims='384'"
––– output –––
OK
––– input –––
curl -s -X POST http://localhost:9308/insert -d '{"index":"test_json_columnar","id":1,"doc":{"title":"machine learning","content":"neural networks"}}' | jq -r 'if ._id then ._id else "inserted" end'
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_json_columnar WHERE KNN(embedding, 1, 'machine learning neural networks')"
––– output –––
OK
––– input –––
curl -s -X POST http://localhost:9308/bulk -H "Content-Type: application/x-ndjson" -d '
{"insert":{"index":"test_json_columnar","id":2,"doc":{"title":"computer vision","content":"image recognition"}}}
{"insert":{"index":"test_json_columnar","id":3,"doc":{"title":"NLP","content":"text processing"}}}
' | jq '{created: .items[0].bulk.created}'
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_json_columnar WHERE id IN (2,3)"
––– output –––
OK
––– input –––
curl -s -X POST http://localhost:9308/replace -d '{"index":"test_json_columnar","id":1,"doc":{"title":"updated ML","content":"updated networks"}}' | jq -r '.result'
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT title FROM test_json_columnar WHERE id=1 AND KNN(embedding, 1, 'updated ML networks')"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_json_columnar (id, title, content) VALUES (100, 'test', 'data')";
curl -s -X POST http://localhost:9308/insert -d '{"index":"test_json_columnar","id":101,"doc":{"title":"test","content":"data"}}' > /dev/null
––– output –––
OK
––– input –––
mysql -h0 -P9306 --batch --skip-column-names -e "SELECT embedding FROM test_json_columnar WHERE id=100" > /tmp/v1.txt
mysql -h0 -P9306 --batch --skip-column-names -e "SELECT embedding FROM test_json_columnar WHERE id=101" > /tmp/v2.txt
diff -q /tmp/v1.txt /tmp/v2.txt > /dev/null && echo "Vectors identical" || echo "Vectors differ"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_json_columnar"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_json_columnar; OPTIMIZE TABLE test_json_columnar OPTION sync=1, cutoff=1"; echo $?
––– output –––
OK
––– input –––
VECTOR=$(python3 -c "print(','.join(['0.01']*384))")
curl -s -X POST http://localhost:9308/search -d "{\"index\":\"test_json_columnar\",\"knn\":{\"field\":\"embedding\",\"query_vector\":[$VECTOR],\"k\":2}}" | jq -r '.hits.total // "0"'
––– output –––
- 5
+ 2
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE no_auto_embed (title TEXT, vec FLOAT_VECTOR KNN_TYPE='hnsw' KNN_DIMS='384' HNSW_SIMILARITY='l2') engine='columnar'"
––– output –––
OK
––– input –––
VECTOR=$(python3 -c "print(','.join(['0.5']*384))")
curl -s -X POST http://localhost:9308/insert -d "{\"index\":\"no_auto_embed\",\"id\":1,\"doc\":{\"title\":\"test\",\"vec\":[$VECTOR]}}" | jq -r 'if ._id then ._id else "inserted" end'
––– output –––
OK
––– input –––
QUERY_VEC=$(python3 -c "print(','.join(['0.5']*384))")
curl -s -X POST http://localhost:9308/search -d "{\"index\":\"no_auto_embed\",\"knn\":{\"field\":\"vec\",\"query_vector\":[$QUERY_VEC],\"k\":1}}" | jq -r '.hits.total // "0"'
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_json_rowwise (
    title TEXT,
    content TEXT,
    embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='title, content'
) engine='rowwise'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SHOW CREATE TABLE test_json_rowwise" | grep -o "knn_dims='384'"
––– output –––
OK
––– input –––
curl -s -X POST http://localhost:9308/insert -d '{"index":"test_json_rowwise","id":1,"doc":{"title":"machine learning","content":"neural networks"}}' | jq -r 'if ._id then ._id else "inserted" end'
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_json_rowwise WHERE KNN(embedding, 1, 'machine learning neural networks')"
––– output –––
OK
––– input –––
curl -s -X POST http://localhost:9308/bulk -H "Content-Type: application/x-ndjson" -d '
{"insert":{"index":"test_json_rowwise","id":2,"doc":{"title":"computer vision","content":"image recognition"}}}
{"insert":{"index":"test_json_rowwise","id":3,"doc":{"title":"NLP","content":"text processing"}}}
' | jq '{created: .items[0].bulk.created}'
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_json_rowwise WHERE id IN (2,3)"
––– output –––
OK
––– input –––
curl -s -X POST http://localhost:9308/replace -d '{"index":"test_json_rowwise","id":1,"doc":{"title":"updated ML","content":"updated networks"}}' | jq -r '.result'
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT title FROM test_json_rowwise WHERE id=1 AND KNN(embedding, 1, 'updated ML networks')"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_json_rowwise (id, title, content) VALUES (100, 'test', 'data')";
curl -s -X POST http://localhost:9308/insert -d '{"index":"test_json_rowwise","id":101,"doc":{"title":"test","content":"data"}}' > /dev/null
––– output –––
OK
––– input –––
mysql -h0 -P9306 --batch --skip-column-names -e "SELECT embedding FROM test_json_rowwise WHERE id=100" > /tmp/v1.txt
mysql -h0 -P9306 --batch --skip-column-names -e "SELECT embedding FROM test_json_rowwise WHERE id=101" > /tmp/v2.txt
diff -q /tmp/v1.txt /tmp/v2.txt > /dev/null && echo "Vectors identical" || echo "Vectors differ"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_json_rowwise"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_json_rowwise; OPTIMIZE TABLE test_json_rowwise OPTION sync=1, cutoff=1"; echo $?
––– output –––
OK
––– input –––
VECTOR=$(python3 -c "print(','.join(['0.01']*384))")
curl -s -X POST http://localhost:9308/search -d "{\"index\":\"test_json_rowwise\",\"knn\":{\"field\":\"embedding\",\"query_vector\":[$VECTOR],\"k\":2}}" | jq -r '.hits.total // "0"'
––– output –––
- 5
+ 2
test/clt-tests/mcl/auto-embeddings-dml-test.rec
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd $SEARCHD_FLAGS > /dev/null; if timeout 10 grep -qm1 '\[BUDDY\] started' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Buddy started!'; else echo 'Timeout or failed!'; cat /var/log/manticore/searchd.log;fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_delete ( title TEXT, embedding_vector FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2' FROM='title' ) engine='rowwise'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_delete (id, title) VALUES (1, 'One')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "DELETE FROM test_delete WHERE id = 1"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_delete"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_delete (id, title) VALUES (2,'Two'),(3,'Three'),(4,'Four'),(5,'Five')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "DELETE FROM test_delete WHERE id IN (2,3,4)"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_delete"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "DROP TABLE IF EXISTS test_replace"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_replace ( title TEXT, price INTEGER, vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2' FROM='title' ) engine='columnar'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_replace (id, title, price) VALUES (1, 'Original', 100)"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "REPLACE INTO test_replace (id, title, price) VALUES (1, 'Updated', 200)"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT title, price FROM test_replace WHERE id = 1"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "DROP TABLE IF EXISTS test_vector_regen"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_vector_regen ( content TEXT, vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='cosine' MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2' FROM='content')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_vector_regen (id, content) VALUES (1,'AI and ML'),(2,'Deep Learning'),(3,'Cooking')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_vector_regen; OPTIMIZE TABLE test_vector_regen OPTION sync=1, cutoff=1"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_vector_regen WHERE KNN(vec, 2, 'artificial intelligence')"
––– output –––
+------+
| id   |
+------+
|    1 |
|    2 |
- |    3 |
+ +------+
- +------+
––– input –––
mysql -h0 -P9306 -e "REPLACE INTO test_vector_regen (id, content) VALUES (1, 'Cooking recipes')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_vector_regen; OPTIMIZE TABLE test_vector_regen OPTION sync=1, cutoff=1"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_vector_regen WHERE KNN(vec, 2, 'artificial intelligence')"
––– output –––
+------+
| id   |
+------+
|    2 |
|    3 |
- |    1 |
+ +------+
- +------+
––– input –––
mysql -h0 -P9306 -e "TRUNCATE TABLE test_vector_regen"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_vector_regen"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "DROP TABLE IF EXISTS test_bulk"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_bulk ( content TEXT, vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2' FROM='content' ) engine='rowwise'"; echo $?
––– output –––
OK
––– input –––
for i in {1..50}; do
    mysql -h0 -P9306 -e "INSERT INTO test_bulk (id, content) VALUES ($i, 'Document $i')" 2>/dev/null
done
echo "Inserted 50 records"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "DELETE FROM test_bulk WHERE id <= 10"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "DELETE FROM test_bulk WHERE id >= 40"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_bulk"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "DROP TABLE IF EXISTS test_multi_vec"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_multi_vec ( title TEXT, description TEXT, title_vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2' FROM='title', desc_vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='cosine' MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2' FROM='description' ) engine='columnar'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_multi_vec (id, title, description) VALUES (1,'Title1','Desc1'),(2,'Title2','Desc2')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "DELETE FROM test_multi_vec WHERE id=1"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "REPLACE INTO test_multi_vec (id, title, description) VALUES (2,'NewTitle','NewDesc')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id, title FROM test_multi_vec"
––– output –––
OK
test/clt-tests/mcl/auto-embeddings-voyage-remote.rec
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd --stopwait > /dev/null; stdbuf -oL searchd ${SEARCHD_ARGS:-} > /dev/null
––– output –––
OK
––– input –––
if timeout 10 grep -qm1 'accepting connections' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Accepting connections!'; else echo 'Timeout or failed!'; fi
––– output –––
OK
––– input –––
cosine_similarity() {
    local file1="$1" file2="$2"

    awk '
    NR==FNR { a[NR]=$1; suma2+=$1*$1; next }
    {
        dot += a[FNR]*$1
        sumb2 += $1*$1
    }
    END {
        print dot / (sqrt(suma2) * sqrt(sumb2))
    }' "$file1" "$file2"
}
––– output –––
OK
––– input –––
export -f cosine_similarity
––– output –––
OK
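The `cosine_similarity` helper defined above reads one number per line from each file and prints the cosine of the angle between the two vectors. Below is a minimal standalone sketch of its behaviour. The two-dimensional sample vectors and the temporary directory are illustrative assumptions and are not part of the test.

```shell
# Standalone copy of the helper from the test record above.
cosine_similarity() {
    local file1="$1" file2="$2"

    awk '
    NR==FNR { a[NR]=$1; suma2+=$1*$1; next }   # first file: store values, sum of squares
    {
        dot += a[FNR]*$1                       # second file: accumulate dot product
        sumb2 += $1*$1
    }
    END {
        print dot / (sqrt(suma2) * sqrt(sumb2))
    }' "$file1" "$file2"
}

# Hypothetical sample vectors, one component per line.
tmpdir=$(mktemp -d)
printf '1\n0\n' > "$tmpdir/a.txt"
printf '0\n1\n' > "$tmpdir/b.txt"
printf '2\n0\n' > "$tmpdir/c.txt"

ORTHOGONAL=$(cosine_similarity "$tmpdir/a.txt" "$tmpdir/b.txt")
PARALLEL=$(cosine_similarity "$tmpdir/a.txt" "$tmpdir/c.txt")

echo "orthogonal: $ORTHOGONAL"
echo "parallel: $PARALLEL"

rm -rf "$tmpdir"
```

Orthogonal vectors give 0, and vectors pointing in the same direction give 1 regardless of their length, which is why the tests use a threshold close to 1 (`sim > 0.99`) to decide that two embeddings match.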
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_invalid_model (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/invalid-model-name-12345' FROM = 'title') " 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_valid_model_no_api_key (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/voyage-3.5-lite' FROM = 'title') " 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_voyage_remote (title TEXT, content TEXT, description TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/voyage-3.5-lite' FROM = 'title, content' API_KEY='${VOYAGE_API_KEY}') "; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -E -e "SHOW CREATE TABLE test_voyage_remote"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_voyage_remote (id, title, content, description) VALUES(1, 'machine learning algorithms', 'deep neural networks and artificial intelligence', 'advanced AI research')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as record_count FROM test_voyage_remote WHERE id=1"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_voyage_remote (id, title, content, description) VALUES(2, 'machine learning algorithms', 'deep neural networks and artificial intelligence', 'different description')"

mysql -h0 -P9306 -e "SELECT embedding FROM test_voyage_remote WHERE id=1" | \
    grep -v embedding | \
    sed 's/[0-9]\+\(\.[0-9]\+\)\?/\n&\n/g' | \
    grep -E '^[0-9]+(\.[0-9]+)?$' | \
    awk '{printf "%.5f\n", $1}' > /tmp/vector1.txt

mysql -h0 -P9306 -e "SELECT embedding FROM test_voyage_remote WHERE id=2" | \
    grep -v embedding | \
    sed 's/[0-9]\+\(\.[0-9]\+\)\?/\n&\n/g' | \
    grep -E '^[0-9]+(\.[0-9]+)?$' | \
    awk '{printf "%.5f\n", $1}' > /tmp/vector2.txt

SIMILARITY=$(cosine_similarity /tmp/vector1.txt /tmp/vector2.txt)

echo "Cosine similarity: $SIMILARITY"

RESULT=$(awk -v sim="$SIMILARITY" 'BEGIN {
    if (sim > 0.99)
        print "SUCCESS: Same FROM fields produce similar vectors (similarity: " sim ")"
    else
        print "FAIL: Different vectors (FROM does not include description field and should not change generated vector value) (similarity: " sim ")"
}')

echo "$RESULT"

rm -f /tmp/vector1.txt /tmp/vector2.txt
––– output –––
OK
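Note that the extraction pipeline above matches only unsigned digit runs, so a leading minus sign on a vector component is dropped before the similarity is computed. This should not change the outcome when the two vectors are expected to match, but a sign-preserving variant may be safer if the check is ever used on vectors that differ. Below is a minimal sketch, assuming the vector is printed as comma-separated decimals. The sample values are made up, and exponent notation is not handled.

```shell
# Hypothetical sample row; real embeddings have many more components.
SAMPLE='0.12,-0.34,0.56,-0.78'

# grep -o prints each match on its own line; "--" keeps the pattern,
# which starts with "-", from being parsed as an option.
EXTRACTED=$(printf '%s\n' "$SAMPLE" \
    | grep -oE -- '-?[0-9]+(\.[0-9]+)?' \
    | awk '{printf "%.5f\n", $1}')

echo "$EXTRACTED"
```

With this pattern the second component comes out as `-0.34000` rather than `0.34000`.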
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_voyage_title_only (title TEXT, content TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/voyage-3.5-lite' FROM = 'title' API_KEY='${VOYAGE_API_KEY}') "; mysql -h0 -P9306 -e "INSERT INTO test_voyage_title_only (id, title, content) VALUES(1, 'machine learning algorithms', 'completely different content here')"; MD5_MULTI=$(mysql -h0 -P9306 -e "SELECT embedding FROM test_voyage_remote WHERE id=1" | grep -v embedding | md5sum | awk '{print $1}'); MD5_SINGLE=$(mysql -h0 -P9306 -e "SELECT embedding FROM test_voyage_title_only WHERE id=1" | grep -v embedding | md5sum | awk '{print $1}'); echo "multi_field_md5: $MD5_MULTI"; echo "single_field_md5: $MD5_SINGLE"; if [ "$MD5_MULTI" != "$MD5_SINGLE" ]; then echo "SUCCESS: Different FROM specifications produce different vectors"; else echo "INFO: FROM field comparison result"; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_voyage_invalid_field (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/text-embedding-ada-002' FROM = 'nonexistent_field') " 2>&1
––– output –––
OK
––– input –––
if mysql -h0 -P9306 -e "SHOW TABLES LIKE 'test_voyage_no_from'" | grep -q test_voyage_no_from; then mysql -h0 -P9306 -e "INSERT INTO test_voyage_no_from (id, title, embedding) VALUES(1, 'test title', '(0.1, 0.2, 0.3, 0.4, 0.5)')"; echo "insert_result: $?"; else echo "insert_result: skipped (table not created)"; fi
––– output –––
OK
––– input –––
if mysql -h0 -P9306 -e "SHOW TABLES LIKE 'test_voyage_no_from'" | grep -q test_voyage_no_from; then mysql -h0 -P9306 -e "SHOW CREATE TABLE test_voyage_no_from"; else echo "table_structure: skipped (table not created)"; fi
––– output –––
OK
––– input –––
if [ -n "$VOYAGE_API_KEY" ] && [ "$VOYAGE_API_KEY" != "dummy_key_for_testing" ]; then echo "API key is available for testing"; else echo "API key not available - using dummy for error testing"; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id, knn_dist() FROM test_voyage_remote WHERE knn(embedding, 3, 'machine learning and artificial intelligence')\G"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as count FROM test_voyage_remote WHERE knn(embedding, 5, 'technology and AI') AND id > 0"
––– output –––
OK
––– input –––
API_KEY_VAL="${VOYAGE_API_KEY}"; cat > /etc/manticoresearch/manticore.conf << CONFEOF
searchd {
    listen = 127.0.0.1:9306:mysql41
    listen = 127.0.0.1:9308:http
    log = /var/log/manticore/searchd.log
    pid_file = /var/run/manticore/searchd.pid
}

table test_voyage_plain {
    type = rt
    path = /var/lib/manticore/test_voyage_plain
    rt_field = title
    rt_field = content
    rt_attr_float_vector = embedding
    knn = {"attrs":[{"name":"embedding","type":"hnsw","hnsw_similarity":"L2","hnsw_m":16,"hnsw_ef_construction":200,"model_name":"voyage/voyage-3.5-lite","from":"title,content","api_key":"${API_KEY_VAL}"}]}
}
CONFEOF
––– output –––
OK
––– input –––
searchd --stopwait --quiet
––– output –––
+ [Tue Jan 13 15:03:45.898 2026] [93] WARNING: Error initializing secondary index: daemon requires secondary library v18 (trying to load v19)
+ [Tue Jan 13 15:03:45.898 2026] [93] FATAL: malformed or unknown option near '--quiet'; use '-h' or '--help' to see available options.
+ Manticore 0.0.0 0240e6481@25121214 (columnar 0.0.0 e5dff86@26011314) (knn 0.0.0 e5dff86@26011314) (embeddings 1.1.0 unknown@00000000)
+ Copyright (c) 2001-2016, Andrew Aksyonoff
+ Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
+ Copyright (c) 2017-2025, Manticore Software LTD (https://manticoresearch.com)
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd --stopwait > /dev/null; stdbuf -oL searchd ${SEARCHD_ARGS:-} > /dev/null
––– output –––
OK
––– input –––
if timeout 10 grep -qm1 'accepting connections' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Accepting connections!'; else echo 'Timeout or failed!'; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SHOW TABLES"
––– output –––
- +-------------------+------+
- | Table             | Type |
- +-------------------+------+
- | test_voyage_plain | rt   |
- +-------------------+------+
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_voyage_plain (id, title, content) VALUES(1, 'bread', 'food item'), (2, 'cat', 'animal pet')"; echo $?
––– output –––
- 0
+ ERROR 1064 (42000) at line 1: Cannot create the table automatically in Plain mode. Make sure the table exists before inserting into it
+ 1
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as count FROM test_voyage_plain"
––– output –––
- +-------+
+ ERROR 1064 (42000) at line 1: unknown local table(s) 'test_voyage_plain' in search request
- | count |
- +-------+
- |     2 |
- +-------+
––– input –––
mysql -h0 -P9306 -E -e "SELECT id, title FROM test_voyage_plain WHERE knn(embedding, 2, 'dog')"
––– output –––
- *************************** 1. row ***************************
+ ERROR 1064 (42000) at line 1: unknown local table(s) 'test_voyage_plain' in search request
-    id: 2
- title: cat
- *************************** 2. row ***************************
-    id: 1
- title: bread
––– input –––
cat > /etc/manticoresearch/manticore.conf << 'EOF'
searchd {
    listen = 127.0.0.1:9306:mysql41
    listen = 127.0.0.1:9308:http
    log = /var/log/manticore/searchd.log
    pid_file = /var/run/manticore/searchd.pid
}

table test_voyage_no_key {
    type = rt
    path = /var/lib/manticore/test_voyage_no_key
    rt_field = title
    rt_attr_float_vector = embedding
    knn = {"attrs":[{"name":"embedding","type":"hnsw","hnsw_similarity":"L2","model_name":"voyage/voyage-3.5-lite","from":"title"}]}
}
EOF
––– output –––
OK
––– input –––
searchd --stopwait --quiet
––– output –––
+ [Tue Jan 13 15:03:47.065 2026] [131] WARNING: Error initializing secondary index: daemon requires secondary library v18 (trying to load v19)
+ [Tue Jan 13 15:03:47.065 2026] [131] FATAL: malformed or unknown option near '--quiet'; use '-h' or '--help' to see available options.
+ Manticore 0.0.0 0240e6481@25121214 (columnar 0.0.0 e5dff86@26011314) (knn 0.0.0 e5dff86@26011314) (embeddings 1.1.0 unknown@00000000)
+ Copyright (c) 2001-2016, Andrew Aksyonoff
+ Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
+ Copyright (c) 2017-2025, Manticore Software LTD (https://manticoresearch.com)
––– input –––
searchd 2>&1|grep WARNING
––– output –––
- WARNING: table 'test_voyage_no_key': prealloc: Invalid API key for remote model - NOT SERVING
+ [Tue Jan 13 15:03:47.078 2026] [132] WARNING: Error initializing secondary index: daemon requires secondary library v18 (trying to load v19)
test/clt-tests/mcl/auto-embeddings-error-handling.rec
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd $SEARCHD_FLAGS > /dev/null; if timeout 10 grep -qm1 '\[BUDDY\] started' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Buddy started!'; else echo 'Timeout or failed!'; cat /var/log/manticore/searchd.log;fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_dims (
    title TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' KNN_DIMS='384'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='title'
)" 2>&1
# Check if table was actually created
mysql -h0 -P9306 -e "SHOW TABLES LIKE 'test_dims'" | grep -q "test_dims" && echo "Table created" || echo "Table not created"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_auto_dims (
    title TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='title'
)"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_auto_dims (id, title) VALUES (1, 'Test document')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_no_model (
    title TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    FROM='title'
)"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_no_from (
    content_text TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
)" 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_empty_from (
    title TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM=''
)" 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_bad_model (
    title TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='non-existent-model/invalid-name'
    FROM='title'
)" 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_no_prefix (
    content_text TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='all-MiniLM-L6-v2'
    FROM='content_text'
)" 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_bad_from (
    title TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='non_existent_field'
)"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_circular (
    title TEXT,
    vec1 FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='vec1'
)" 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_empty (
    content TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='content'
) engine='columnar'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_empty (id, content) VALUES (1, '')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_empty (id, content) VALUES (2, NULL)" 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_empty"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_rowwise (
    content TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='content'
) engine='rowwise'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_rowwise (id, content) VALUES
    (1, 'machine learning'),
    (2, 'deep learning')"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_rowwise; OPTIMIZE TABLE test_rowwise OPTION sync=1, cutoff=1"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_rowwise WHERE KNN(vec, 1, 'artificial intelligence')"
––– output –––
+------+
| id   |
+------+
|    1 |
- |    2 |
+ +------+
- +------+
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_vec_columnar (
    content TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' engine='columnar'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='content'
)"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_vec_columnar (id, content) VALUES
    (1, 'machine learning'),
    (2, 'deep learning')"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_vec_columnar; OPTIMIZE TABLE test_vec_columnar OPTION sync=1, cutoff=1"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_vec_columnar WHERE KNN(vec, 1, 'artificial intelligence')"
––– output –––
+------+
| id   |
+------+
|    1 |
- |    2 |
+ +------+
- +------+
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_full_columnar (
    content TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='content'
) engine='columnar'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_full_columnar (id, content) VALUES
    (1, 'machine learning'),
    (2, 'deep learning')"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_full_columnar; OPTIMIZE TABLE test_full_columnar OPTION sync=1, cutoff=1"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_full_columnar WHERE KNN(vec, 1, 'artificial intelligence')"
––– output –––
+------+
| id   |
+------+
|    1 |
- |    2 |
+ +------+
- +------+
––– input –––
echo "Row-wise (default):"
mysql -h0 -P9306 -e "SHOW CREATE TABLE test_rowwise\G" | grep -E "vec.*float_vector"
––– output –––
OK
––– input –––
echo "Vec columnar only:"
mysql -h0 -P9306 -e "SHOW CREATE TABLE test_vec_columnar\G" | grep -E "vec.*float_vector"
––– output –––
OK
––– input –––
echo "Full columnar:"
mysql -h0 -P9306 -e "SHOW CREATE TABLE test_full_columnar\G" | grep -E "(vec.*float_vector|engine='columnar')"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT * FROM test_auto_dims WHERE KNN(wrong_field, 1, 'test')" 2>&1 | grep -o "wrong_field.*not found"
––– output –––
OK
test/clt-tests/mcl/auto-embeddings-openai-remote.rec
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd --stopwait > /dev/null; stdbuf -oL searchd ${SEARCHD_ARGS:-} > /dev/null
––– output –––
OK
––– input –––
if timeout 10 grep -qm1 'accepting connections' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Accepting connections!'; else echo 'Timeout or failed!'; fi
––– output –––
OK
––– input –––
cosine_similarity() {
    local file1="$1" file2="$2"

    awk '
    NR==FNR { a[NR]=$1; suma2+=$1*$1; next }
    {
        dot += a[FNR]*$1
        sumb2 += $1*$1
    }
    END {
        print dot / (sqrt(suma2) * sqrt(sumb2))
    }' "$file1" "$file2"
}
––– output –––
OK
––– input –––
export -f cosine_similarity
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_invalid_model (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'openai/invalid-model-name-12345' FROM = 'title') " 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_valid_model_no_api_key (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'openai/text-embedding-ada-002' FROM = 'title') " 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_openai_remote (title TEXT, content TEXT, description TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'openai/text-embedding-ada-002' FROM = 'title, content' API_KEY='${OPENAI_API_KEY}') "; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SHOW CREATE TABLE test_openai_remote"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_openai_remote (id, title, content, description) VALUES(1, 'machine learning algorithms', 'deep neural networks and artificial intelligence', 'advanced AI research')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as record_count FROM test_openai_remote WHERE id=1"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_openai_remote (id, title, content, description) VALUES(2, 'machine learning algorithms', 'deep neural networks and artificial intelligence', 'different description')"

mysql -h0 -P9306 -e "SELECT embedding FROM test_openai_remote WHERE id=1" | \
    grep -v embedding | \
    sed 's/[0-9]\+\(\.[0-9]\+\)\?/\n&\n/g' | \
    grep -E '^[0-9]+(\.[0-9]+)?$' | \
    awk '{printf "%.5f\n", $1}' > /tmp/vector1.txt

mysql -h0 -P9306 -e "SELECT embedding FROM test_openai_remote WHERE id=2" | \
    grep -v embedding | \
    sed 's/[0-9]\+\(\.[0-9]\+\)\?/\n&\n/g' | \
    grep -E '^[0-9]+(\.[0-9]+)?$' | \
    awk '{printf "%.5f\n", $1}' > /tmp/vector2.txt

SIMILARITY=$(cosine_similarity /tmp/vector1.txt /tmp/vector2.txt)

echo "Cosine similarity: $SIMILARITY"

RESULT=$(awk -v sim="$SIMILARITY" 'BEGIN {
    if (sim > 0.99)
        print "SUCCESS: Same FROM fields produce similar vectors (similarity: " sim ")"
    else
        print "FAIL: Different vectors (FROM does not include description field and should not change generated vector value) (similarity: " sim ")"
}')

echo "$RESULT"

rm -f /tmp/vector1.txt /tmp/vector2.txt
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_openai_title_only (title TEXT, content TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'openai/text-embedding-ada-002' FROM = 'title' API_KEY='${OPENAI_API_KEY}') "; mysql -h0 -P9306 -e "INSERT INTO test_openai_title_only (id, title, content) VALUES(1, 'machine learning algorithms', 'completely different content here')"; MD5_MULTI=$(mysql -h0 -P9306 -e "SELECT embedding FROM test_openai_remote WHERE id=1" | grep -v embedding | md5sum | awk '{print $1}'); MD5_SINGLE=$(mysql -h0 -P9306 -e "SELECT embedding FROM test_openai_title_only WHERE id=1" | grep -v embedding | md5sum | awk '{print $1}'); echo "multi_field_md5: $MD5_MULTI"; echo "single_field_md5: $MD5_SINGLE"; if [ "$MD5_MULTI" != "$MD5_SINGLE" ]; then echo "SUCCESS: Different FROM specifications produce different vectors"; else echo "INFO: FROM field comparison result"; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_openai_invalid_field (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'openai/text-embedding-ada-002' FROM = 'nonexistent_field') " 2>&1
––– output –––
OK
––– input –––
if mysql -h0 -P9306 -e "SHOW TABLES LIKE 'test_openai_no_from'" | grep -q test_openai_no_from; then mysql -h0 -P9306 -e "INSERT INTO test_openai_no_from (id, title, embedding) VALUES(1, 'test title', '(0.1, 0.2, 0.3, 0.4, 0.5)')"; echo "insert_result: $?"; else echo "insert_result: skipped (table not created)"; fi
––– output –––
OK
––– input –––
if mysql -h0 -P9306 -e "SHOW TABLES LIKE 'test_openai_no_from'" | grep -q test_openai_no_from; then mysql -h0 -P9306 -e "SHOW CREATE TABLE test_openai_no_from"; else echo "table_structure: skipped (table not created)"; fi
––– output –––
OK
––– input –––
if [ -n "$OPENAI_API_KEY" ] && [ "$OPENAI_API_KEY" != "dummy_key_for_testing" ]; then echo "API key is available for testing"; else echo "API key not available - using dummy for error testing"; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id, knn_dist() FROM test_openai_remote WHERE knn(embedding, 3, 'machine learning and artificial intelligence')\G"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as count FROM test_openai_remote WHERE knn(embedding, 5, 'technology and AI') AND id > 0"
––– output –––
OK
––– input –––
API_KEY_VAL="${OPENAI_API_KEY}"; cat > /etc/manticoresearch/manticore.conf << CONFEOF
searchd {
    listen = 127.0.0.1:9306:mysql41
    listen = 127.0.0.1:9308:http
    log = /var/log/manticore/searchd.log
    pid_file = /var/run/manticore/searchd.pid
}

table test_openai_plain {
    type = rt
    path = /var/lib/manticore/test_openai_plain
    rt_field = title
    rt_field = content
    rt_attr_float_vector = embedding
    knn = {"attrs":[{"name":"embedding","type":"hnsw","hnsw_similarity":"L2","hnsw_m":16,"hnsw_ef_construction":200,"model_name":"openai/text-embedding-ada-002","from":"title,content","api_key":"${API_KEY_VAL}"}]}
}
CONFEOF
––– output –––
OK
––– input –––
searchd --stopwait --quiet
––– output –––
+ [Tue Jan 13 15:03:00.832 2026] [93] WARNING: Error initializing secondary index: daemon requires secondary library v18 (trying to load v19)
+ [Tue Jan 13 15:03:00.833 2026] [93] FATAL: malformed or unknown option near '--quiet'; use '-h' or '--help' to see available options.
+ Manticore 0.0.0 0240e6481@25121214 (columnar 0.0.0 e5dff86@26011314) (knn 0.0.0 e5dff86@26011314) (embeddings 1.1.0 unknown@00000000)
+ Copyright (c) 2001-2016, Andrew Aksyonoff
+ Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
+ Copyright (c) 2017-2025, Manticore Software LTD (https://manticoresearch.com)
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd --stopwait > /dev/null; stdbuf -oL searchd ${SEARCHD_ARGS:-} > /dev/null
––– output –––
OK
––– input –––
if timeout 10 grep -qm1 'accepting connections' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Accepting connections!'; else echo 'Timeout or failed!'; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SHOW TABLES"
––– output –––
- +-------------------+------+
- | Table             | Type |
- +-------------------+------+
- | test_openai_plain | rt   |
- +-------------------+------+
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_openai_plain (id, title, content) VALUES(1, 'bread', 'food item'), (2, 'cat', 'animal pet')"; echo $?
––– output –––
- 0
+ ERROR 1064 (42000) at line 1: Cannot create the table automatically in Plain mode. Make sure the table exists before inserting into it
+ 1
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as count FROM test_openai_plain"
––– output –––
- +-------+
+ ERROR 1064 (42000) at line 1: unknown local table(s) 'test_openai_plain' in search request
- | count |
- +-------+
- |     2 |
- +-------+
––– input –––
mysql -h0 -P9306 -E -e "SELECT id, title FROM test_openai_plain WHERE knn(embedding, 2, 'dog')"
––– output –––
- *************************** 1. row ***************************
+ ERROR 1064 (42000) at line 1: unknown local table(s) 'test_openai_plain' in search request
-    id: 2
- title: cat
- *************************** 2. row ***************************
-    id: 1
- title: bread
––– input –––
cat > /etc/manticoresearch/manticore.conf << 'EOF'
searchd {
    listen = 127.0.0.1:9306:mysql41
    listen = 127.0.0.1:9308:http
    log = /var/log/manticore/searchd.log
    pid_file = /var/run/manticore/searchd.pid
}

table test_openai_no_key {
    type = rt
    path = /var/lib/manticore/test_openai_no_key
    rt_field = title
    rt_attr_float_vector = embedding
    knn = {"attrs":[{"name":"embedding","type":"hnsw","hnsw_similarity":"L2","model_name":"openai/text-embedding-ada-002","from":"title"}]}
}
EOF
––– output –––
OK
––– input –––
searchd --stopwait --quiet
––– output –––
+ [Tue Jan 13 15:03:01.508 2026] [131] WARNING: Error initializing secondary index: daemon requires secondary library v18 (trying to load v19)
+ [Tue Jan 13 15:03:01.508 2026] [131] FATAL: malformed or unknown option near '--quiet'; use '-h' or '--help' to see available options.
+ Manticore 0.0.0 0240e6481@25121214 (columnar 0.0.0 e5dff86@26011314) (knn 0.0.0 e5dff86@26011314) (embeddings 1.1.0 unknown@00000000)
+ Copyright (c) 2001-2016, Andrew Aksyonoff
+ Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
+ Copyright (c) 2017-2025, Manticore Software LTD (https://manticoresearch.com)
––– input –––
searchd 2>&1|grep WARNING
––– output –––
- WARNING: table 'test_openai_no_key': prealloc: Invalid API key for remote model - NOT SERVING
+ [Tue Jan 13 15:03:01.522 2026] [132] WARNING: Error initializing secondary index: daemon requires secondary library v18 (trying to load v19)

@github-actions

clt

❌ CLT tests in test/clt-tests/mcl/
✅ OK: 13
❌ Failed: 8
⏳ Duration: 492s
👉 Check Action Results for commit e5dff86

Failed tests:

test/clt-tests/mcl/auto-embeddings-backup-restore.rec
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd $SEARCHD_FLAGS > /dev/null; if timeout 10 grep -qm1 '\[BUDDY\] started' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Buddy started!'; else echo 'Timeout or failed!'; cat /var/log/manticore/searchd.log;fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_backup (
    title TEXT,
    content TEXT,
    status INTEGER,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='title, content'
) engine='columnar'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_backup (id, title, content, status) VALUES
    (1, 'machine learning', 'neural networks', 1),
    (2, 'deep learning', 'transformers', 1),
    (3, 'computer vision', 'image processing', 2)"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_backup; OPTIMIZE TABLE test_backup OPTION sync=1, cutoff=1"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_backup WHERE KNN(vec, 2, 'artificial intelligence')"
––– output –––
+------+
| id   |
+------+
|    1 |
|    3 |
- |    2 |
+ +------+
- +------+
––– input –––
mysql -h0 -P9306 -E -e "SELECT id, title, content, KNN_DIST() as distance FROM test_backup WHERE KNN(vec, 3, 'artificial intelligence') ORDER BY distance"
––– output –––
OK
––– input –––
manticore-backup --version | grep -c "Manticore Backup"
––– output –––
OK
––– input –––
mkdir -p /tmp/backup && chmod 777 /tmp/backup; echo $?
––– output –––
OK
––– input –––
manticore-backup --backup-dir=/tmp/backup --tables=test_backup 2>&1 | grep -c "Backing up table"
––– output –––
OK
––– input –––
ls -d /tmp/backup/backup-* | wc -l
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FREEZE test_backup"
––– output –––
+-----------------------------------------------------+-----------------------------------------------------+
| file                                                | normalized                                          |
+-----------------------------------------------------+-----------------------------------------------------+
| /var/lib/manticore/test_backup/test_backup.0.spc    | /var/lib/manticore/test_backup/test_backup.0.spc    |
| /var/lib/manticore/test_backup/test_backup.0.spd    | /var/lib/manticore/test_backup/test_backup.0.spd    |
| /var/lib/manticore/test_backup/test_backup.0.spds   | /var/lib/manticore/test_backup/test_backup.0.spds   |
| /var/lib/manticore/test_backup/test_backup.0.spe    | /var/lib/manticore/test_backup/test_backup.0.spe    |
| /var/lib/manticore/test_backup/test_backup.0.sph    | /var/lib/manticore/test_backup/test_backup.0.sph    |
| /var/lib/manticore/test_backup/test_backup.0.sphi   | /var/lib/manticore/test_backup/test_backup.0.sphi   |
| /var/lib/manticore/test_backup/test_backup.0.spi    | /var/lib/manticore/test_backup/test_backup.0.spi    |
- | /var/lib/manticore/test_backup/test_backup.0.spidx  | /var/lib/manticore/test_backup/test_backup.0.spidx  |
+ | /var/lib/manticore/test_backup/test_backup.0.spknn  | /var/lib/manticore/test_backup/test_backup.0.spknn  |
- | /var/lib/manticore/test_backup/test_backup.0.spknn  | /var/lib/manticore/test_backup/test_backup.0.spknn  |
+ | /var/lib/manticore/test_backup/test_backup.0.spm    | /var/lib/manticore/test_backup/test_backup.0.spm    |
- | /var/lib/manticore/test_backup/test_backup.0.spm    | /var/lib/manticore/test_backup/test_backup.0.spm    |
+ | /var/lib/manticore/test_backup/test_backup.0.spp    | /var/lib/manticore/test_backup/test_backup.0.spp    |
- | /var/lib/manticore/test_backup/test_backup.0.spp    | /var/lib/manticore/test_backup/test_backup.0.spp    |
+ | /var/lib/manticore/test_backup/test_backup.0.spt    | /var/lib/manticore/test_backup/test_backup.0.spt    |
- | /var/lib/manticore/test_backup/test_backup.0.spt    | /var/lib/manticore/test_backup/test_backup.0.spt    |
+ | /var/lib/manticore/test_backup/test_backup.meta     | /var/lib/manticore/test_backup/test_backup.meta     |
- | /var/lib/manticore/test_backup/test_backup.meta     | /var/lib/manticore/test_backup/test_backup.meta     |
+ | /var/lib/manticore/test_backup/test_backup.settings | /var/lib/manticore/test_backup/test_backup.settings |
- | /var/lib/manticore/test_backup/test_backup.settings | /var/lib/manticore/test_backup/test_backup.settings |
+ +-----------------------------------------------------+-----------------------------------------------------+
- +-----------------------------------------------------+-----------------------------------------------------+
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_backup"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_backup (id, title, content, status) VALUES (4, 'frozen insert', 'test data', 3)"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "UNFREEZE test_backup"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_backup"
––– output –––
OK
––– input –––
mysqldump -h0 -P9306 manticore test_backup > /tmp/logical_backup.sql 2>/dev/null; echo $?
––– output –––
OK
––– input –––
grep -c "INSERT INTO" /tmp/logical_backup.sql
––– output –––
OK
––– input –––
searchd --stopwait > /dev/null 2>&1; echo $?
––– output –––
OK
––– input –––
rm -f /etc/manticoresearch/manticore.conf; rm -rf /var/lib/manticore/*; echo "Cleaned for restore"
––– output –––
OK
––– input –––
manticore-backup --backup-dir=/tmp/backup --restore 2>&1 | grep -c "backup-"
––– output –––
OK
––– input –––
BACKUP_NAME=$(manticore-backup --backup-dir=/tmp/backup --restore 2>&1 | grep backup- | awk '{print $1}' | head -1)
manticore-backup --backup-dir=/tmp/backup --restore=$BACKUP_NAME 2>&1 | grep -c "Starting to restore"
––– output –––
- 1
+ 0
––– input –––
searchd > /dev/null 2>&1; echo $?
––– output –––
- 0
+ 1
––– input –––
echo "Waiting for searchd to start"; sleep 3
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_backup"
––– output –––
- +----------+
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
- | count(*) |
- +----------+
- |        3 |
- +----------+
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_backup; OPTIMIZE TABLE test_backup OPTION sync=1, cutoff=1"; echo $?
––– output –––
- 0
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
+ 1
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_backup WHERE KNN(vec, 2, 'artificial intelligence')"
––– output –––
- +------+
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
- | id   |
- +------+
- |    1 |
- |    3 |
- |    2 |
- +------+
––– input –––
mysql -h0 -P9306 -e "ALTER TABLE test_backup ADD COLUMN new_field INTEGER"; echo $?
––– output –––
- 0
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
+ 1
––– input –––
mysql -h0 -P9306 -e "DESC test_backup" | grep "new_field"
––– output –––
- | new_field | uint         | columnar fast_fetch     |
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_copy (
    title TEXT,
    content TEXT,
    status INTEGER,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='title, content'
) engine='columnar'"; echo $?
––– output –––
- 0
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
+ 1
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_copy (id, title, content, status) VALUES
    (1, 'machine learning', 'neural networks', 1),
    (2, 'deep learning', 'transformers', 1),
    (3, 'computer vision', 'image processing', 2),
    (4, 'frozen insert', 'test data', 3)"; echo $?
––– output –––
- 0
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
+ 1
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_copy"
––– output –––
- +----------+
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
- | count(*) |
- +----------+
- |        4 |
- +----------+
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_copy WHERE KNN(vec, 2, 'artificial intelligence')"
––– output –––
- +------+
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
- | id   |
- +------+
- |    1 |
- |    3 |
- |    2 |
- |    4 |
- +------+
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_copy; OPTIMIZE TABLE test_copy OPTION sync=1, cutoff=1"; echo $?
––– output –––
- 0
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
+ 1
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_copy WHERE KNN(vec, 2, 'artificial intelligence')"
––– output –––
- +------+
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
- | id   |
- +------+
- |    1 |
- |    3 |
- |    2 |
- |    4 |
- +------+
test/clt-tests/mcl/auto-embeddings-hnsw-configs.rec
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd $SEARCHD_FLAGS > /dev/null; if timeout 10 grep -qm1 '\[BUDDY\] started' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Buddy started!'; else echo 'Timeout or failed!'; cat /var/log/manticore/searchd.log;fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_l2 (
    content TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='content'
) engine='columnar'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_cosine (
    content TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='cosine'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='content'
) engine='columnar'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_ip (
    content TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='ip'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='content'
) engine='columnar'"; echo $?
––– output –––
OK
––– input –––
for table in test_l2 test_cosine test_ip; do
    mysql -h0 -P9306 -e "INSERT INTO $table (id, content) VALUES
        (1, 'machine learning'),
        (2, 'deep learning'),
        (3, 'cooking recipes')" 2>/dev/null
done
echo "Data inserted into all tables"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_l2; OPTIMIZE TABLE test_l2 OPTION sync=1, cutoff=1"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_cosine; OPTIMIZE TABLE test_cosine OPTION sync=1, cutoff=1"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_ip; OPTIMIZE TABLE test_ip OPTION sync=1, cutoff=1"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -E -e "SELECT id FROM test_l2 WHERE KNN(vec, 2, 'artificial intelligence')"
––– output –––
*************************** 1. row ***************************
id: 1
*************************** 2. row ***************************
id: 2
- *************************** 3. row ***************************
- id: 3
––– input –––
mysql -h0 -P9306 -E -e "SELECT id FROM test_cosine WHERE KNN(vec, 2, 'artificial intelligence')"
––– output –––
*************************** 1. row ***************************
id: 1
*************************** 2. row ***************************
id: 2
- *************************** 3. row ***************************
- id: 3
––– input –––
mysql -h0 -P9306 -E -e "SELECT id FROM test_ip WHERE KNN(vec, 2, 'artificial intelligence')"
––– output –––
*************************** 1. row ***************************
id: 1
*************************** 2. row ***************************
id: 2
- *************************** 3. row ***************************
- id: 3
––– input –––
echo "L2 (Euclidean) distances:"
mysql -h0 -P9306 -E -e "SELECT id, content, KNN_DIST() as distance FROM test_l2 WHERE KNN(vec, 3, 'neural networks') ORDER BY distance"
––– output –––
OK
––– input –––
echo "Cosine similarity distances (smaller = more similar):"
mysql -h0 -P9306 -E -e "SELECT id, content, KNN_DIST() as distance FROM test_cosine WHERE KNN(vec, 3, 'neural networks') ORDER BY distance"
––– output –––
OK
––– input –––
echo "Inner product distances (smaller = more similar):"
mysql -h0 -P9306 -E -e "SELECT id, content, KNN_DIST() as distance FROM test_ip WHERE KNN(vec, 3, 'neural networks') ORDER BY distance"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_hnsw_m4 (
    content TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    HNSW_M='4'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='content'
) engine='columnar'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_hnsw_m32 (
    content TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    HNSW_M='32'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='content'
) engine='columnar'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_hnsw_ef (
    content TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    HNSW_M='16'
    HNSW_EF_CONSTRUCTION='500'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='content'
) engine='columnar'"; echo $?
––– output –––
OK
––– input –––
for table in test_hnsw_m4 test_hnsw_m32 test_hnsw_ef; do
    mysql -h0 -P9306 -e "INSERT INTO $table (id, content) VALUES (1, 'test document')" 2>/dev/null
done
echo "HNSW configurations test completed"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -E -e "SELECT COUNT(*) FROM test_hnsw_m4"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -E -e "SELECT COUNT(*) FROM test_hnsw_m32"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -E -e "SELECT COUNT(*) FROM test_hnsw_ef"
––– output –––
OK
––– input –––
for i in {1..10}; do
    mysql -h0 -P9306 -e "INSERT INTO test_hnsw_m4 (content) VALUES ('document number $i with various content')" 2>/dev/null
    mysql -h0 -P9306 -e "INSERT INTO test_hnsw_m32 (content) VALUES ('document number $i with various content')" 2>/dev/null
done
echo "Additional documents inserted for performance comparison"
––– output –––
OK
––– input –––
echo "Search with M=4 (faster, less connections):"
mysql -h0 -P9306 -E -e "SELECT id FROM test_hnsw_m4 WHERE KNN(vec, 3, 'document content') LIMIT 3"
––– output –––
OK
––– input –––
echo "Search with M=32 (slower, more connections, potentially more accurate):"
mysql -h0 -P9306 -E -e "SELECT id FROM test_hnsw_m32 WHERE KNN(vec, 3, 'document content') LIMIT 3"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_invalid_metric (
    content TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='invalid_metric'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='content'
) engine='columnar'" 2>&1
––– output –––
OK
test/clt-tests/mcl/auto-embeddings-endpoints.rec
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd --stopwait > /dev/null; stdbuf -oL searchd ${SEARCHD_ARGS:-} > /dev/null
––– output –––
OK
––– input –––
if timeout 10 grep -qm1 'accepting connections' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Accepting connections!'; else echo 'Timeout or failed!'; fi
––– output –––
OK
––– input –––
apt-get install jq -y > /dev/null; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE emb_test (
    id BIGINT,
    title TEXT,
    content TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='title, content'
)"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO emb_test (id, title, content) VALUES
    (1, 'machine learning', 'neural networks and deep learning'),
    (2, 'computer vision', 'image recognition and processing'),
    (3, 'natural language', 'text analysis and understanding')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK emb_test; OPTIMIZE TABLE emb_test OPTION sync=1, cutoff=1"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM emb_test WHERE KNN(vec, 2, 'artificial intelligence')"
––– output –––
+----------+
| count(*) |
+----------+
- |        3 |
+ |        2 |
+----------+
––– input –––
curl -s "http://localhost:9308/cli?select%20id,%20title%20from%20emb_test%20where%20knn(vec,%202,%20'artificial%20intelligence')" | grep -v 'rows in set'
––– output –––
+----+------------------+
| id | title            |
+----+------------------+
| 1  | machine learning |
| 2  | computer vision  |
- | 3  | natural language |
+ +----+------------------+
- +----+------------------+
––– input –––
curl -s "http://localhost:9308/cli_json?select%20id,%20title,%20@knn_dist%20from%20emb_test%20where%20knn(vec,%201,%20'learning')" | jq -r '.[0].data[0] | "ID: \(.id)\nTitle: \(.title)\nDistance: \(.["@knn_dist"] | tostring)"'
––– output –––
OK
––– input –––
curl -s -X POST "http://localhost:9308/sql?mode=raw" -d "select count(*) from emb_test where knn(vec, 2, 'neural networks')" | jq -r '.[0].data[0]."count(*)"'
––– output –––
- 3
+ 2
––– input –––
curl -s -X POST http://localhost:9308/insert -d '{"index":"emb_test","id":10,"doc":{"title":"quantum computing","content":"quantum algorithms"}}' | jq -r '.created'
––– output –––
OK
––– input –––
curl -s -X POST http://localhost:9308/search -d '{"index":"emb_test","knn":{"field":"vec","query":"quantum","k":1}}' | jq -r '.hits.hits[0]._source.title'
––– output –––
OK
––– input –––
curl -s -X POST http://localhost:9308/search -d '{"index":"emb_test","knn":{"field":"vec","query_text":"quantum","k":1}}'
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE chunk_test (
    id BIGINT,
    title TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='title'
) engine='columnar'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO chunk_test (id, title) VALUES
    (1, 'machine learning'),
    (2, 'deep learning'),
    (3, 'reinforcement learning')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM chunk_test WHERE KNN(vec, 1, 'learning')"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK chunk_test; OPTIMIZE TABLE chunk_test OPTION sync=1, cutoff=1"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM chunk_test WHERE KNN(vec, 1, 'learning')"
––– output –––
+----------+
| count(*) |
+----------+
- |        3 |
+ |        1 |
+----------+
test/clt-tests/mcl/auto-embeddings-json-api.rec
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd $SEARCHD_FLAGS > /dev/null; if timeout 10 grep -qm1 '\[BUDDY\] started' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Buddy started!'; else echo 'Timeout or failed!'; cat /var/log/manticore/searchd.log;fi
––– output –––
OK
––– input –––
apt-get install jq -y > /dev/null; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_json_columnar (
    title TEXT,
    content TEXT,
    embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='title, content'
) engine='columnar'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SHOW CREATE TABLE test_json_columnar" | grep -o "knn_dims='384'"
––– output –––
OK
––– input –––
curl -s -X POST http://localhost:9308/insert -d '{"index":"test_json_columnar","id":1,"doc":{"title":"machine learning","content":"neural networks"}}' | jq -r 'if ._id then ._id else "inserted" end'
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_json_columnar WHERE KNN(embedding, 1, 'machine learning neural networks')"
––– output –––
OK
––– input –––
curl -s -X POST http://localhost:9308/bulk -H "Content-Type: application/x-ndjson" -d '
{"insert":{"index":"test_json_columnar","id":2,"doc":{"title":"computer vision","content":"image recognition"}}}
{"insert":{"index":"test_json_columnar","id":3,"doc":{"title":"NLP","content":"text processing"}}}
' | jq '{created: .items[0].bulk.created}'
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_json_columnar WHERE id IN (2,3)"
––– output –––
OK
––– input –––
curl -s -X POST http://localhost:9308/replace -d '{"index":"test_json_columnar","id":1,"doc":{"title":"updated ML","content":"updated networks"}}' | jq -r '.result'
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT title FROM test_json_columnar WHERE id=1 AND KNN(embedding, 1, 'updated ML networks')"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_json_columnar (id, title, content) VALUES (100, 'test', 'data')";
curl -s -X POST http://localhost:9308/insert -d '{"index":"test_json_columnar","id":101,"doc":{"title":"test","content":"data"}}' > /dev/null
––– output –––
OK
––– input –––
mysql -h0 -P9306 --batch --skip-column-names -e "SELECT embedding FROM test_json_columnar WHERE id=100" > /tmp/v1.txt
mysql -h0 -P9306 --batch --skip-column-names -e "SELECT embedding FROM test_json_columnar WHERE id=101" > /tmp/v2.txt
diff -q /tmp/v1.txt /tmp/v2.txt > /dev/null && echo "Vectors identical" || echo "Vectors differ"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_json_columnar"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_json_columnar; OPTIMIZE TABLE test_json_columnar OPTION sync=1, cutoff=1"; echo $?
––– output –––
OK
––– input –––
VECTOR=$(python3 -c "print(','.join(['0.01']*384))")
curl -s -X POST http://localhost:9308/search -d "{\"index\":\"test_json_columnar\",\"knn\":{\"field\":\"embedding\",\"query_vector\":[$VECTOR],\"k\":2}}" | jq -r '.hits.total // "0"'
––– output –––
- 5
+ 2
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE no_auto_embed (title TEXT, vec FLOAT_VECTOR KNN_TYPE='hnsw' KNN_DIMS='384' HNSW_SIMILARITY='l2') engine='columnar'"
––– output –––
OK
––– input –––
VECTOR=$(python3 -c "print(','.join(['0.5']*384))")
curl -s -X POST http://localhost:9308/insert -d "{\"index\":\"no_auto_embed\",\"id\":1,\"doc\":{\"title\":\"test\",\"vec\":[$VECTOR]}}" | jq -r 'if ._id then ._id else "inserted" end'
––– output –––
OK
––– input –––
QUERY_VEC=$(python3 -c "print(','.join(['0.5']*384))")
curl -s -X POST http://localhost:9308/search -d "{\"index\":\"no_auto_embed\",\"knn\":{\"field\":\"vec\",\"query_vector\":[$QUERY_VEC],\"k\":1}}" | jq -r '.hits.total // "0"'
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_json_rowwise (
    title TEXT,
    content TEXT,
    embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='title, content'
) engine='rowwise'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SHOW CREATE TABLE test_json_rowwise" | grep -o "knn_dims='384'"
––– output –––
OK
––– input –––
curl -s -X POST http://localhost:9308/insert -d '{"index":"test_json_rowwise","id":1,"doc":{"title":"machine learning","content":"neural networks"}}' | jq -r 'if ._id then ._id else "inserted" end'
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_json_rowwise WHERE KNN(embedding, 1, 'machine learning neural networks')"
––– output –––
OK
––– input –––
curl -s -X POST http://localhost:9308/bulk -H "Content-Type: application/x-ndjson" -d '
{"insert":{"index":"test_json_rowwise","id":2,"doc":{"title":"computer vision","content":"image recognition"}}}
{"insert":{"index":"test_json_rowwise","id":3,"doc":{"title":"NLP","content":"text processing"}}}
' | jq '{created: .items[0].bulk.created}'
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_json_rowwise WHERE id IN (2,3)"
––– output –––
OK
––– input –––
curl -s -X POST http://localhost:9308/replace -d '{"index":"test_json_rowwise","id":1,"doc":{"title":"updated ML","content":"updated networks"}}' | jq -r '.result'
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT title FROM test_json_rowwise WHERE id=1 AND KNN(embedding, 1, 'updated ML networks')"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_json_rowwise (id, title, content) VALUES (100, 'test', 'data')";
curl -s -X POST http://localhost:9308/insert -d '{"index":"test_json_rowwise","id":101,"doc":{"title":"test","content":"data"}}' > /dev/null
––– output –––
OK
––– input –––
mysql -h0 -P9306 --batch --skip-column-names -e "SELECT embedding FROM test_json_rowwise WHERE id=100" > /tmp/v1.txt
mysql -h0 -P9306 --batch --skip-column-names -e "SELECT embedding FROM test_json_rowwise WHERE id=101" > /tmp/v2.txt
diff -q /tmp/v1.txt /tmp/v2.txt > /dev/null && echo "Vectors identical" || echo "Vectors differ"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_json_rowwise"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_json_rowwise; OPTIMIZE TABLE test_json_rowwise OPTION sync=1, cutoff=1"; echo $?
––– output –––
OK
––– input –––
VECTOR=$(python3 -c "print(','.join(['0.01']*384))")
curl -s -X POST http://localhost:9308/search -d "{\"index\":\"test_json_rowwise\",\"knn\":{\"field\":\"embedding\",\"query_vector\":[$VECTOR],\"k\":2}}" | jq -r '.hits.total // "0"'
––– output –––
- 5
+ 2
test/clt-tests/mcl/auto-embeddings-dml-test.rec
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd $SEARCHD_FLAGS > /dev/null; if timeout 10 grep -qm1 '\[BUDDY\] started' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Buddy started!'; else echo 'Timeout or failed!'; cat /var/log/manticore/searchd.log;fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_delete ( title TEXT, embedding_vector FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2' FROM='title' ) engine='rowwise'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_delete (id, title) VALUES (1, 'One')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "DELETE FROM test_delete WHERE id = 1"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_delete"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_delete (id, title) VALUES (2,'Two'),(3,'Three'),(4,'Four'),(5,'Five')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "DELETE FROM test_delete WHERE id IN (2,3,4)"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_delete"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "DROP TABLE IF EXISTS test_replace"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_replace ( title TEXT, price INTEGER, vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2' FROM='title' ) engine='columnar'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_replace (id, title, price) VALUES (1, 'Original', 100)"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "REPLACE INTO test_replace (id, title, price) VALUES (1, 'Updated', 200)"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT title, price FROM test_replace WHERE id = 1"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "DROP TABLE IF EXISTS test_vector_regen"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_vector_regen ( content TEXT, vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='cosine' MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2' FROM='content')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_vector_regen (id, content) VALUES (1,'AI and ML'),(2,'Deep Learning'),(3,'Cooking')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_vector_regen; OPTIMIZE TABLE test_vector_regen OPTION sync=1, cutoff=1"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_vector_regen WHERE KNN(vec, 2, 'artificial intelligence')"
––– output –––
+------+
| id   |
+------+
|    1 |
|    2 |
- |    3 |
+ +------+
- +------+
––– input –––
mysql -h0 -P9306 -e "REPLACE INTO test_vector_regen (id, content) VALUES (1, 'Cooking recipes')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_vector_regen; OPTIMIZE TABLE test_vector_regen OPTION sync=1, cutoff=1"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_vector_regen WHERE KNN(vec, 2, 'artificial intelligence')"
––– output –––
+------+
| id   |
+------+
|    2 |
|    3 |
- |    1 |
+ +------+
- +------+
––– input –––
mysql -h0 -P9306 -e "TRUNCATE TABLE test_vector_regen"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_vector_regen"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "DROP TABLE IF EXISTS test_bulk"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_bulk ( content TEXT, vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2' FROM='content' ) engine='rowwise'"; echo $?
––– output –––
OK
––– input –––
for i in {1..50}; do
    mysql -h0 -P9306 -e "INSERT INTO test_bulk (id, content) VALUES ($i, 'Document $i')" 2>/dev/null
done
echo "Inserted 50 records"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "DELETE FROM test_bulk WHERE id <= 10"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "DELETE FROM test_bulk WHERE id >= 40"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_bulk"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "DROP TABLE IF EXISTS test_multi_vec"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_multi_vec ( title TEXT, description TEXT, title_vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2' FROM='title', desc_vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='cosine' MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2' FROM='description' ) engine='columnar'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_multi_vec (id, title, description) VALUES (1,'Title1','Desc1'),(2,'Title2','Desc2')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "DELETE FROM test_multi_vec WHERE id=1"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "REPLACE INTO test_multi_vec (id, title, description) VALUES (2,'NewTitle','NewDesc')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id, title FROM test_multi_vec"
––– output –––
OK
test/clt-tests/mcl/auto-embeddings-voyage-remote.rec
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd --stopwait > /dev/null; stdbuf -oL searchd ${SEARCHD_ARGS:-} > /dev/null
––– output –––
OK
––– input –––
if timeout 10 grep -qm1 'accepting connections' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Accepting connections!'; else echo 'Timeout or failed!'; fi
––– output –––
OK
––– input –––
cosine_similarity() {
    local file1="$1" file2="$2"

    awk '
    NR==FNR { a[NR]=$1; suma2+=$1*$1; next }
    {
        dot += a[FNR]*$1
        sumb2 += $1*$1
    }
    END {
        print dot / (sqrt(suma2) * sqrt(sumb2))
    }' "$file1" "$file2"
}
––– output –––
OK
––– input –––
export -f cosine_similarity
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_invalid_model (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/invalid-model-name-12345' FROM = 'title') " 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_valid_model_no_api_key (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/voyage-3.5-lite' FROM = 'title') " 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_voyage_remote (title TEXT, content TEXT, description TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/voyage-3.5-lite' FROM = 'title, content' API_KEY='${VOYAGE_API_KEY}') "; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -E -e "SHOW CREATE TABLE test_voyage_remote"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_voyage_remote (id, title, content, description) VALUES(1, 'machine learning algorithms', 'deep neural networks and artificial intelligence', 'advanced AI research')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as record_count FROM test_voyage_remote WHERE id=1"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_voyage_remote (id, title, content, description) VALUES(2, 'machine learning algorithms', 'deep neural networks and artificial intelligence', 'different description')"

mysql -h0 -P9306 -e "SELECT embedding FROM test_voyage_remote WHERE id=1" | \
    grep -v embedding | \
    sed 's/[0-9]\+\(\.[0-9]\+\)\?/\n&\n/g' | \
    grep -E '^[0-9]+(\.[0-9]+)?$' | \
    awk '{printf "%.5f\n", $1}' > /tmp/vector1.txt

mysql -h0 -P9306 -e "SELECT embedding FROM test_voyage_remote WHERE id=2" | \
    grep -v embedding | \
    sed 's/[0-9]\+\(\.[0-9]\+\)\?/\n&\n/g' | \
    grep -E '^[0-9]+(\.[0-9]+)?$' | \
    awk '{printf "%.5f\n", $1}' > /tmp/vector2.txt

SIMILARITY=$(cosine_similarity /tmp/vector1.txt /tmp/vector2.txt)

echo "Cosine similarity: $SIMILARITY"

RESULT=$(awk -v sim="$SIMILARITY" 'BEGIN {
    if (sim > 0.99)
        print "SUCCESS: Same FROM fields produce similar vectors (similarity: " sim ")"
    else
        print "FAIL: Different vectors (FROM does not include description field and should not change generated vector value) (similarity: " sim ")"
}')

echo "$RESULT"

rm -f /tmp/vector1.txt /tmp/vector2.txt
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_voyage_title_only (title TEXT, content TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/voyage-3.5-lite' FROM = 'title' API_KEY='${VOYAGE_API_KEY}') "; mysql -h0 -P9306 -e "INSERT INTO test_voyage_title_only (id, title, content) VALUES(1, 'machine learning algorithms', 'completely different content here')"; MD5_MULTI=$(mysql -h0 -P9306 -e "SELECT embedding FROM test_voyage_remote WHERE id=1" | grep -v embedding | md5sum | awk '{print $1}'); MD5_SINGLE=$(mysql -h0 -P9306 -e "SELECT embedding FROM test_voyage_title_only WHERE id=1" | grep -v embedding | md5sum | awk '{print $1}'); echo "multi_field_md5: $MD5_MULTI"; echo "single_field_md5: $MD5_SINGLE"; if [ "$MD5_MULTI" != "$MD5_SINGLE" ]; then echo "SUCCESS: Different FROM specifications produce different vectors"; else echo "INFO: FROM field comparison result"; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test__invalid_field (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/text-embedding-ada-002' FROM = 'nonexistent_field') " 2>&1
––– output –––
OK
––– input –––
if mysql -h0 -P9306 -e "SHOW TABLES LIKE 'test_voyage_no_from'" | grep -q test_voyage_no_from; then mysql -h0 -P9306 -e "INSERT INTO test_voyage_no_from (id, title, embedding) VALUES(1, 'test title', '(0.1, 0.2, 0.3, 0.4, 0.5)')"; echo "insert_result: $?"; else echo "insert_result: skipped (table not created)"; fi
––– output –––
OK
––– input –––
if mysql -h0 -P9306 -e "SHOW TABLES LIKE 'test_voyage_no_from'" | grep -q test_voyage_no_from; then mysql -h0 -P9306 -e "SHOW CREATE TABLE test_voyage_no_from"; else echo "table_structure: skipped (table not created)"; fi
––– output –––
OK
––– input –––
if [ -n "$VOYAGE_API_KEY" ] && [ "$VOYAGE_API_KEY" != "dummy_key_for_testing" ]; then echo "API key is available for testing"; else echo "API key not available - using dummy for error testing"; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id, knn_dist() FROM test_voyage_remote WHERE knn(embedding, 3, 'machine learning and artificial intelligence')\G"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as count FROM test_voyage_remote WHERE knn(embedding, 5, 'technology and AI') AND id > 0"
––– output –––
OK
––– input –––
API_KEY_VAL="${VOYAGE_API_KEY}"; cat > /etc/manticoresearch/manticore.conf << CONFEOF
searchd {
    listen = 127.0.0.1:9306:mysql41
    listen = 127.0.0.1:9308:http
    log = /var/log/manticore/searchd.log
    pid_file = /var/run/manticore/searchd.pid
}

table test_voyage_plain {
    type = rt
    path = /var/lib/manticore/test_voyage_plain
    rt_field = title
    rt_field = content
    rt_attr_float_vector = embedding
    knn = {"attrs":[{"name":"embedding","type":"hnsw","hnsw_similarity":"L2","hnsw_m":16,"hnsw_ef_construction":200,"model_name":"voyage/voyage-3.5-lite","from":"title,content","api_key":"${API_KEY_VAL}"}]}
}
CONFEOF
––– output –––
OK
––– input –––
searchd --stopwait --quiet
––– output –––
+ [Tue Jan 13 16:15:49.490 2026] [93] WARNING: Error initializing secondary index: daemon requires secondary library v18 (trying to load v19)
+ [Tue Jan 13 16:15:49.490 2026] [93] FATAL: malformed or unknown option near '--quiet'; use '-h' or '--help' to see available options.
+ Manticore 0.0.0 0240e6481@25121214 (columnar 0.0.0 e5dff86@26011314) (knn 0.0.0 e5dff86@26011314) (embeddings 1.1.0 unknown@00000000)
+ Copyright (c) 2001-2016, Andrew Aksyonoff
+ Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
+ Copyright (c) 2017-2025, Manticore Software LTD (https://manticoresearch.com)
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd --stopwait > /dev/null; stdbuf -oL searchd ${SEARCHD_ARGS:-} > /dev/null
––– output –––
OK
––– input –––
if timeout 10 grep -qm1 'accepting connections' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Accepting connections!'; else echo 'Timeout or failed!'; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SHOW TABLES"
––– output –––
- +-------------------+------+
- | Table             | Type |
- +-------------------+------+
- | test_voyage_plain | rt   |
- +-------------------+------+
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_voyage_plain (id, title, content) VALUES(1, 'bread', 'food item'), (2, 'cat', 'animal pet')"; echo $?
––– output –––
- 0
+ ERROR 1064 (42000) at line 1: Cannot create the table automatically in Plain mode. Make sure the table exists before inserting into it
+ 1
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as count FROM test_voyage_plain"
––– output –––
- +-------+
+ ERROR 1064 (42000) at line 1: unknown local table(s) 'test_voyage_plain' in search request
- | count |
- +-------+
- |     2 |
- +-------+
––– input –––
mysql -h0 -P9306 -E -e "SELECT id, title FROM test_voyage_plain WHERE knn(embedding, 2, 'dog')"
––– output –––
- *************************** 1. row ***************************
+ ERROR 1064 (42000) at line 1: unknown local table(s) 'test_voyage_plain' in search request
-    id: 2
- title: cat
- *************************** 2. row ***************************
-    id: 1
- title: bread
––– input –––
cat > /etc/manticoresearch/manticore.conf << 'EOF'
searchd {
    listen = 127.0.0.1:9306:mysql41
    listen = 127.0.0.1:9308:http
    log = /var/log/manticore/searchd.log
    pid_file = /var/run/manticore/searchd.pid
}

table test_voyage_no_key {
    type = rt
    path = /var/lib/manticore/test_voyage_no_key
    rt_field = title
    rt_attr_float_vector = embedding
    knn = {"attrs":[{"name":"embedding","type":"hnsw","hnsw_similarity":"L2","model_name":"voyage/voyage-3.5-lite","from":"title"}]}
}
EOF
––– output –––
OK
––– input –––
searchd --stopwait --quiet
––– output –––
+ [Tue Jan 13 16:15:51.022 2026] [131] WARNING: Error initializing secondary index: daemon requires secondary library v18 (trying to load v19)
+ [Tue Jan 13 16:15:51.022 2026] [131] FATAL: malformed or unknown option near '--quiet'; use '-h' or '--help' to see available options.
+ Manticore 0.0.0 0240e6481@25121214 (columnar 0.0.0 e5dff86@26011314) (knn 0.0.0 e5dff86@26011314) (embeddings 1.1.0 unknown@00000000)
+ Copyright (c) 2001-2016, Andrew Aksyonoff
+ Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
+ Copyright (c) 2017-2025, Manticore Software LTD (https://manticoresearch.com)
––– input –––
searchd 2>&1|grep WARNING
––– output –––
- WARNING: table 'test_voyage_no_key': prealloc: Invalid API key for remote model - NOT SERVING
+ [Tue Jan 13 16:15:51.035 2026] [132] WARNING: Error initializing secondary index: daemon requires secondary library v18 (trying to load v19)
test/clt-tests/mcl/auto-embeddings-error-handling.rec
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd $SEARCHD_FLAGS > /dev/null; if timeout 10 grep -qm1 '\[BUDDY\] started' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Buddy started!'; else echo 'Timeout or failed!'; cat /var/log/manticore/searchd.log;fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_dims (
    title TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' KNN_DIMS='384'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='title'
)" 2>&1
# Check if table was actually created
mysql -h0 -P9306 -e "SHOW TABLES LIKE 'test_dims'" | grep -q "test_dims" && echo "Table created" || echo "Table not created"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_auto_dims (
    title TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='title'
)"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_auto_dims (id, title) VALUES (1, 'Test document')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_no_model (
    title TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    FROM='title'
)"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_no_from (
    content_text TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
)" 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_empty_from (
    title TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM=''
)" 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_bad_model (
    title TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='non-existent-model/invalid-name'
    FROM='title'
)" 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_no_prefix (
    content_text TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='all-MiniLM-L6-v2'
    FROM='content_text'
)" 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_bad_from (
    title TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='non_existent_field'
)"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_circular (
    title TEXT,
    vec1 FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='vec1'
)" 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_empty (
    content TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='content'
) engine='columnar'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_empty (id, content) VALUES (1, '')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_empty (id, content) VALUES (2, NULL)" 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_empty"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_rowwise (
    content TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='content'
) engine='rowwise'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_rowwise (id, content) VALUES
    (1, 'machine learning'),
    (2, 'deep learning')"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_rowwise; OPTIMIZE TABLE test_rowwise OPTION sync=1, cutoff=1"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_rowwise WHERE KNN(vec, 1, 'artificial intelligence')"
––– output –––
+------+
| id   |
+------+
|    1 |
- |    2 |
+ +------+
- +------+
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_vec_columnar (
    content TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' engine='columnar'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='content'
)"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_vec_columnar (id, content) VALUES
    (1, 'machine learning'),
    (2, 'deep learning')"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_vec_columnar; OPTIMIZE TABLE test_vec_columnar OPTION sync=1, cutoff=1"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_vec_columnar WHERE KNN(vec, 1, 'artificial intelligence')"
––– output –––
+------+
| id   |
+------+
|    1 |
- |    2 |
+ +------+
- +------+
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_full_columnar (
    content TEXT,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='content'
) engine='columnar'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_full_columnar (id, content) VALUES
    (1, 'machine learning'),
    (2, 'deep learning')"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_full_columnar; OPTIMIZE TABLE test_full_columnar OPTION sync=1, cutoff=1"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_full_columnar WHERE KNN(vec, 1, 'artificial intelligence')"
––– output –––
+------+
| id   |
+------+
|    1 |
- |    2 |
+ +------+
- +------+
––– input –––
echo "Row-wise (default):"
mysql -h0 -P9306 -e "SHOW CREATE TABLE test_rowwise\G" | grep -E "vec.*float_vector"
––– output –––
OK
––– input –––
echo "Vec columnar only:"
mysql -h0 -P9306 -e "SHOW CREATE TABLE test_vec_columnar\G" | grep -E "vec.*float_vector"
––– output –––
OK
––– input –––
echo "Full columnar:"
mysql -h0 -P9306 -e "SHOW CREATE TABLE test_full_columnar\G" | grep -E "(vec.*float_vector|engine='columnar')"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT * FROM test_auto_dims WHERE KNN(wrong_field, 1, 'test')" 2>&1 | grep -o "wrong_field.*not found"
––– output –––
OK
test/clt-tests/mcl/auto-embeddings-openai-remote.rec
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd --stopwait > /dev/null; stdbuf -oL searchd ${SEARCHD_ARGS:-} > /dev/null
––– output –––
OK
––– input –––
if timeout 10 grep -qm1 'accepting connections' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Accepting connections!'; else echo 'Timeout or failed!'; fi
––– output –––
OK
––– input –––
cosine_similarity() {
    local file1="$1" file2="$2"

    awk '
    NR==FNR { a[NR]=$1; suma2+=$1*$1; next }
    {
        dot += a[FNR]*$1
        sumb2 += $1*$1
    }
    END {
        print dot / (sqrt(suma2) * sqrt(sumb2))
    }' "$file1" "$file2"
}
––– output –––
OK
––– input –––
export -f cosine_similarity
––– output –––
OK
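
For reviewers who want to sanity-check the `cosine_similarity` helper defined above without a running daemon or an API key, here is a minimal standalone sketch. It repeats the same awk logic and feeds it small hand-written vectors. The temporary directory and the `a.txt`, `b.txt`, and `c.txt` file names are made up for illustration and are not part of the test suite.

```shell
# Same awk logic as the helper in the recording: the first file fills a[],
# the second file is used for the dot product and the second norm.
cosine_similarity() {
    local file1="$1" file2="$2"

    awk '
    NR==FNR { a[NR]=$1; suma2+=$1*$1; next }
    {
        dot += a[FNR]*$1
        sumb2 += $1*$1
    }
    END {
        print dot / (sqrt(suma2) * sqrt(sumb2))
    }' "$file1" "$file2"
}

# Hypothetical input vectors, one component per line.
tmpdir=$(mktemp -d)
printf '1\n0\n' > "$tmpdir/a.txt"
printf '1\n0\n' > "$tmpdir/b.txt"
printf '0\n1\n' > "$tmpdir/c.txt"

same=$(cosine_similarity "$tmpdir/a.txt" "$tmpdir/b.txt")   # identical vectors
orth=$(cosine_similarity "$tmpdir/a.txt" "$tmpdir/c.txt")   # orthogonal vectors

echo "identical: $same"
echo "orthogonal: $orth"

rm -rf "$tmpdir"
```

Note that the helper divides by the product of the two norms, so an empty or all-zero input file (for example, when the `SELECT embedding` query returns nothing) would make awk fail with a division-by-zero error rather than print a similarity.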
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_invalid_model (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'openai/invalid-model-name-12345' FROM = 'title') " 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_valid_model_no_api_key (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'openai/text-embedding-ada-002' FROM = 'title') " 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_openai_remote (title TEXT, content TEXT, description TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'openai/text-embedding-ada-002' FROM = 'title, content' API_KEY='${OPENAI_API_KEY}') "; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SHOW CREATE TABLE test_openai_remote"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_openai_remote (id, title, content, description) VALUES(1, 'machine learning algorithms', 'deep neural networks and artificial intelligence', 'advanced AI research')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as record_count FROM test_openai_remote WHERE id=1"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_openai_remote (id, title, content, description) VALUES(2, 'machine learning algorithms', 'deep neural networks and artificial intelligence', 'different description')"

mysql -h0 -P9306 -e "SELECT embedding FROM test_openai_remote WHERE id=1" | \
    grep -v embedding | \
    sed 's/[0-9]\+\(\.[0-9]\+\)\?/\n&\n/g' | \
    grep -E '^[0-9]+(\.[0-9]+)?$' | \
    awk '{printf "%.5f\n", $1}' > /tmp/vector1.txt

mysql -h0 -P9306 -e "SELECT embedding FROM test_openai_remote WHERE id=2" | \
    grep -v embedding | \
    sed 's/[0-9]\+\(\.[0-9]\+\)\?/\n&\n/g' | \
    grep -E '^[0-9]+(\.[0-9]+)?$' | \
    awk '{printf "%.5f\n", $1}' > /tmp/vector2.txt

SIMILARITY=$(cosine_similarity /tmp/vector1.txt /tmp/vector2.txt)

echo "Cosine similarity: $SIMILARITY"

RESULT=$(awk -v sim="$SIMILARITY" 'BEGIN {
    if (sim > 0.99)
        print "SUCCESS: Same FROM fields produce similar vectors (similarity: " sim ")"
    else
        print "FAIL: Different vectors (FROM does not include description field and should not change generated vector value) (similarity: " sim ")"
}')

echo "$RESULT"

rm -f /tmp/vector1.txt /tmp/vector2.txt
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_openai_title_only (title TEXT, content TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'openai/text-embedding-ada-002' FROM = 'title' API_KEY='${OPENAI_API_KEY}') "; mysql -h0 -P9306 -e "INSERT INTO test_openai_title_only (id, title, content) VALUES(1, 'machine learning algorithms', 'completely different content here')"; MD5_MULTI=$(mysql -h0 -P9306 -e "SELECT embedding FROM test_openai_remote WHERE id=1" | grep -v embedding | md5sum | awk '{print $1}'); MD5_SINGLE=$(mysql -h0 -P9306 -e "SELECT embedding FROM test_openai_title_only WHERE id=1" | grep -v embedding | md5sum | awk '{print $1}'); echo "multi_field_md5: $MD5_MULTI"; echo "single_field_md5: $MD5_SINGLE"; if [ "$MD5_MULTI" != "$MD5_SINGLE" ]; then echo "SUCCESS: Different FROM specifications produce different vectors"; else echo "INFO: FROM field comparison result"; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_openai_invalid_field (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'openai/text-embedding-ada-002' FROM = 'nonexistent_field') " 2>&1
––– output –––
OK
––– input –––
if mysql -h0 -P9306 -e "SHOW TABLES LIKE 'test_openai_no_from'" | grep -q test_openai_no_from; then mysql -h0 -P9306 -e "INSERT INTO test_openai_no_from (id, title, embedding) VALUES(1, 'test title', '(0.1, 0.2, 0.3, 0.4, 0.5)')"; echo "insert_result: $?"; else echo "insert_result: skipped (table not created)"; fi
––– output –––
OK
––– input –––
if mysql -h0 -P9306 -e "SHOW TABLES LIKE 'test_openai_no_from'" | grep -q test_openai_no_from; then mysql -h0 -P9306 -e "SHOW CREATE TABLE test_openai_no_from"; else echo "table_structure: skipped (table not created)"; fi
––– output –––
OK
––– input –––
if [ -n "$OPENAI_API_KEY" ] && [ "$OPENAI_API_KEY" != "dummy_key_for_testing" ]; then echo "API key is available for testing"; else echo "API key not available - using dummy for error testing"; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id, knn_dist() FROM test_openai_remote WHERE knn(embedding, 3, 'machine learning and artificial intelligence')\G"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as count FROM test_openai_remote WHERE knn(embedding, 5, 'technology and AI') AND id > 0"
––– output –––
OK
––– input –––
API_KEY_VAL="${OPENAI_API_KEY}"; cat > /etc/manticoresearch/manticore.conf << CONFEOF
searchd {
    listen = 127.0.0.1:9306:mysql41
    listen = 127.0.0.1:9308:http
    log = /var/log/manticore/searchd.log
    pid_file = /var/run/manticore/searchd.pid
}

table test_openai_plain {
    type = rt
    path = /var/lib/manticore/test_openai_plain
    rt_field = title
    rt_field = content
    rt_attr_float_vector = embedding
    knn = {"attrs":[{"name":"embedding","type":"hnsw","hnsw_similarity":"L2","hnsw_m":16,"hnsw_ef_construction":200,"model_name":"openai/text-embedding-ada-002","from":"title,content","api_key":"${API_KEY_VAL}"}]}
}
CONFEOF
––– output –––
OK
––– input –––
searchd --stopwait --quiet
––– output –––
+ [Tue Jan 13 16:15:04.795 2026] [93] WARNING: Error initializing secondary index: daemon requires secondary library v18 (trying to load v19)
+ [Tue Jan 13 16:15:04.795 2026] [93] FATAL: malformed or unknown option near '--quiet'; use '-h' or '--help' to see available options.
+ Manticore 0.0.0 0240e6481@25121214 (columnar 0.0.0 e5dff86@26011314) (knn 0.0.0 e5dff86@26011314) (embeddings 1.1.0 unknown@00000000)
+ Copyright (c) 2001-2016, Andrew Aksyonoff
+ Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
+ Copyright (c) 2017-2025, Manticore Software LTD (https://manticoresearch.com)
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd --stopwait > /dev/null; stdbuf -oL searchd ${SEARCHD_ARGS:-} > /dev/null
––– output –––
OK
––– input –––
if timeout 10 grep -qm1 'accepting connections' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Accepting connections!'; else echo 'Timeout or failed!'; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SHOW TABLES"
––– output –––
- +-------------------+------+
- | Table             | Type |
- +-------------------+------+
- | test_openai_plain | rt   |
- +-------------------+------+
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_openai_plain (id, title, content) VALUES(1, 'bread', 'food item'), (2, 'cat', 'animal pet')"; echo $?
––– output –––
- 0
+ ERROR 1064 (42000) at line 1: Cannot create the table automatically in Plain mode. Make sure the table exists before inserting into it
+ 1
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as count FROM test_openai_plain"
––– output –––
- +-------+
+ ERROR 1064 (42000) at line 1: unknown local table(s) 'test_openai_plain' in search request
- | count |
- +-------+
- |     2 |
- +-------+
––– input –––
mysql -h0 -P9306 -E -e "SELECT id, title FROM test_openai_plain WHERE knn(embedding, 2, 'dog')"
––– output –––
- *************************** 1. row ***************************
+ ERROR 1064 (42000) at line 1: unknown local table(s) 'test_openai_plain' in search request
-    id: 2
- title: cat
- *************************** 2. row ***************************
-    id: 1
- title: bread
––– input –––
cat > /etc/manticoresearch/manticore.conf << 'EOF'
searchd {
    listen = 127.0.0.1:9306:mysql41
    listen = 127.0.0.1:9308:http
    log = /var/log/manticore/searchd.log
    pid_file = /var/run/manticore/searchd.pid
}

table test_openai_no_key {
    type = rt
    path = /var/lib/manticore/test_openai_no_key
    rt_field = title
    rt_attr_float_vector = embedding
    knn = {"attrs":[{"name":"embedding","type":"hnsw","hnsw_similarity":"L2","model_name":"openai/text-embedding-ada-002","from":"title"}]}
}
EOF
––– output –––
OK
––– input –––
searchd --stopwait --quiet
––– output –––
+ [Tue Jan 13 16:15:05.235 2026] [131] WARNING: Error initializing secondary index: daemon requires secondary library v18 (trying to load v19)
+ [Tue Jan 13 16:15:05.235 2026] [131] FATAL: malformed or unknown option near '--quiet'; use '-h' or '--help' to see available options.
+ Manticore 0.0.0 0240e6481@25121214 (columnar 0.0.0 e5dff86@26011314) (knn 0.0.0 e5dff86@26011314) (embeddings 1.1.0 unknown@00000000)
+ Copyright (c) 2001-2016, Andrew Aksyonoff
+ Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
+ Copyright (c) 2017-2025, Manticore Software LTD (https://manticoresearch.com)
––– input –––
searchd 2>&1|grep WARNING
––– output –––
- WARNING: table 'test_openai_no_key': prealloc: Invalid API key for remote model - NOT SERVING
+ [Tue Jan 13 16:15:05.247 2026] [132] WARNING: Error initializing secondary index: daemon requires secondary library v18 (trying to load v19)

@sanikolaev
Collaborator Author

@glookka pls review the C++ changes in this PR.
