From 6bc545b78c4d625e741de29cd18eae90b3a63854 Mon Sep 17 00:00:00 2001 From: fzowl Date: Mon, 22 Dec 2025 15:20:49 +0100 Subject: [PATCH 01/10] Adding VoyageAI embeddings documentation --- en/rag/embedding.html | 156 ++++++++++++++++++++++++++ en/reference/rag/embedding.html | 193 ++++++++++++++++++++++++++++++++ 2 files changed, 349 insertions(+) diff --git a/en/rag/embedding.html b/en/rag/embedding.html index ab52624497..e450cd34c6 100644 --- a/en/rag/embedding.html +++ b/en/rag/embedding.html @@ -496,6 +496,162 @@

SPLADE ranking

+

VoyageAI Embedder

+ +

An embedder that uses the VoyageAI embedding API +to generate high-quality embeddings for semantic search. This embedder calls the VoyageAI API service +and does not require local model files or ONNX inference.

+ +
{% highlight xml %}
+
+    
+        voyage-3.5
+        voyage_api_key
+    
+
+{% endhighlight %}
+ + + +

Add your VoyageAI API key to the secret store:

+
+vespa secret add voyage_api_key --value "pa-xxxxx..."
+
+ +

See the reference +for all configuration parameters including caching, retry logic, and performance tuning.

+ +

VoyageAI embedder models

+

+ VoyageAI offers several embedding models optimized for different use cases. + The resulting tensor type can be float or + bfloat16 for storage efficiency. +

+ +

Latest general-purpose models (recommended):

+ + +

Previous generation general-purpose models:

+ + +

Specialized models:

+ + +

Contextual model:

+ + +

Multimodal model (preview):

+ + +

Input type detection

+

VoyageAI models distinguish between query and document embeddings for improved retrieval quality. +The embedder automatically detects the context and sets the appropriate input type:

+  • query - used automatically for query-time embeddings
+  • document - used automatically when embedding document fields during indexing
+

You can disable auto-detection and set a fixed input type:

+
{% highlight xml %}
+
+    voyage-3.5
+    voyage_api_key
+    false
+    query
+
+{% endhighlight %}
+ +

VoyageAI performance features

+

The VoyageAI embedder includes several performance optimizations:

+ + +

Example with performance tuning:

+
{% highlight xml %}
+
+    voyage-3.5
+    voyage_api_key
+    20
+    true
+
+{% endhighlight %}
+ +

Usage example

+

Complete example showing document indexing and query-time embedding:

+ +

Schema definition:

+
+schema doc {
+    document doc {
+        field text type string {
+            indexing: summary | index
+        }
+    }
+
+    field embedding type tensor<float>(x[1024]) {
+        indexing: input text | embed voyage | attribute | index
+        attribute {
+            distance-metric: angular
+        }
+    }
+
+    rank-profile semantic {
+        inputs {
+            query(q) tensor<float>(x[1024])
+        }
+        first-phase {
+            expression: closeness(field, embedding)
+        }
+    }
+}
+
+ +

Query with embedding:

+
{% highlight bash %}
+vespa query \
+  'yql=select * from doc where {targetHits:10}nearestNeighbor(embedding,q)' \
+  'input.query(q)=embed(voyage, "machine learning tutorials")'
+{% endhighlight %}
+ +

When normalize is set to true, use
+distance-metric: prenormalized-angular
+for more efficient similarity computation.
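The efficiency gain is easy to see: once vectors are L2-normalized, cosine similarity reduces to a plain dot product, so vector norms need not be recomputed at query time. A minimal Python sketch of this identity:

```python
import math

def l2_normalize(v):
    # Scale the vector to unit length.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = l2_normalize([3.0, 4.0])
b = l2_normalize([1.0, 2.0])

# For unit-length vectors, the dot product equals the cosine similarity --
# this is exactly what the prenormalized-angular distance metric exploits.
assert abs(dot(a, b) - cosine(a, b)) < 1e-12
```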

+ +

Embedder performance

Embedding inference can be resource-intensive for larger embedding models. Factors that impact performance:

diff --git a/en/reference/rag/embedding.html b/en/reference/rag/embedding.html index 0bf9d40a98..3dcfdba557 100644 --- a/en/reference/rag/embedding.html +++ b/en/reference/rag/embedding.html @@ -478,6 +478,199 @@

splade embedder reference config

+

VoyageAI Embedder

+

+ An embedder that uses the VoyageAI API + to generate embeddings. This is an API-based embedder that does not require local model files or ONNX inference. + It calls the VoyageAI service to generate high-quality embeddings optimized for semantic search. +

+

+ The VoyageAI embedder is configured in services.xml, + within the container tag: +

+
{% highlight xml %}
+
+    
+        voyage-3.5
+        voyage_api_key
+    
+
+{% endhighlight %}
+ +

Secret Management

+

+ The VoyageAI API key must be stored in Vespa's + secret store for secure management: +

+
+vespa secret add voyage_api_key --value "pa-xxxxx..."
+
+

+ The api-key-secret-ref parameter references the secret name. + Secrets are automatically refreshed when rotated without requiring application restart. +

+ +

VoyageAI embedder reference config

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
<tr>
  <th>Name</th><th>Occurrence</th><th>Description</th><th>Type</th><th>Default</th>
</tr>
<tr>
  <td>api-key-secret-ref</td><td>One</td>
  <td>Required. Reference to the secret in Vespa's secret store containing the VoyageAI API key.</td>
  <td>string</td><td>N/A</td>
</tr>
<tr>
  <td>model</td><td>One</td>
  <td>The VoyageAI model to use. Available models:
    <ul>
      <li>voyage-3.5 (1024 dims) - Latest and best quality, state-of-the-art (recommended)</li>
      <li>voyage-3.5-lite (512 dims) - Newest lite model, excellent quality at lower cost</li>
      <li>voyage-3 (1024 dims) - Previous generation, high quality</li>
      <li>voyage-3-lite (512 dims) - Previous generation, cost-efficient</li>
      <li>voyage-code-3 (1024 dims) - Code search optimization</li>
      <li>voyage-finance-2 (1024 dims) - Financial documents</li>
      <li>voyage-law-2 (1024 dims) - Legal documents</li>
      <li>voyage-multilingual-2 (1024 dims) - Multilingual support</li>
      <li>voyage-context-3 (1024 dims, configurable: 256/512/1024/2048) - Contextualized document chunk embeddings</li>
      <li>voyage-multimodal-3.5 (1024 dims, configurable: 256/512/1024/2048) - Multimodal embeddings (text, images, video) [preview]</li>
    </ul>
  </td>
  <td>string</td><td>voyage-3.5</td>
</tr>
<tr>
  <td>endpoint</td><td>One</td>
  <td>VoyageAI API endpoint URL. Can be overridden for custom proxies or regional endpoints.</td>
  <td>string</td><td>https://api.voyageai.com/v1/embeddings</td>
</tr>
<tr>
  <td>timeout</td><td>One</td>
  <td>Request timeout in milliseconds. Also serves as the bound for retry attempts: retries stop when total elapsed time would exceed this timeout. Minimum value: 1000ms.</td>
  <td>numeric</td><td>30000</td>
</tr>
<tr>
  <td>max-retries</td><td>One</td>
  <td>Maximum number of retry attempts for failed requests. Used as a safety limit in addition to the timeout-based retry bound.</td>
  <td>numeric</td><td>10</td>
</tr>
<tr>
  <td>default-input-type</td><td>One</td>
  <td>Default input type when auto-detection is disabled. Valid values: query or document. VoyageAI models use different optimizations for queries vs documents.</td>
  <td>enum</td><td>document</td>
</tr>
<tr>
  <td>auto-detect-input-type</td><td>One</td>
  <td>Whether to automatically detect input type based on context. When enabled, uses query type for query-time embeddings and document type for indexing.</td>
  <td>boolean</td><td>true</td>
</tr>
<tr>
  <td>normalize</td><td>One</td>
  <td>Whether to apply L2 normalization to embeddings. When enabled, all embedding vectors are normalized to unit length. Use with prenormalized-angular distance-metric for efficient similarity computation.</td>
  <td>boolean</td><td>false</td>
</tr>
<tr>
  <td>truncate</td><td>One</td>
  <td>Whether to truncate input text exceeding model limits. When enabled, text is automatically truncated. When disabled, requests with too-long text will fail.</td>
  <td>boolean</td><td>true</td>
</tr>
<tr>
  <td>max-idle-connections</td><td>One</td>
  <td>Maximum number of idle HTTP connections to keep in the connection pool. Higher values improve throughput for concurrent requests but use more resources.</td>
  <td>numeric</td><td>5</td>
</tr>
+ +

Example Configurations

+ +

Basic configuration (recommended):

+
{% highlight xml %}
+
+    voyage-3.5
+    voyage_api_key
+
+{% endhighlight %}
+ +

High-performance configuration:

+
{% highlight xml %}
+
+    voyage-3.5
+    voyage_api_key
+    20
+    60000
+
+{% endhighlight %}
+ +

Fast and cost-efficient configuration:

+
{% highlight xml %}
+
+    voyage-3.5-lite
+    voyage_api_key
+
+{% endhighlight %}
+ +

Query-optimized configuration:

+
{% highlight xml %}
+
+    voyage-3.5
+    voyage_api_key
+    query
+    false
+    true
+
+{% endhighlight %}
+ +

Code search configuration:

+
{% highlight xml %}
+
+    voyage-code-3
+    voyage_api_key
+    true
+
+{% endhighlight %}
+ +

Cost and Performance Optimization

+

The VoyageAI embedder includes several features to reduce API costs and improve performance:

+ + +

For detailed performance monitoring, the embedder emits standard Vespa embedder metrics + (see Container Metrics). + Monitor API usage and costs through the VoyageAI dashboard.

+ + +

Huggingface tokenizer embedder

The Huggingface tokenizer embedder is configured in services.xml, From a270864d7c83704c60e4be54299267b11cab58fc Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Christian=20Seime?= Date: Thu, 22 Jan 2026 08:30:14 +0100 Subject: [PATCH 02/10] chore: use incremental build for containerized dev environment --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 0776a1a5fb..1580de5d10 100644 --- a/README.md +++ b/README.md @@ -48,12 +48,12 @@ Mac $ docker run -ti --rm --name doc \ --publish 4000:4000 -e JEKYLL_UID=$UID -v $(pwd):/srv/jekyll \ - jekyll/jekyll jekyll serve + jekyll/jekyll jekyll serve --incremental --force_polling or RHEL 8 $ podman run -it --rm --name doc -p 4000:4000 -e JEKYLL_ROOTLESS=true \ - -v "$PWD":/srv/jekyll:Z docker.io/jekyll/jekyll jekyll serve + -v "$PWD":/srv/jekyll:Z docker.io/jekyll/jekyll jekyll serve --incremental --force_polling The Jekyll server should normally rebuild HTML files automatically when a source files changes. If this does not happen, you can use From 7887e7853ef664aa97e07d7c65aa33b67dd6a7d8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Christian=20Seime?= Date: Thu, 22 Jan 2026 08:47:22 +0100 Subject: [PATCH 03/10] chore: update VoyageAI embedder documentation - Remove unsupported configuration options (normalize, auto-detect-input-type, default-input-type, timeout, max-retries, max-idle-connections). - Document automatic vector normalization and input type detection, clarify programmatic control via Embedder.Context destination property. --- en/rag/embedding.html | 104 ++++------------------- en/reference/rag/embedding.html | 143 ++------------------------------ 2 files changed, 25 insertions(+), 222 deletions(-) diff --git a/en/rag/embedding.html b/en/rag/embedding.html index e450cd34c6..70a344b4a8 100644 --- a/en/rag/embedding.html +++ b/en/rag/embedding.html @@ -500,12 +500,15 @@

VoyageAI Embedder

An embedder that uses the VoyageAI embedding API to generate high-quality embeddings for semantic search. This embedder calls the VoyageAI API service -and does not require local model files or ONNX inference.

+and does not require local model files or ONNX inference. All embeddings returned by VoyageAI are normalized +to unit length, making them suitable for cosine similarity and +prenormalized-angular distance metrics +(see VoyageAI FAQ).

{% highlight xml %}
 
     
-        voyage-3.5
+        voyage-4
         voyage_api_key
     
 
@@ -514,10 +517,6 @@ 

VoyageAI Embedder

-

Add your VoyageAI API key to the secret store:

-
-vespa secret add voyage_api_key --value "pa-xxxxx..."
-
- -

See the reference -for all configuration parameters including caching, retry logic, and performance tuning.

+

See the reference +for all configuration parameters.

VoyageAI embedder models

-

- VoyageAI offers several embedding models optimized for different use cases. - The resulting tensor type can be float or - bfloat16 for storage efficiency. -

- -

Latest general-purpose models (recommended):

+

For the complete list of available models and their specifications, see:

- -

Previous generation general-purpose models:

- - -

Specialized models:

- - -

Contextual model:

- - -

Multimodal model (preview):

-

Input type detection

VoyageAI models distinguish between query and document embeddings for improved retrieval quality. -The embedder automatically detects the context and sets the appropriate input type:

- - -

You can disable auto-detection and set a fixed input type:

-
{% highlight xml %}
-
-    voyage-3.5
-    voyage_api_key
-    false
-    query
-
-{% endhighlight %}
+The embedder automatically detects the context and sets the appropriate input type based on whether +the embedding is performed during feed (indexing) or query processing in Vespa.

-

VoyageAI performance features

-

The VoyageAI embedder includes several performance optimizations:

- - -

Example with performance tuning:

-
{% highlight xml %}
-
-    voyage-3.5
-    voyage_api_key
-    20
-    true
-
-{% endhighlight %}
+

For advanced use cases where you need to control the input type programmatically, +you can use the destination property of the +Embedder.Context +when calling the embedder from Java code.
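For illustration only (this is not the Vespa Embedder API, and the exact destination strings are an assumption), the detection rule can be modeled as: destinations originating from query-time embedding, such as query(q), select the query input type, while schema-field destinations used during feed select the document input type. A self-contained conceptual sketch:

```java
// Conceptual model of destination-based input type detection.
// NOTE: illustrative only -- in real application code you would pass an
// Embedder.Context with the desired destination when calling the embedder.
public class InputTypeDetection {

    // Assumption for illustration: query-time destinations start with "query".
    static String inputTypeFor(String destination) {
        return destination.startsWith("query") ? "query" : "document";
    }

    public static void main(String[] args) {
        System.out.println(inputTypeFor("query(q)"));      // query
        System.out.println(inputTypeFor("doc.embedding")); // document
    }
}
```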

Usage example

Complete example showing document indexing and query-time embedding:

@@ -625,7 +561,7 @@

Usage example

field embedding type tensor<float>(x[1024]) { indexing: input text | embed voyage | attribute | index attribute { - distance-metric: angular + distance-metric: prenormalized-angular } } @@ -647,10 +583,6 @@

Usage example

'input.query(q)=embed(voyage, "machine learning tutorials")' {% endhighlight %}
-

When using normalize set to true, use -distance-metric: prenormalized-angular -for more efficient similarity computation.

-

Embedder performance

diff --git a/en/reference/rag/embedding.html b/en/reference/rag/embedding.html index 3dcfdba557..efef9e6914 100644 --- a/en/reference/rag/embedding.html +++ b/en/reference/rag/embedding.html @@ -481,8 +481,7 @@

splade embedder reference config

VoyageAI Embedder

An embedder that uses the VoyageAI API - to generate embeddings. This is an API-based embedder that does not require local model files or ONNX inference. - It calls the VoyageAI service to generate high-quality embeddings optimized for semantic search. + to generate embeddings.

The VoyageAI embedder is configured in services.xml, @@ -491,25 +490,14 @@

VoyageAI Embedder

{% highlight xml %}
 
     
-        voyage-3.5
+        voyage-law-2
         voyage_api_key
+        https://api.voyageai.com/v1/embeddings
+        true
     
 
 {% endhighlight %}
-

Secret Management

-

- The VoyageAI API key must be stored in Vespa's - secret store for secure management: -

-
-vespa secret add voyage_api_key --value "pa-xxxxx..."
-
-

- The api-key-secret-ref parameter references the secret name. - Secrets are automatically refreshed when rotated without requiring application restart. -

-

VoyageAI embedder reference config

@@ -525,29 +513,17 @@

VoyageAI embedder reference config

- + - - + @@ -556,41 +532,6 @@

VoyageAI embedder reference configstring

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - @@ -598,79 +539,9 @@

VoyageAI embedder reference configboolean

- - - - - - -
api-key-secret-ref OneRequired. Reference to the secret in Vespa's secret store containing the VoyageAI API key.Required. Reference to the secret in Vespa's secret store containing the VoyageAI API key. string N/A
model OneThe VoyageAI model to use. Available models: -
    -
  • voyage-3.5 (1024 dims) - Latest and best quality, state-of-the-art (recommended)
  • -
  • voyage-3.5-lite (512 dims) - Newest lite model, excellent quality at lower cost
  • -
  • voyage-3 (1024 dims) - Previous generation, high quality
  • -
  • voyage-3-lite (512 dims) - Previous generation, cost-efficient
  • -
  • voyage-code-3 (1024 dims) - Code search optimization
  • -
  • voyage-finance-2 (1024 dims) - Financial documents
  • -
  • voyage-law-2 (1024 dims) - Legal documents
  • -
  • voyage-multilingual-2 (1024 dims) - Multilingual support
  • -
  • voyage-context-3 (1024 dims, configurable: 256/512/1024/2048) - Contextualized document chunk embeddings
  • -
  • voyage-multimodal-3.5 (1024 dims, configurable: 256/512/1024/2048) - Multimodal embeddings (text, images, video) [preview]
  • -
+
Required. The VoyageAI model to use. See the VoyageAI embeddings documentation for the complete list of available models including general-purpose, specialized, contextualized, and multimodal models. stringvoyage-3.5N/A
endpointhttps://api.voyageai.com/v1/embeddings
timeoutOneRequest timeout in milliseconds. Also serves as the bound for retry attempts - retries stop when total elapsed time would exceed this timeout. Minimum value: 1000ms.numeric30000
max-retriesOneMaximum number of retry attempts for failed requests. Used as a safety limit in addition to the timeout-based retry bound.numeric10
default-input-typeOneDefault input type when auto-detection is disabled. Valid values: query or document. VoyageAI models use different optimizations for queries vs documents.enumdocument
auto-detect-input-typeOneWhether to automatically detect input type based on context. When enabled, uses query type for query-time embeddings and document type for indexing.booleantrue
normalizeOneWhether to apply L2 normalization to embeddings. When enabled, all embedding vectors are normalized to unit length. Use with prenormalized-angular distance-metric for efficient similarity computation.booleanfalse
truncate Onetrue
max-idle-connectionsOneMaximum number of idle HTTP connections to keep in the connection pool. Higher values improve throughput for concurrent requests but use more resources.numeric5
-

Example Configurations

- -

Basic configuration (recommended):

-
{% highlight xml %}
-
-    voyage-3.5
-    voyage_api_key
-
-{% endhighlight %}
- -

High-performance configuration:

-
{% highlight xml %}
-
-    voyage-3.5
-    voyage_api_key
-    20
-    60000
-
-{% endhighlight %}
- -

Fast and cost-efficient configuration:

-
{% highlight xml %}
-
-    voyage-3.5-lite
-    voyage_api_key
-
-{% endhighlight %}
- -

Query-optimized configuration:

-
{% highlight xml %}
-
-    voyage-3.5
-    voyage_api_key
-    query
-    false
-    true
-
-{% endhighlight %}
- -

Code search configuration:

-
{% highlight xml %}
-
-    voyage-code-3
-    voyage_api_key
-    true
-
-{% endhighlight %}
- -

Cost and Performance Optimization

-

The VoyageAI embedder includes several features to reduce API costs and improve performance:

- - -

For detailed performance monitoring, the embedder emits standard Vespa embedder metrics - (see Container Metrics). - Monitor API usage and costs through the VoyageAI dashboard.

- - -

Huggingface tokenizer embedder

The Huggingface tokenizer embedder is configured in services.xml, From 32cc54aff55e25628a1839f1f09199f043e9b24b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Christian=20Seime?= Date: Thu, 22 Jan 2026 10:42:55 +0100 Subject: [PATCH 04/10] chore(voyage-ai-embedder): add section on best practices --- en/rag/embedding.html | 35 +++++++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) diff --git a/en/rag/embedding.html b/en/rag/embedding.html index 70a344b4a8..732ec30b0b 100644 --- a/en/rag/embedding.html +++ b/en/rag/embedding.html @@ -583,6 +583,41 @@

Usage example

'input.query(q)=embed(voyage, "machine learning tutorials")' {% endhighlight %} +

Best practices

+

For production deployments, we recommend configuring separate embedder components for feed and search operations.
+This architectural pattern provides two key benefits: cost optimization and rate limit isolation.
+In Vespa Cloud, it is best practice to configure these embedders in separate container clusters for feed and search.

+ +
{% highlight xml %}
+
+    
+        voyage-4-large
+        voyage_feed_api_key
+    
+    
+
+
+
+    
+        voyage-4-lite
+        voyage_search_api_key
+    
+    
+
+{% endhighlight %}
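With separate feed and search embedders configured as above, query-time embedding must name the search-side component explicitly in the embed expression. The embedder id voyage-search below is illustrative and must match the component id in your services.xml:

```bash
vespa query \
  'yql=select * from doc where {targetHits:10}nearestNeighbor(embedding,q)' \
  'input.query(q)=embed(voyage-search, "machine learning tutorials")'
```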
+ +
Cost optimization with model variants
+

The Voyage 4 model family features a shared embedding space +across different model sizes. This enables a cost-effective strategy where you can use a more powerful (and expensive) model +for document embeddings, while using a smaller, cheaper model for query embeddings. +Since document embedding happens once during indexing but query embedding occurs on every search request, +this approach can significantly reduce operational costs while maintaining quality.

+ +
Rate limit isolation
+

Separating feed and search operations is particularly important for managing VoyageAI API rate limits. +Bursty document feeding operations can consume significant API quota, potentially causing rate limit errors +that affect search queries. By using separate API keys for feed and search embedders, +you ensure that feeding bursts don't negatively impact search.

Embedder performance

From 2ca355dfcc097117d95dc3d24aba0bb4191fcf8d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Christian=20Seime?= Date: Thu, 22 Jan 2026 11:10:36 +0100 Subject: [PATCH 05/10] chore: simplify - details already covered in the article --- en/rag/embedding.html | 37 ------------------------------------- 1 file changed, 37 deletions(-) diff --git a/en/rag/embedding.html b/en/rag/embedding.html index 732ec30b0b..49ca76d6d1 100644 --- a/en/rag/embedding.html +++ b/en/rag/embedding.html @@ -546,43 +546,6 @@

Input type detection

Embedder.Context when calling the embedder from Java code.

-

Usage example

-

Complete example showing document indexing and query-time embedding:

- -

Schema definition:

-
-schema doc {
-    document doc {
-        field text type string {
-            indexing: summary | index
-        }
-    }
-
-    field embedding type tensor<float>(x[1024]) {
-        indexing: input text | embed voyage | attribute | index
-        attribute {
-            distance-metric: prenormalized-angular
-        }
-    }
-
-    rank-profile semantic {
-        inputs {
-            query(q) tensor<float>(x[1024])
-        }
-        first-phase {
-            expression: closeness(field, embedding)
-        }
-    }
-}
-
- -

Query with embedding:

-
{% highlight bash %}
-vespa query \
-  'yql=select * from doc where {targetHits:10}nearestNeighbor(embedding,q)' \
-  'input.query(q)=embed(voyage, "machine learning tutorials")'
-{% endhighlight %}
-

Best practices

For production deployments, we recommend configuring separate embedder components for feed and search operations. This architectural pattern provides two key benefits - cost optimization and rate limit isolation. From 0328f897d858d5cdc759113b2104aaddf8434ede Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Christian=20Seime?= Date: Mon, 26 Jan 2026 16:10:11 +0100 Subject: [PATCH 06/10] chore: document 'quantization' and 'dimensions' --- en/rag/embedding.html | 3 +++ en/reference/rag/embedding.html | 25 ++++++++++++++++++++----- 2 files changed, 23 insertions(+), 5 deletions(-) diff --git a/en/rag/embedding.html b/en/rag/embedding.html index 49ca76d6d1..4b3bb0ff4a 100644 --- a/en/rag/embedding.html +++ b/en/rag/embedding.html @@ -510,6 +510,7 @@

VoyageAI Embedder

voyage-4 voyage_api_key + 1024 {% endhighlight %} @@ -555,6 +556,7 @@

Best practices

voyage-4-large + 1024 voyage_feed_api_key @@ -563,6 +565,7 @@

Best practices

voyage-4-lite + 1024 voyage_search_api_key diff --git a/en/reference/rag/embedding.html b/en/reference/rag/embedding.html index efef9e6914..67aebcaefa 100644 --- a/en/reference/rag/embedding.html +++ b/en/reference/rag/embedding.html @@ -492,6 +492,7 @@

VoyageAI Embedder

voyage-law-2 voyage_api_key + 1024 https://api.voyageai.com/v1/embeddings true @@ -511,17 +512,24 @@

VoyageAI embedder reference config - api-key-secret-ref + model One - Required. Reference to the secret in Vespa's secret store containing the VoyageAI API key. + Required. The VoyageAI model to use. See the VoyageAI embeddings documentation for the complete list of available models including general-purpose, specialized, contextualized, and multimodal models. + string N/A - model + dimensions One - Required. The VoyageAI model to use. See the VoyageAI embeddings documentation for the complete list of available models including general-purpose, specialized, contextualized, and multimodal models. - + Required. The number of dimensions for the output embedding vectors. Must match the tensor field definition in your schema. Valid values are 256, 512, 1024, 1536, or 2048. See the VoyageAI embeddings documentation for model-specific dimension support. + integer + N/A + + + api-key-secret-ref + One + Required. Reference to the secret in Vespa's secret store containing the VoyageAI API key. string N/A @@ -539,6 +547,13 @@

VoyageAI embedder reference configboolean true + + quantization + Optional + Output quantization format for embedding vectors. Valid values are auto, float, int8, or binary. When set to auto, the embedder infers the appropriate quantization from the dimensions and cell type of the destination tensor in your schema. See the VoyageAI quantization documentation for details on quantization options and their trade-offs. + string + auto + From 9a7c8ceec5e6513335c803744c111432f7d496a8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Christian=20Seime?= Date: Tue, 27 Jan 2026 08:42:14 +0100 Subject: [PATCH 07/10] chore: document use case for contextualized chunk embedding models --- en/rag/embedding.html | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/en/rag/embedding.html b/en/rag/embedding.html index 4b3bb0ff4a..2d0eab40be 100644 --- a/en/rag/embedding.html +++ b/en/rag/embedding.html @@ -537,6 +537,34 @@

VoyageAI embedder models

  • Multimodal Embeddings - Multimodal models for text, images, and video
  • +

    Contextualized chunk embeddings

    +

    To use contextualized chunk embeddings, +configure the VoyageAI embedder with a voyage-context-* model and use it to embed an +array<string> field containing your document chunks:

    + +
    +schema doc {
    +    document doc {
    +        field chunks type array<string> {
    +            indexing: index | summary
    +        }
    +    }
    +    field embeddings type tensor<float>(chunk{}, x[1024]) {
    +        indexing: input chunks | embed voyage | attribute | index
    +        attribute {
    +            distance-metric: prenormalized-angular
    +        }
    +    }
    +}
    +
    + +

    + When embedding array fields with a contextualized chunk embedding model, Vespa sends all chunks from a document in a single API request, + allowing Voyage to encode each chunk with context from the other chunks. + Be aware that the combined size of all chunks in a document must fit within the VoyageAI API's input token limit. + See Working with chunks for chunking strategies. +
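For illustration, a feed document for the schema above could supply its chunks as a JSON array (the document id and chunk texts are placeholders); each array element becomes one embedded chunk in the mapped tensor:

```json
{
    "put": "id:doc:doc::1",
    "fields": {
        "chunks": [
            "First chunk of the document text ...",
            "Second chunk, encoded with context from the first ..."
        ]
    }
}
```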

    +

    Input type detection

    VoyageAI models distinguish between query and document embeddings for improved retrieval quality. The embedder automatically detects the context and sets the appropriate input type based on whether From 80293f6db216ab4fccdc0c86f4974c69cdb4caea Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Christian=20Seime?= Date: Tue, 27 Jan 2026 09:02:52 +0100 Subject: [PATCH 08/10] chore: update recommendation for self-managed secrets for OpenAI integration --- en/rag/external-llms.md | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/en/rag/external-llms.md b/en/rag/external-llms.md index 29099a57cb..9a52de67c6 100644 --- a/en/rag/external-llms.md +++ b/en/rag/external-llms.md @@ -59,11 +59,15 @@ This sets up a client component that can be used in a Vespa provides several options to configure the API key used by the client. -1. Using the [Vespa Cloud secret store](../security/secret-store) to store the API key. -2. This is done by setting the `apiKeySecretRef` configuration parameter to the name of the secret -3. in the secret store. This is the recommended way for Vespa Cloud users. -2. Providing the API key in the `X-LLM-API-KEY` HTTP header of the Vespa query. -3. It is also possible to configure the API key in a custom component. For example, [this](https://github.com/vespa-engine/system-test/tree/master/tests/docproc/generate_field_openai) system-test shows how to retrieve the API key from a local file deployed with your Vespa application. Please note that this is NOT recommended for production use, as it is less secure than using the secret store, but it can be modified to suit your needs. +1. Using the [Vespa Cloud secret store](../security/secret-store) to store the API key. + This is done by setting the `apiKeySecretRef` configuration parameter to the name of the secret + in the secret store. This is the recommended way for Vespa Cloud users. +2. For self-managed Vespa, you can provide secrets via environment variables. 
+ Set the `apiKeySecretRef` configuration parameter and expose the secret as an environment variable + named `VESPA_SECRET_`, where `` is the secret reference name converted to + upper snake case. For example, if `apiKeySecretRef` is set to `myApiKey`, the environment variable + should be named `VESPA_SECRET_MY_API_KEY`. +3. Providing the API key in the `X-LLM-API-KEY` HTTP header of the Vespa query. You can set up multiple connections with different settings. For instance, you might want to run different LLMs for different tasks. To distinguish between the From 475f41cb46c78778b104bdd780f43ecca256875f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Christian=20Seime?= Date: Tue, 27 Jan 2026 10:37:08 +0100 Subject: [PATCH 09/10] chore: improve description of quantization parameter --- en/reference/rag/embedding.html | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/en/reference/rag/embedding.html b/en/reference/rag/embedding.html index 67aebcaefa..7d46ab4664 100644 --- a/en/reference/rag/embedding.html +++ b/en/reference/rag/embedding.html @@ -535,14 +535,14 @@

    VoyageAI embedder reference config endpoint - One - VoyageAI API endpoint URL. Can be overridden for custom proxies or regional endpoints. + Optional + VoyageAI API endpoint URL. string https://api.voyageai.com/v1/embeddings truncate - One + Optional Whether to truncate input text exceeding model limits. When enabled, text is automatically truncated. When disabled, requests with too-long text will fail. boolean true @@ -550,7 +550,13 @@

    VoyageAI embedder reference config quantization Optional - Output quantization format for embedding vectors. Valid values are auto, float, int8, or binary. When set to auto, the embedder infers the appropriate quantization from the dimensions and cell type of the destination tensor in your schema. See the VoyageAI quantization documentation for details on quantization options and their trade-offs. + Output quantization format for embedding vectors. Valid values are auto, float, + int8, or binary. When set to auto, + the embedder infers the appropriate quantization from the dimensions and cell type of the destination tensor in your schema. + When using binary quantization, the destination tensor field must use int8 cell type + with 1/8 of the dimensions specified in the embedder configuration (e.g., 1024 dimensions → tensor<int8>(x[128])). + See the VoyageAI quantization documentation + for details on quantization options and binarizing vectors for more on binary quantization in Vespa. string auto From 324162eb837a075d31bbac310f12dcdf755448f2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Christian=20Seime?= Date: Tue, 27 Jan 2026 11:02:03 +0100 Subject: [PATCH 10/10] chore: add badge on minimum version required --- en/rag/embedding.html | 1 + 1 file changed, 1 insertion(+) diff --git a/en/rag/embedding.html b/en/rag/embedding.html index 2d0eab40be..f3a4cef3d3 100644 --- a/en/rag/embedding.html +++ b/en/rag/embedding.html @@ -538,6 +538,7 @@

    VoyageAI embedder models

    Contextualized chunk embeddings

    +{% include note.html content='Available since 8.637.' %}

    To use contextualized chunk embeddings, configure the VoyageAI embedder with a voyage-context-* model and use it to embed an array<string> field containing your document chunks: