From 6bc545b78c4d625e741de29cd18eae90b3a63854 Mon Sep 17 00:00:00 2001
From: fzowl SPLADE ranking
An embedder that uses the VoyageAI embedding API
+to generate high-quality embeddings for semantic search. This embedder calls the VoyageAI API service
+and does not require local model files or ONNX inference.
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+  <model>voyage-3.5</model>
+  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+</component>
+{% endhighlight %}
+
+model specifies which VoyageAI model to use.
+ Available models include voyage-3.5 (1024 dimensions, latest and best),
+ voyage-3.5-lite (512 dimensions, fastest),
+ voyage-code-3 (optimized for code), and others.
+ See the VoyageAI documentation for the full list.
+ api-key-secret-ref references a secret in Vespa's
+ secret store containing your VoyageAI API key.
+ This is required for authentication.
+ Add your VoyageAI API key to the secret store:
+<pre>
+vespa secret add voyage_api_key --value "pa-xxxxx..."
+</pre>
See the reference +for all configuration parameters including caching, retry logic, and performance tuning.
+ +
+ VoyageAI offers several embedding models optimized for different use cases.
+ The resulting tensor type can be float or
+ bfloat16 for storage efficiency.
+
Latest general-purpose models (recommended):
+- voyage-3.5: tensor<float>(x[1024]) - latest and best quality, state-of-the-art for most applications
+- voyage-3.5-lite: tensor<float>(x[512]) - newest lite model, excellent quality at lower cost and faster speed
+Previous generation general-purpose models:
+- voyage-3: tensor<float>(x[1024]) - high quality (use voyage-3.5 for best results)
+- voyage-3-lite: tensor<float>(x[512]) - cost-efficient (use voyage-3.5-lite for better performance)
+Specialized models:
+- voyage-code-3: tensor<float>(x[1024]) - optimized for code search and technical content
+- voyage-finance-2: tensor<float>(x[1024]) - optimized for financial documents
+- voyage-law-2: tensor<float>(x[1024]) - optimized for legal documents
+- voyage-multilingual-2: tensor<float>(x[1024]) - supports 100+ languages
+Contextual model:
+- voyage-context-3: tensor<float>(x[1024]) (configurable: 256, 512, 1024, 2048) -
+  contextualized embeddings for document chunks with surrounding context awareness
+Multimodal model (preview):
+- voyage-multimodal-3: tensor<float>(x[1024]) (configurable: 256, 512, 1024, 2048) -
+  multimodal embeddings for text, images, and video in a shared vector space
+
+VoyageAI models distinguish between query and document embeddings for improved retrieval quality.
+The embedder automatically detects the context and sets the appropriate input type:
+- indexing (feed): document input type
+- query processing (via embed() in queries): query input type
+You can disable auto-detection and set a fixed input type:
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+  <model>voyage-3.5</model>
+  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+  <auto-detect-input-type>false</auto-detect-input-type>
+  <default-input-type>query</default-input-type>
+</component>
+{% endhighlight %}
+
+The VoyageAI embedder includes several performance optimizations:
+- HTTP connection pooling, tuned with max-idle-connections (default: 5).
+- Automatic retries of failed requests, bounded by max-retries (default: 10).
+- Optional L2 normalization of embeddings with normalize set to true.
+Example with performance tuning:
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+  <model>voyage-3.5</model>
+  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+  <max-idle-connections>20</max-idle-connections>
+  <normalize>true</normalize>
+</component>
+{% endhighlight %}
+
+Complete example showing document indexing and query-time embedding:
+Schema definition:
+
+schema doc {
+ document doc {
+ field text type string {
+ indexing: summary | index
+ }
+ }
+
+ field embedding type tensor<float>(x[1024]) {
+ indexing: input text | embed voyage | attribute | index
+ attribute {
+ distance-metric: angular
+ }
+ }
+
+ rank-profile semantic {
+ inputs {
+ query(q) tensor<float>(x[1024])
+ }
+ first-phase {
+ expression: closeness(field, embedding)
+ }
+ }
+}
+
+
+Query with embedding:
+{% highlight bash %}
+vespa query \
+ 'yql=select * from doc where {targetHits:10}nearestNeighbor(embedding,q)' \
+ 'input.query(q)=embed(voyage, "machine learning tutorials")'
+{% endhighlight %}
+
+When using normalize set to true, use
+distance-metric: prenormalized-angular
+for more efficient similarity computation.
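+As a sketch of that pairing (the element names follow the reference table in this patch; the
+component type attribute and id are assumptions carried over from the examples above, not confirmed by the original):
+{% highlight xml %}
+<!-- sketch: request L2-normalized vectors from the API ... -->
+<component id="voyage" type="voyage-ai-embedder">
+  <model>voyage-3.5</model>
+  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+  <normalize>true</normalize>
+</component>
+{% endhighlight %}
+<pre>
+# ... and match it with the cheaper distance metric in the schema:
+field embedding type tensor&lt;float&gt;(x[1024]) {
+    indexing: input text | embed voyage | attribute | index
+    attribute {
+        distance-metric: prenormalized-angular
+    }
+}
+</pre>
+With unit-length vectors, prenormalized-angular skips the per-comparison normalization that angular would otherwise perform.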
Embedding inference can be resource-intensive for larger embedding models. Factors that impact performance:
diff --git a/en/reference/rag/embedding.html b/en/reference/rag/embedding.html
index 0bf9d40a98..3dcfdba557 100644
--- a/en/reference/rag/embedding.html
+++ b/en/reference/rag/embedding.html
@@ -478,6 +478,199 @@
+An embedder that uses the VoyageAI API
+to generate embeddings. This is an API-based embedder that does not require local model files or ONNX inference.
+It calls the VoyageAI service to generate high-quality embeddings optimized for semantic search.
+
+ The VoyageAI embedder is configured in services.xml,
+ within the container tag:
+
{% highlight xml %}
+<container version="1.0">
+  <component id="voyage" type="voyage-ai-embedder">
+    <model>voyage-3.5</model>
+    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+  </component>
+</container>
+{% endhighlight %}
+
+The VoyageAI API key must be stored in Vespa's
+secret store for secure management:
+<pre>
+vespa secret add voyage_api_key --value "pa-xxxxx..."
+</pre>
+ The api-key-secret-ref parameter references the secret name.
+ Secrets are automatically refreshed when rotated without requiring application restart.
+
+| Name | Occurrence | Description | Type | Default |
+|---|---|---|---|---|
+| api-key-secret-ref | One | Required. Reference to the secret in Vespa's secret store containing the VoyageAI API key. | string | N/A |
+| model | One | The VoyageAI model to use. Available models include voyage-3.5, voyage-3.5-lite, voyage-code-3, and others; see the VoyageAI documentation for the full list. | string | voyage-3.5 |
+| endpoint | One | VoyageAI API endpoint URL. Can be overridden for custom proxies or regional endpoints. | string | https://api.voyageai.com/v1/embeddings |
+| timeout | One | Request timeout in milliseconds. Also serves as the bound for retry attempts - retries stop when total elapsed time would exceed this timeout. Minimum value: 1000ms. | numeric | 30000 |
+| max-retries | One | Maximum number of retry attempts for failed requests. Used as a safety limit in addition to the timeout-based retry bound. | numeric | 10 |
+| default-input-type | One | Default input type when auto-detection is disabled. Valid values: query or document. VoyageAI models use different optimizations for queries vs documents. | enum | document |
+| auto-detect-input-type | One | Whether to automatically detect input type based on context. When enabled, uses query type for query-time embeddings and document type for indexing. | boolean | true |
+| normalize | One | Whether to apply L2 normalization to embeddings. When enabled, all embedding vectors are normalized to unit length. Use with prenormalized-angular distance-metric for efficient similarity computation. | boolean | false |
+| truncate | One | Whether to truncate input text exceeding model limits. When enabled, text is automatically truncated. When disabled, requests with too-long text will fail. | boolean | true |
+| max-idle-connections | One | Maximum number of idle HTTP connections to keep in the connection pool. Higher values improve throughput for concurrent requests but use more resources. | numeric | 5 |
Basic configuration (recommended):
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+  <model>voyage-3.5</model>
+  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+</component>
+{% endhighlight %}
+
+High-performance configuration:
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+  <model>voyage-3.5</model>
+  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+  <max-idle-connections>20</max-idle-connections>
+  <timeout>60000</timeout>
+</component>
+{% endhighlight %}
+
+Fast and cost-efficient configuration:
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+  <model>voyage-3.5-lite</model>
+  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+</component>
+{% endhighlight %}
+
+Query-optimized configuration:
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+  <model>voyage-3.5</model>
+  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+  <default-input-type>query</default-input-type>
+  <auto-detect-input-type>false</auto-detect-input-type>
+  <normalize>true</normalize>
+</component>
+{% endhighlight %}
+
+Code search configuration:
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+  <model>voyage-code-3</model>
+  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+  <normalize>true</normalize> <!-- element name reconstructed; only the value survived formatting -->
+</component>
+{% endhighlight %}
+
+The VoyageAI embedder includes several features to reduce API costs and improve performance:
+- Connection pooling via max-idle-connections (default: 5).
+- Automatic retries via max-retries (default: 10).
+- Model choice: consider voyage-3.5-lite for cost-sensitive applications (512 dimensions vs 1024 dimensions reduces costs while maintaining excellent quality). For best quality, use voyage-3.5.
+For detailed performance monitoring, the embedder emits standard Vespa embedder metrics
+(see Container Metrics).
+Monitor API usage and costs through the VoyageAI dashboard.
+ + +
The Huggingface tokenizer embedder is configured in services.xml,
From a270864d7c83704c60e4be54299267b11cab58fc Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B8rn=20Christian=20Seime?= An embedder that uses the VoyageAI embedding API
to generate high-quality embeddings for semantic search. This embedder calls the VoyageAI API service
-and does not require local model files or ONNX inference.VoyageAI Embedder
{% highlight xml %}
-  <model>voyage-3.5</model>
+  <model>voyage-4</model>
   <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
@@ -514,10 +517,6 @@ VoyageAI Embedder
model specifies which VoyageAI model to use.
- Available models include voyage-3.5 (1024 dimensions, latest and best),
- voyage-3.5-lite (512 dimensions, fastest),
- voyage-code-3 (optimized for code), and others.
- See the VoyageAI documentation for the full list.
api-key-secret-ref references a secret in Vespa's
@@ -526,89 +525,26 @@ Add your VoyageAI API key to the secret store:
-<pre>
-vespa secret add voyage_api_key --value "pa-xxxxx..."
-</pre>
See the reference -for all configuration parameters including caching, retry logic, and performance tuning.
+See the reference +for all configuration parameters.
- VoyageAI offers several embedding models optimized for different use cases.
- The resulting tensor type can be float or
- bfloat16 for storage efficiency.
-
Latest general-purpose models (recommended):
+For the complete list of available models and their specifications, see:
-- voyage-3.5: tensor<float>(x[1024]) - latest and best quality, state-of-the-art for most applications
-- voyage-3.5-lite: tensor<float>(x[512]) - newest lite model, excellent quality at lower cost and faster speed
-Previous generation general-purpose models:
-- voyage-3: tensor<float>(x[1024]) - high quality (use voyage-3.5 for best results)
-- voyage-3-lite: tensor<float>(x[512]) - cost-efficient (use voyage-3.5-lite for better performance)
-Specialized models:
-- voyage-code-3: tensor<float>(x[1024]) - optimized for code search and technical content
-- voyage-finance-2: tensor<float>(x[1024]) - optimized for financial documents
-- voyage-law-2: tensor<float>(x[1024]) - optimized for legal documents
-- voyage-multilingual-2: tensor<float>(x[1024]) - supports 100+ languages
-Contextual model:
-- voyage-context-3: tensor<float>(x[1024]) (configurable: 256, 512, 1024, 2048) -
-  contextualized embeddings for document chunks with surrounding context awareness
-Multimodal model (preview):
-- voyage-multimodal-3: tensor<float>(x[1024]) (configurable: 256, 512, 1024, 2048) -
-  multimodal embeddings for text, images, and video in a shared vector space
 VoyageAI models distinguish between query and document embeddings for improved retrieval quality.
-The embedder automatically detects the context and sets the appropriate input type:
-- indexing (feed): document input type
-- query processing (via embed() in queries): query input type
-You can disable auto-detection and set a fixed input type:
-{% highlight xml %}
-<component id="voyage" type="voyage-ai-embedder">
-  <model>voyage-3.5</model>
-  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
-  <auto-detect-input-type>false</auto-detect-input-type>
-  <default-input-type>query</default-input-type>
-</component>
-{% endhighlight %}
+The embedder automatically detects the context and sets the appropriate input type based on whether
+the embedding is performed during feed (indexing) or query processing in Vespa.
-The VoyageAI embedder includes several performance optimizations:
-- HTTP connection pooling, tuned with max-idle-connections (default: 5).
-- Automatic retries of failed requests, bounded by max-retries (default: 10).
-- Optional L2 normalization of embeddings with normalize set to true.
-Example with performance tuning:
-{% highlight xml %}
-<component id="voyage" type="voyage-ai-embedder">
-  <model>voyage-3.5</model>
-  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
-  <max-idle-connections>20</max-idle-connections>
-  <normalize>true</normalize>
-</component>
-{% endhighlight %}
+For advanced use cases where you need to control the input type programmatically,
+you can use the destination property of the
+Embedder.Context
+when calling the embedder from Java code.
Complete example showing document indexing and query-time embedding:
@@ -625,7 +561,7 @@When using normalize set to true, use
-distance-metric: prenormalized-angular
-for more efficient similarity computation.
 An embedder that uses the VoyageAI API
-to generate embeddings. This is an API-based embedder that does not require local model files or ONNX inference.
-It calls the VoyageAI service to generate high-quality embeddings optimized for semantic search.
+to generate embeddings.
The VoyageAI embedder is configured in services.xml,
@@ -491,25 +490,14 @@
{% highlight xml %}
-  <model>voyage-3.5</model>
+  <model>voyage-law-2</model>
   <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+  <endpoint>https://api.voyageai.com/v1/embeddings</endpoint>
+  <truncate>true</truncate> <!-- element name reconstructed; only the value survived formatting -->
{% endhighlight %}
-The VoyageAI API key must be stored in Vespa's
-secret store for secure management:
-<pre>
-vespa secret add voyage_api_key --value "pa-xxxxx..."
-</pre>
- The api-key-secret-ref parameter references the secret name.
- Secrets are automatically refreshed when rotated without requiring application restart.
-
 | api-key-secret-ref | One | Required. Reference to the secret in Vespa's secret store containing the VoyageAI API key. | string | N/A |
-| model | One | The VoyageAI model to use. Available models include voyage-3.5, voyage-3.5-lite, voyage-code-3, and others. | string | voyage-3.5 |
+| model | One | Required. The VoyageAI model to use. See the VoyageAI embeddings documentation for the complete list of available models including general-purpose, specialized, contextualized, and multimodal models. | string | N/A |
 | endpoint | One | VoyageAI API endpoint URL. Can be overridden for custom proxies or regional endpoints. | string | https://api.voyageai.com/v1/embeddings |
@@ -556,41 +532,6 @@
-| timeout | One | Request timeout in milliseconds. Also serves as the bound for retry attempts - retries stop when total elapsed time would exceed this timeout. Minimum value: 1000ms. | numeric | 30000 |
-| max-retries | One | Maximum number of retry attempts for failed requests. Used as a safety limit in addition to the timeout-based retry bound. | numeric | 10 |
-| default-input-type | One | Default input type when auto-detection is disabled. Valid values: query or document. VoyageAI models use different optimizations for queries vs documents. | enum | document |
-| auto-detect-input-type | One | Whether to automatically detect input type based on context. When enabled, uses query type for query-time embeddings and document type for indexing. | boolean | true |
-| normalize | One | Whether to apply L2 normalization to embeddings. When enabled, all embedding vectors are normalized to unit length. Use with prenormalized-angular distance-metric for efficient similarity computation. | boolean | false |
 | truncate | One | Whether to truncate input text exceeding model limits. When enabled, text is automatically truncated. When disabled, requests with too-long text will fail. | boolean | true |
@@ -598,79 +539,9 @@
-| max-idle-connections | One | Maximum number of idle HTTP connections to keep in the connection pool. Higher values improve throughput for concurrent requests but use more resources. | numeric | 5 |
Basic configuration (recommended):
-{% highlight xml %}
-<component id="voyage" type="voyage-ai-embedder">
-  <model>voyage-3.5</model>
-  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
-</component>
-{% endhighlight %}
-
-High-performance configuration:
-{% highlight xml %}
-<component id="voyage" type="voyage-ai-embedder">
-  <model>voyage-3.5</model>
-  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
-  <max-idle-connections>20</max-idle-connections>
-  <timeout>60000</timeout>
-</component>
-{% endhighlight %}
-
-Fast and cost-efficient configuration:
-{% highlight xml %}
-<component id="voyage" type="voyage-ai-embedder">
-  <model>voyage-3.5-lite</model>
-  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
-</component>
-{% endhighlight %}
-
-Query-optimized configuration:
-{% highlight xml %}
-<component id="voyage" type="voyage-ai-embedder">
-  <model>voyage-3.5</model>
-  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
-  <default-input-type>query</default-input-type>
-  <auto-detect-input-type>false</auto-detect-input-type>
-  <normalize>true</normalize>
-</component>
-{% endhighlight %}
-
-Code search configuration:
-{% highlight xml %}
-<component id="voyage" type="voyage-ai-embedder">
-  <model>voyage-code-3</model>
-  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
-  <normalize>true</normalize>
-</component>
-{% endhighlight %}
-
-The VoyageAI embedder includes several features to reduce API costs and improve performance:
-- Connection pooling via max-idle-connections (default: 5).
-- Automatic retries via max-retries (default: 10).
-- Model choice: consider voyage-3.5-lite for cost-sensitive applications (512 dimensions vs 1024 dimensions reduces costs while maintaining excellent quality). For best quality, use voyage-3.5.
-For detailed performance monitoring, the embedder emits standard Vespa embedder metrics
-(see Container Metrics).
-Monitor API usage and costs through the VoyageAI dashboard.
- - -
The Huggingface tokenizer embedder is configured in services.xml,
From 32cc54aff55e25628a1839f1f09199f043e9b24b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B8rn=20Christian=20Seime?= For production deployments, we recommend configuring separate embedder components for feed and search operations.
+This architectural pattern provides two key benefits - cost optimization and rate limit isolation.
+In Vespa Cloud, it's best practice to configure these embedders in separate container clusters for feed and search.
+
+The Voyage 4 model family features a shared embedding space
+across different model sizes. This enables a cost-effective strategy where you can use a more powerful (and expensive) model
+for document embeddings, while using a smaller, cheaper model for query embeddings.
+Since document embedding happens once during indexing but query embedding occurs on every search request,
+this approach can significantly reduce operational costs while maintaining quality. Separating feed and search operations is particularly important for managing VoyageAI API rate limits.
+Bursty document feeding operations can consume significant API quota, potentially causing rate limit errors
+that affect search queries. By using separate API keys for feed and search embedders,
+you ensure that feeding bursts don't negatively impact search.
 Usage example
'input.query(q)=embed(voyage, "machine learning tutorials")'
{% endhighlight %}
+Best practices
+{% highlight xml %}
+<!-- Illustrative sketch: separate embedder components for feed and search.
+     Component type, model names, and secret names here are examples; the
+     original example content was lost in formatting. -->
+<component id="voyage-feed" type="voyage-ai-embedder">
+  <model>voyage-4</model>
+  <api-key-secret-ref>voyage_feed_api_key</api-key-secret-ref>
+</component>
+<component id="voyage-search" type="voyage-ai-embedder">
+  <model>voyage-4-lite</model>
+  <api-key-secret-ref>voyage_search_api_key</api-key-secret-ref>
+</component>
+{% endhighlight %}
+Cost optimization with model variants
+Rate limit isolation
+Embedder performance
From 2ca355dfcc097117d95dc3d24aba0bb4191fcf8d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B8rn=20Christian=20Seime?= Input type detection
Embedder.Context
when calling the embedder from Java code.
Complete example showing document indexing and query-time embedding:
- -Schema definition:
-
-schema doc {
- document doc {
- field text type string {
- indexing: summary | index
- }
- }
-
- field embedding type tensor<float>(x[1024]) {
- indexing: input text | embed voyage | attribute | index
- attribute {
- distance-metric: prenormalized-angular
- }
- }
-
- rank-profile semantic {
- inputs {
- query(q) tensor<float>(x[1024])
- }
- first-phase {
- expression: closeness(field, embedding)
- }
- }
-}
-
-
-Query with embedding:
-{% highlight bash %}
-vespa query \
- 'yql=select * from doc where {targetHits:10}nearestNeighbor(embedding,q)' \
- 'input.query(q)=embed(voyage, "machine learning tutorials")'
-{% endhighlight %}
-
For production deployments, we recommend configuring separate embedder components for feed and search operations.
This architectural pattern provides two key benefits - cost optimization and rate limit isolation.
From 0328f897d858d5cdc759113b2104aaddf8434ede Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B8rn=20Christian=20Seime?= To use contextualized chunk embeddings,
+configure the VoyageAI embedder with a
+ When embedding array fields with a contextualized chunk embedding model, Vespa sends all chunks from a document in a single API request,
+ allowing Voyage to encode each chunk with context from the other chunks.
+ Be aware that the combined size of all chunks in a document must fit within the VoyageAI API's input token limit.
+ See Working with chunks for chunking strategies.
+ VoyageAI models distinguish between query and document embeddings for improved retrieval quality.
The embedder automatically detects the context and sets the appropriate input type based on whether
From 80293f6db216ab4fccdc0c86f4974c69cdb4caea Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B8rn=20Christian=20Seime?= To use contextualized chunk embeddings,
configure the VoyageAI embedder with a VoyageAI Embedder
Best practices
Best practices
VoyageAI Embedder
VoyageAI embedder reference config
-| api-key-secret-ref | One | Required. Reference to the secret in Vespa's secret store containing the VoyageAI API key. | string | N/A |
+| model | One | Required. The VoyageAI model to use. See the VoyageAI embeddings documentation for the complete list of available models including general-purpose, specialized, contextualized, and multimodal models. | string | N/A |
-| model | One | Required. The VoyageAI model to use. See the VoyageAI embeddings documentation for the complete list of available models including general-purpose, specialized, contextualized, and multimodal models. | string | N/A |
+| dimensions | One | Required. The number of dimensions for the output embedding vectors. Must match the tensor field definition in your schema. Valid values are 256, 512, 1024, 1536, or 2048. See the VoyageAI embeddings documentation for model-specific dimension support. | integer | N/A |
@@ -539,6 +547,13 @@
+| api-key-secret-ref | One | Required. Reference to the secret in Vespa's secret store containing the VoyageAI API key. | string | N/A |
VoyageAI embedder reference config
From 9a7c8ceec5e6513335c803744c111432f7d496a8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B8rn=20Christian=20Seime?= boolean
true
+| quantization | Optional | Output quantization format for embedding vectors. Valid values are auto, float, int8, or binary. When set to auto, the embedder infers the appropriate quantization from the dimensions and cell type of the destination tensor in your schema. See the VoyageAI quantization documentation for details on quantization options and their trade-offs. | string | auto |
+ VoyageAI embedder models
Contextualized chunk embeddings
+voyage-context-* model and use it to embed an
+array<string> field containing your document chunks:
+schema doc {
+ document doc {
+ field chunks type array<string> {
+ indexing: index | summary
+ }
+ }
+ field embeddings type tensor<float>(chunk{}, x[1024]) {
+ indexing: input chunks | embed voyage | attribute | index
+ attribute {
+ distance-metric: prenormalized-angular
+ }
+ }
+}
+
+
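+A query against such chunk-level embeddings can follow the same pattern as the usage example
+earlier in this patch (a sketch: it assumes a rank-profile declaring query(q) as
+tensor<float>(x[1024]) and an embedder component with id voyage, neither shown in the schema above):
+{% highlight bash %}
+vespa query \
+  'yql=select * from doc where {targetHits:10}nearestNeighbor(embeddings,q)' \
+  'input.query(q)=embed(voyage, "machine learning tutorials")'
+{% endhighlight %}
+With a mapped chunk dimension in the tensor, nearestNeighbor matches documents by their closest chunk.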
+Input type detection
VoyageAI embedder reference config
-| endpoint | One | VoyageAI API endpoint URL. Can be overridden for custom proxies or regional endpoints. | string | https://api.voyageai.com/v1/embeddings |
+| endpoint | Optional | VoyageAI API endpoint URL. | string | https://api.voyageai.com/v1/embeddings |
-| truncate | One | Whether to truncate input text exceeding model limits. When enabled, text is automatically truncated. When disabled, requests with too-long text will fail. | boolean | true |
+| truncate | Optional | Whether to truncate input text exceeding model limits. When enabled, text is automatically truncated. When disabled, requests with too-long text will fail. | boolean | true |
@@ -550,7 +550,13 @@ VoyageAI embedder reference config
From 324162eb837a075d31bbac310f12dcdf755448f2 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B8rn=20Christian=20Seime?= quantization
Optional
- Output quantization format for embedding vectors. Valid values are
+ auto, float, int8, or binary. When set to auto, the embedder infers the appropriate quantization from the dimensions and cell type of the destination tensor in your schema. See the VoyageAI quantization documentation for details on quantization options and their trade-offs.Output quantization format for embedding vectors. Valid values are
auto, float,
+ int8, or binary. When set to auto,
+ the embedder infers the appropriate quantization from the dimensions and cell type of the destination tensor in your schema.
+ When using binary quantization, the destination tensor field must use int8 cell type
+ with 1/8 of the dimensions specified in the embedder configuration (e.g., 1024 dimensions → tensor<int8>(x[128])).
+ See the VoyageAI quantization documentation
+ for details on quantization options and binarizing vectors for more on binary quantization in Vespa.string
auto
VoyageAI embedder models
Contextualized chunk embeddings
+{% include note.html content='Available since 8.637.' %}
voyage-context-* model and use it to embed an
array<string> field containing your document chunks: