diff --git a/README.md b/README.md
index 0776a1a5fb..1580de5d10 100644
--- a/README.md
+++ b/README.md
@@ -48,12 +48,12 @@ Mac

 $ docker run -ti --rm --name doc \
   --publish 4000:4000 -e JEKYLL_UID=$UID -v $(pwd):/srv/jekyll \
-  jekyll/jekyll jekyll serve
+  jekyll/jekyll jekyll serve --incremental --force_polling

 or RHEL 8

 $ podman run -it --rm --name doc -p 4000:4000 -e JEKYLL_ROOTLESS=true \
-  -v "$PWD":/srv/jekyll:Z docker.io/jekyll/jekyll jekyll serve
+  -v "$PWD":/srv/jekyll:Z docker.io/jekyll/jekyll jekyll serve --incremental --force_polling

 The Jekyll server should normally rebuild HTML files automatically when a
 source file changes. If this does not happen, you can use

diff --git a/en/rag/embedding.html b/en/rag/embedding.html
index ab52624497..f3a4cef3d3 100644
--- a/en/rag/embedding.html
+++ b/en/rag/embedding.html
@@ -496,6 +496,124 @@

SPLADE ranking

+

VoyageAI Embedder

+ +

+An embedder that uses the VoyageAI embedding API
+to generate high-quality embeddings for semantic search. This embedder calls the VoyageAI API service
+and does not require local model files or ONNX inference. All embeddings returned by VoyageAI are normalized
+to unit length, making them suitable for cosine similarity and
+prenormalized-angular distance metrics
+(see VoyageAI FAQ).

+ +
+{% highlight xml %}
+<container id="default" version="1.0">
+    <component id="voyage" type="voyage-ai-embedder">
+        <model>voyage-4</model>
+        <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+        <dimensions>1024</dimensions>
+    </component>
+</container>
+{% endhighlight %}
+ + + +

+See the reference
+for all configuration parameters.

+ +

VoyageAI embedder models

+

For the complete list of available models and their specifications, see:

+ + +

Contextualized chunk embeddings

+{% include note.html content='Available since 8.637.' %}
+

+To use contextualized chunk embeddings,
+configure the VoyageAI embedder with a voyage-context-* model and use it to embed an
+array<string> field containing your document chunks:

+ +
+schema doc {
+    document doc {
+        field chunks type array<string> {
+            indexing: index | summary
+        }
+    }
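+    # chunk{} is a mapped dimension: embed produces one 1024-dimensional vector per element of the chunks array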
+    field embeddings type tensor<float>(chunk{}, x[1024]) {
+        indexing: input chunks | embed voyage | attribute | index
+        attribute {
+            distance-metric: prenormalized-angular
+        }
+    }
+}
+
+ +

+When embedding array fields with a contextualized chunk embedding model, Vespa sends all chunks from a document in a single API request,
+allowing Voyage to encode each chunk with context from the other chunks.
+Be aware that the combined size of all chunks in a document must fit within the VoyageAI API's input token limit.
+See Working with chunks for chunking strategies.
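+
+As an illustration, a minimal feed document for the schema above (the document id and chunk texts are
+hypothetical). All three chunks are submitted to the embedder together and end up in one VoyageAI request:
+
+{% highlight json %}
+{
+    "put": "id:mynamespace:doc::1",
+    "fields": {
+        "chunks": [
+            "Vespa is a platform for serving big data.",
+            "It supports vector search, lexical search, and structured filtering.",
+            "Queries are evaluated with low latency over distributed indexes."
+        ]
+    }
+}
+{% endhighlight %}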

+ +

Input type detection

+

+VoyageAI models distinguish between query and document embeddings for improved retrieval quality.
+The embedder automatically detects the context and sets the appropriate input type based on whether
+the embedding is performed during feed (indexing) or query processing in Vespa.

+ +

+For advanced use cases where you need to control the input type programmatically,
+you can use the destination property of the
+Embedder.Context
+when calling the embedder from Java code.
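+
+A minimal sketch of such a call (the class name, destination string, and tensor type here are
+illustrative assumptions, not part of the embedder reference):
+
+{% highlight java %}
+import com.yahoo.language.process.Embedder;
+import com.yahoo.tensor.Tensor;
+import com.yahoo.tensor.TensorType;
+
+public class EmbedHelper {
+
+    // Embed text as a query embedding regardless of where this code runs:
+    // a "query(...)" destination signals query-side input type to the embedder.
+    public static Tensor embedAsQuery(Embedder embedder, String text) {
+        Embedder.Context context = new Embedder.Context("query(q_embedding)");
+        return embedder.embed(text, context, TensorType.fromSpec("tensor<float>(x[1024])"));
+    }
+}
+{% endhighlight %}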

+ +

Best practices

+

+For production deployments, we recommend configuring separate embedder components for feed and search operations.
+This architectural pattern provides two key benefits: cost optimization and rate limit isolation.
+In Vespa Cloud, it's best practice to configure these embedders in separate container clusters for feed and search.

+ +
+{% highlight xml %}
+<container id="feed" version="1.0">
+    <component id="voyage" type="voyage-ai-embedder">
+        <model>voyage-4-large</model>
+        <dimensions>1024</dimensions>
+        <api-key-secret-ref>voyage_feed_api_key</api-key-secret-ref>
+    </component>
+    <document-api/>
+</container>
+
+<container id="search" version="1.0">
+    <component id="voyage" type="voyage-ai-embedder">
+        <model>voyage-4-lite</model>
+        <dimensions>1024</dimensions>
+        <api-key-secret-ref>voyage_search_api_key</api-key-secret-ref>
+    </component>
+    <search/>
+</container>
+{% endhighlight %}
+ +
Cost optimization with model variants
+

+The Voyage 4 model family features a shared embedding space
+across different model sizes. This enables a cost-effective strategy where you can use a more powerful (and expensive) model
+for document embeddings, while using a smaller, cheaper model for query embeddings.
+Since document embedding happens once during indexing but query embedding occurs on every search request,
+this approach can significantly reduce operational costs while maintaining quality.
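+
+For example, a search request can embed the query text with the search cluster's cheaper model
+(the rank profile name and its query(q) input are assumptions for illustration):
+
+{% highlight json %}
+{
+    "yql": "select * from doc where {targetHits: 10}nearestNeighbor(embeddings, q)",
+    "input.query(q)": "embed(voyage, @text)",
+    "text": "how do I configure separate feed and search embedders?",
+    "ranking": "chunk_similarity"
+}
+{% endhighlight %}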

+ +
Rate limit isolation
+

+Separating feed and search operations is particularly important for managing VoyageAI API rate limits.
+Bursty document feeding operations can consume significant API quota, potentially causing rate limit errors
+that affect search queries. By using separate API keys for feed and search embedders,
+you ensure that feeding bursts don't negatively impact search.

+

Embedder performance

Embedding inference can be resource-intensive for larger embedding models. Factors that impact performance:

diff --git a/en/rag/external-llms.md b/en/rag/external-llms.md
index 29099a57cb..9a52de67c6 100644
--- a/en/rag/external-llms.md
+++ b/en/rag/external-llms.md
@@ -59,11 +59,15 @@ This sets up a client component that can be used in a

 Vespa provides several options to configure the API key used by the client.

-1. Using the [Vespa Cloud secret store](../security/secret-store) to store the API key.
-2. This is done by setting the `apiKeySecretRef` configuration parameter to the name of the secret
-3. in the secret store. This is the recommended way for Vespa Cloud users.
-2. Providing the API key in the `X-LLM-API-KEY` HTTP header of the Vespa query.
-3. It is also possible to configure the API key in a custom component. For example, [this](https://github.com/vespa-engine/system-test/tree/master/tests/docproc/generate_field_openai) system-test shows how to retrieve the API key from a local file deployed with your Vespa application. Please note that this is NOT recommended for production use, as it is less secure than using the secret store, but it can be modified to suit your needs.
+1. Using the [Vespa Cloud secret store](../security/secret-store) to store the API key.
+   This is done by setting the `apiKeySecretRef` configuration parameter to the name of the secret
+   in the secret store. This is the recommended way for Vespa Cloud users.
+2. For self-managed Vespa, you can provide secrets via environment variables.
+   Set the `apiKeySecretRef` configuration parameter and expose the secret as an environment variable
+   named `VESPA_SECRET_<NAME>`, where `<NAME>` is the secret reference name converted to
+   upper snake case. For example, if `apiKeySecretRef` is set to `myApiKey`, the environment variable
+   should be named `VESPA_SECRET_MY_API_KEY`.
+3. Providing the API key in the `X-LLM-API-KEY` HTTP header of the Vespa query.

 You can set up multiple connections with different settings. For instance, you
 might want to run different LLMs for different tasks. To distinguish between the

diff --git a/en/reference/rag/embedding.html b/en/reference/rag/embedding.html
index 0bf9d40a98..7d46ab4664 100644
--- a/en/reference/rag/embedding.html
+++ b/en/reference/rag/embedding.html
@@ -478,6 +478,91 @@

splade embedder reference config

+

VoyageAI Embedder

+

+ An embedder that uses the VoyageAI API
+ to generate embeddings.

+

+ The VoyageAI embedder is configured in services.xml,
+ within the container tag:

+
+{% highlight xml %}
+<container id="default" version="1.0">
+    <component id="voyage" type="voyage-ai-embedder">
+        <model>voyage-law-2</model>
+        <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+        <dimensions>1024</dimensions>
+        <endpoint>https://api.voyageai.com/v1/embeddings</endpoint>
+        <truncate>true</truncate>
+    </component>
+</container>
+{% endhighlight %}
+ +

VoyageAI embedder reference config

+<table class="table">
+  <thead>
+  <tr><th>Name</th><th>Occurrence</th><th>Description</th><th>Type</th><th>Default</th></tr>
+  </thead>
+  <tbody>
+  <tr>
+    <td>model</td><td>One</td>
+    <td>Required. The VoyageAI model to use. See the VoyageAI embeddings documentation for the complete
+        list of available models including general-purpose, specialized, contextualized, and multimodal models.</td>
+    <td>string</td><td>N/A</td>
+  </tr>
+  <tr>
+    <td>dimensions</td><td>One</td>
+    <td>Required. The number of dimensions for the output embedding vectors. Must match the tensor field
+        definition in your schema. Valid values are 256, 512, 1024, 1536, or 2048. See the VoyageAI
+        embeddings documentation for model-specific dimension support.</td>
+    <td>integer</td><td>N/A</td>
+  </tr>
+  <tr>
+    <td>api-key-secret-ref</td><td>One</td>
+    <td>Required. Reference to the secret in Vespa's secret store containing the VoyageAI API key.</td>
+    <td>string</td><td>N/A</td>
+  </tr>
+  <tr>
+    <td>endpoint</td><td>Optional</td>
+    <td>VoyageAI API endpoint URL.</td>
+    <td>string</td><td>https://api.voyageai.com/v1/embeddings</td>
+  </tr>
+  <tr>
+    <td>truncate</td><td>Optional</td>
+    <td>Whether to truncate input text exceeding model limits. When enabled, text is automatically truncated.
+        When disabled, requests with too-long text will fail.</td>
+    <td>boolean</td><td>true</td>
+  </tr>
+  <tr>
+    <td>quantization</td><td>Optional</td>
+    <td>Output quantization format for embedding vectors. Valid values are auto, float, int8, or binary.
+        When set to auto, the embedder infers the appropriate quantization from the dimensions and cell type
+        of the destination tensor in your schema. When using binary quantization, the destination tensor field
+        must use int8 cell type with 1/8 of the dimensions specified in the embedder configuration
+        (e.g., 1024 dimensions → tensor&lt;int8&gt;(x[128])). See the VoyageAI quantization documentation for
+        details on quantization options, and binarizing vectors for more on binary quantization in Vespa.</td>
+    <td>string</td><td>auto</td>
+  </tr>
+  </tbody>
+</table>
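+
+For example, a sketch of a destination field for binary quantization, assuming an embedder configured
+with 1024 dimensions (the field and input names are illustrative):
+
+<pre>
+field embedding type tensor&lt;int8&gt;(x[128]) {
+    indexing: input text | embed voyage | attribute
+    attribute {
+        distance-metric: hamming
+    }
+}
+</pre>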

Huggingface tokenizer embedder

The Huggingface tokenizer embedder is configured in services.xml,