4 changes: 2 additions & 2 deletions README.md
@@ -48,12 +48,12 @@ Mac

$ docker run -ti --rm --name doc \
--publish 4000:4000 -e JEKYLL_UID=$UID -v $(pwd):/srv/jekyll \
- jekyll/jekyll jekyll serve
+ jekyll/jekyll jekyll serve --incremental --force_polling

or RHEL 8

$ podman run -it --rm --name doc -p 4000:4000 -e JEKYLL_ROOTLESS=true \
-v "$PWD":/srv/jekyll:Z docker.io/jekyll/jekyll jekyll serve
-v "$PWD":/srv/jekyll:Z docker.io/jekyll/jekyll jekyll serve --incremental --force_polling

The Jekyll server should normally rebuild HTML files automatically
when a source file changes. If this does not happen, you can use
118 changes: 118 additions & 0 deletions en/rag/embedding.html
@@ -496,6 +496,124 @@ <h4 id="splade-ranking">SPLADE ranking</h4>
</p>


<h3 id="voyageai-embedder">VoyageAI Embedder</h3>

<p>An embedder that uses the <a href="https://www.voyageai.com/">VoyageAI</a> embedding API
to generate high-quality embeddings for semantic search. This embedder calls the VoyageAI API service
and does not require local model files or ONNX inference. All embeddings returned by VoyageAI are normalized
to unit length, making them suitable for cosine similarity and
<a href="../reference/schemas/schemas.html#prenormalized-angular">prenormalized-angular</a> distance metrics
(see <a href="https://docs.voyageai.com/docs/faq#which-similarity-function-should-i-use">VoyageAI FAQ</a>).</p>

<pre>{% highlight xml %}
<container version="1.0">
<component id="voyage" type="voyage-ai-embedder">
<model>voyage-4</model>
<api-key-secret-ref>voyage_api_key</api-key-secret-ref>
<dimensions>1024</dimensions>
</component>
</container>
{% endhighlight %}</pre>

<ul>
<li>
The <code>model</code> element specifies which VoyageAI model to use.
</li>
<li>
The <code>api-key-secret-ref</code> references a secret in Vespa's
<a href="/en/cloud/security/secret-store.html">secret store</a> containing your VoyageAI API key.
This is required for authentication.
</li>
</ul>

<p>See the <a href="../reference/rag/embedding.html#voyageai-embedder-reference-config">reference</a>
for all configuration parameters.</p>

<h4 id="voyageai-embedder-models">VoyageAI embedder models</h4>
<p>For the complete list of available models and their specifications, see:</p>
<ul>
<li><a href="https://docs.voyageai.com/docs/embeddings">VoyageAI Embeddings Documentation</a> - General-purpose and specialized models</li>
<li><a href="https://docs.voyageai.com/docs/contextualized-chunk-embeddings">Contextualized Chunk Embeddings</a> - Models for embedding document chunks with surrounding context. See <a href="../rag/working-with-chunks.html">Working with chunks</a>.</li>
<li><a href="https://docs.voyageai.com/docs/multimodal-embeddings">Multimodal Embeddings</a> - Multimodal models for text, images, and video</li>
</ul>

<h4 id="voyageai-contextualized-embeddings">Contextualized chunk embeddings</h4>
{% include note.html content='Available since 8.637.' %}
<p>To use <a href="https://docs.voyageai.com/docs/contextualized-chunk-embeddings">contextualized chunk embeddings</a>,
configure the VoyageAI embedder with a <code>voyage-context-*</code> model and use it to embed an
<code>array&lt;string&gt;</code> field containing your document chunks:</p>

<pre>
schema doc {
    document doc {
        field chunks type array&lt;string&gt; {
            indexing: index | summary
        }
    }
    field embeddings type tensor&lt;float&gt;(chunk{}, x[1024]) {
        indexing: input chunks | embed voyage | attribute | index
        attribute {
            distance-metric: prenormalized-angular
        }
    }
}
</pre>

<p>
When embedding array fields with a contextualized chunk embedding model, Vespa sends all chunks from a document in a single API request,
allowing VoyageAI to encode each chunk with context from the other chunks.
Be aware that the combined size of all chunks in a document must fit within the VoyageAI API's input token limit.
See <a href="working-with-chunks.html">Working with chunks</a> for chunking strategies.
</p>
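<p>
As a minimal sketch of retrieving over these embeddings, a rank profile added to the schema above
could define a query tensor matching the embedder dimensions (the profile name <code>chunk_search</code>
and input names <code>q</code> and <code>text</code> are illustrative, not part of the example above):
</p>

<pre>
rank-profile chunk_search {
    inputs {
        query(q) tensor&lt;float&gt;(x[1024])
    }
    first-phase {
        expression: closeness(field, embeddings)
    }
}
</pre>

<p>A query can then embed the query text with the same embedder component:</p>

<pre>
yql=select * from doc where {targetHits: 10}nearestNeighbor(embeddings, q)
&amp;input.query(q)=embed(voyage, @text)
&amp;text=what is vespa?
&amp;ranking=chunk_search
</pre>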

<h4 id="voyageai-input-types">Input type detection</h4>
<p>VoyageAI models distinguish between query and document embeddings for improved retrieval quality.
The embedder automatically detects the context and sets the appropriate input type based on whether
the embedding is performed during feed (indexing) or query processing in Vespa.</p>

<p>For advanced use cases where you need to control the input type programmatically,
you can use the <code>destination</code> property of the
<a href="https://javadoc.io/static/com.yahoo.vespa/linguistics/8.620.35/com/yahoo/language/process/Embedder.Context.html">Embedder.Context</a>
when calling the embedder from Java code.</p>
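<p>
A minimal sketch of this, assuming a single embedder component is available for constructor
injection (the class name, destination string, and tensor type are illustrative, and we assume
a <code>query(...)</code> destination is detected as query-side):
</p>

<pre>{% highlight java %}
import com.yahoo.language.process.Embedder;
import com.yahoo.tensor.Tensor;
import com.yahoo.tensor.TensorType;

public class QuerySideEmbedding {

    private final Embedder embedder; // injected by the Vespa container

    public QuerySideEmbedding(Embedder embedder) {
        this.embedder = embedder;
    }

    public Tensor embedAsQuery(String text) {
        // The destination names the recipient of the embedding; here we assume a
        // query(...) destination marks this as query-side rather than indexing-side.
        Embedder.Context context = new Embedder.Context("query(q)");
        return embedder.embed(text, context, TensorType.fromSpec("tensor<float>(x[1024])"));
    }
}
{% endhighlight %}</pre>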

<h4 id="voyageai-best-practices">Best practices</h4>
<p>For production deployments, we recommend configuring <strong>separate embedder components for feed and search operations</strong>.
This architectural pattern provides two key benefits - cost optimization and rate limit isolation.
In Vespa Cloud, it's best practice to configure these embedders in separate container clusters for feed and search.</p>

<pre>{% highlight xml %}
<container id="feed" version="1.0">
<component id="voyage" type="voyage-ai-embedder">
<model>voyage-4-large</model>
<dimensions>1024</dimensions>
<api-key-secret-ref>voyage_feed_api_key</api-key-secret-ref>
</component>
<document-api/>
</container>

<container id="search" version="1.0">
<component id="voyage" type="voyage-ai-embedder">
<model>voyage-4-lite</model>
<dimensions>1024</dimensions>
<api-key-secret-ref>voyage_search_api_key</api-key-secret-ref>
</component>
<search/>
</container>
{% endhighlight %}</pre>

<h5 id="voyageai-cost-optimization">Cost optimization with model variants</h5>
<p>The <a href="https://blog.voyageai.com/2026/01/15/voyage-4/">Voyage 4 model family</a> features a shared embedding space
across different model sizes. This enables a cost-effective strategy where you can use a more powerful (and expensive) model
for document embeddings, while using a smaller, cheaper model for query embeddings.
Since document embedding happens once during indexing but query embedding occurs on every search request,
this approach can significantly reduce operational costs while maintaining quality.</p>

<h5 id="voyageai-rate-limit-isolation">Rate limit isolation</h5>
<p>Separating feed and search operations is particularly important for managing VoyageAI API rate limits.
Bursty document feeding operations can consume significant API quota, potentially causing rate limit errors
that affect search queries. By using <strong>separate API keys</strong> for feed and search embedders,
you ensure that feeding bursts don't negatively impact search.</p>

<h2 id="embedder-performance">Embedder performance</h2>

<p>Embedding inference can be resource-intensive for larger embedding models. Factors that impact performance:</p>
14 changes: 9 additions & 5 deletions en/rag/external-llms.md
@@ -59,11 +59,15 @@ This sets up a client component that can be used in a

Vespa provides several options to configure the API key used by the client.

- 1. Using the [Vespa Cloud secret store](../security/secret-store) to store the API key.
- 2. This is done by setting the `apiKeySecretRef` configuration parameter to the name of the secret
- 3. in the secret store. This is the recommended way for Vespa Cloud users.
- 2. Providing the API key in the `X-LLM-API-KEY` HTTP header of the Vespa query.
- 3. It is also possible to configure the API key in a custom component. For example, [this](https://github.com/vespa-engine/system-test/tree/master/tests/docproc/generate_field_openai) system-test shows how to retrieve the API key from a local file deployed with your Vespa application. Please note that this is NOT recommended for production use, as it is less secure than using the secret store, but it can be modified to suit your needs.
+ 1. Using the [Vespa Cloud secret store](../security/secret-store) to store the API key.
+    This is done by setting the `apiKeySecretRef` configuration parameter to the name of the secret
+    in the secret store. This is the recommended way for Vespa Cloud users.
+ 2. For self-managed Vespa, you can provide secrets via environment variables.
+    Set the `apiKeySecretRef` configuration parameter and expose the secret as an environment variable
+    named `VESPA_SECRET_<SECRET_REF>`, where `<SECRET_REF>` is the secret reference name converted to
+    upper snake case. For example, if `apiKeySecretRef` is set to `myApiKey`, the environment variable
+    should be named `VESPA_SECRET_MY_API_KEY` (see the sketch after this list).
+ 3. Providing the API key in the `X-LLM-API-KEY` HTTP header of the Vespa query.
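
As a minimal sketch of option 2, assuming `apiKeySecretRef` is set to `myApiKey` and a
self-managed node running the `vespaengine/vespa` container image (the container name and
key value are illustrative):

```sh
# myApiKey -> VESPA_SECRET_MY_API_KEY (upper snake case, prefixed with VESPA_SECRET_)
docker run --detach --name vespa --hostname vespa-container \
  --publish 8080:8080 --publish 19071:19071 \
  --env VESPA_SECRET_MY_API_KEY="<your-api-key>" \
  vespaengine/vespa
```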

You can set up multiple connections with different settings. For instance, you
might want to run different LLMs for different tasks. To distinguish between the
85 changes: 85 additions & 0 deletions en/reference/rag/embedding.html
@@ -478,6 +478,91 @@ <h3 id="splade-embedder-reference-config">splade embedder reference config</h3>



<h2 id="voyageai-embedder">VoyageAI Embedder</h2>
<p>
An embedder that uses the <a href="https://www.voyageai.com/">VoyageAI</a> API
to generate embeddings.
</p>
<p>
The VoyageAI embedder is configured in <a href="services.html">services.xml</a>,
within the <code>container</code> tag:
</p>
<pre>{% highlight xml %}
<container id="default" version="1.0">
<component id="voyage" type="voyage-ai-embedder">
<model>voyage-law-2</model>
<api-key-secret-ref>voyage_api_key</api-key-secret-ref>
<dimensions>1024</dimensions>
<endpoint>https://api.voyageai.com/v1/embeddings</endpoint>
<truncate>true</truncate>
</component>
</container>
{% endhighlight %}</pre>

<h3 id="voyageai-embedder-reference-config">VoyageAI embedder reference config</h3>
<table class="table">
<thead>
<tr>
<th>Name</th>
<th>Occurrence</th>
<th>Description</th>
<th>Type</th>
<th>Default</th>
</tr>
</thead>
<tbody>
<tr>
<td>model</td>
<td>One</td>
<td><strong>Required</strong>. The VoyageAI model to use. See the <a href="https://docs.voyageai.com/docs/embeddings">VoyageAI embeddings documentation</a> for the complete list of available models including general-purpose, specialized, <a href="https://docs.voyageai.com/docs/contextualized-chunk-embeddings">contextualized</a>, and <a href="https://docs.voyageai.com/docs/multimodal-embeddings">multimodal</a> models.
</td>
<td>string</td>
<td>N/A</td>
</tr>
<tr>
<td>dimensions</td>
<td>One</td>
<td><strong>Required</strong>. The number of dimensions for the output embedding vectors. Must match the tensor field definition in your schema. Valid values are <code>256</code>, <code>512</code>, <code>1024</code>, <code>1536</code>, or <code>2048</code>. See the <a href="https://docs.voyageai.com/docs/embeddings">VoyageAI embeddings documentation</a> for model-specific dimension support.</td>
<td>integer</td>
<td>N/A</td>
</tr>
<tr>
<td>api-key-secret-ref</td>
<td>One</td>
<td><strong>Required</strong>. Reference to the secret in Vespa's <a href="/en/cloud/security/secret-store.html">secret store</a> containing the VoyageAI API key.</td>
<td>string</td>
<td>N/A</td>
</tr>
<tr>
<td>endpoint</td>
<td>Zero or one</td>
<td>VoyageAI API endpoint URL.</td>
<td>string</td>
<td>https://api.voyageai.com/v1/embeddings</td>
</tr>
<tr>
<td>truncate</td>
<td>Zero or one</td>
<td>Whether to truncate input text exceeding the model's context length. When enabled, over-long input is truncated automatically. When disabled, requests with input exceeding the limit fail.</td>
<td>boolean</td>
<td>true</td>
</tr>
<tr>
<td>quantization</td>
<td>Zero or one</td>
<td>Output quantization format for embedding vectors. Valid values are <code>auto</code>, <code>float</code>,
<code>int8</code>, or <code>binary</code>. When set to <code>auto</code>,
the embedder infers the appropriate quantization from the dimensions and cell type of the destination tensor in your schema.
When using <code>binary</code> quantization, the destination tensor field must use <code>int8</code> cell type
with 1/8 of the dimensions specified in the embedder configuration (e.g., 1024 dimensions → <code>tensor&lt;int8&gt;(x[128])</code>).
See the <a href="https://docs.voyageai.com/docs/flexible-dimensions-and-quantization#quantization">VoyageAI quantization documentation</a>
for details on quantization options and <a href="../../rag/binarizing-vectors.html">binarizing vectors</a> for more on binary quantization in Vespa.</td>
<td>string</td>
<td>auto</td>
</tr>
</tbody>
</table>
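
<p>
As an illustration of the <code>binary</code> constraint above: with
<code>&lt;dimensions&gt;1024&lt;/dimensions&gt;</code> and <code>&lt;quantization&gt;binary&lt;/quantization&gt;</code>
in the embedder configuration, the destination tensor needs 1024/8 = 128 <code>int8</code> cells.
A sketch of a matching schema field (the field and input names are illustrative;
<code>hamming</code> is the usual distance metric for binarized vectors, see
<a href="../../rag/binarizing-vectors.html">binarizing vectors</a>):
</p>

<pre>
field embedding type tensor&lt;int8&gt;(x[128]) {
    indexing: input text | embed voyage | attribute
    attribute {
        distance-metric: hamming
    }
}
</pre>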

<h2 id="huggingface-tokenizer-embedder">Huggingface tokenizer embedder</h2>
<p>
The Huggingface tokenizer embedder is configured in <a href="../applications/services/services.html">services.xml</a>,