From 6bc545b78c4d625e741de29cd18eae90b3a63854 Mon Sep 17 00:00:00 2001
From: fzowl SPLADE ranking
An embedder that uses the VoyageAI embedding API
+to generate high-quality embeddings for semantic search. This embedder calls the VoyageAI API service
+and does not require local model files or ONNX inference.
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+  <model>voyage-3.5</model>
+  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+</component>
+{% endhighlight %}
+
+model specifies which VoyageAI model to use.
+ Available models include voyage-3.5 (1024 dimensions, latest and best),
+ voyage-3.5-lite (512 dimensions, fastest),
+ voyage-code-3 (optimized for code), and others.
+ See the VoyageAI documentation for the full list.
+ api-key-secret-ref references a secret in Vespa's
+ secret store containing your VoyageAI API key.
+ This is required for authentication.
+ Add your VoyageAI API key to the secret store:
+<pre>
+vespa secret add voyage_api_key --value "pa-xxxxx..."
+</pre>
See the reference +for all configuration parameters including caching, retry logic, and performance tuning.
+ +
+ VoyageAI offers several embedding models optimized for different use cases.
+ The resulting tensor type can be float or
+ bfloat16 for storage efficiency.
+
Latest general-purpose models (recommended):
+- voyage-3.5: tensor<float>(x[1024]) - latest and best quality, state-of-the-art for most applications
+- voyage-3.5-lite: tensor<float>(x[512]) - newest lite model, excellent quality at lower cost and faster speed
+Previous generation general-purpose models:
+- voyage-3: tensor<float>(x[1024]) - high quality (use voyage-3.5 for best results)
+- voyage-3-lite: tensor<float>(x[512]) - cost-efficient (use voyage-3.5-lite for better performance)
+Specialized models:
+- voyage-code-3: tensor<float>(x[1024]) - optimized for code search and technical content
+- voyage-finance-2: tensor<float>(x[1024]) - optimized for financial documents
+- voyage-law-2: tensor<float>(x[1024]) - optimized for legal documents
+- voyage-multilingual-2: tensor<float>(x[1024]) - supports 100+ languages
+Contextual model:
+- voyage-context-3: tensor<float>(x[1024]) (configurable: 256, 512, 1024, 2048) -
+  contextualized embeddings for document chunks with surrounding context awareness
+Multimodal model (preview):
+- voyage-multimodal-3: tensor<float>(x[1024]) (configurable: 256, 512, 1024, 2048) -
+  multimodal embeddings for text, images, and video in a shared vector space
+
+VoyageAI models distinguish between query and document embeddings for improved retrieval quality.
+The embedder automatically detects the context and sets the appropriate input type:
+- indexing (feed): document input type
+- query processing (via embed() in queries): query input type
+You can disable auto-detection and set a fixed input type:
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+  <model>voyage-3.5</model>
+  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+  <auto-detect-input-type>false</auto-detect-input-type>
+  <default-input-type>query</default-input-type>
+</component>
+{% endhighlight %}
+
+The VoyageAI embedder includes several performance optimizations:
+- HTTP connection pooling, tuned with max-idle-connections (default: 5).
+- Automatic retries of failed requests, bounded by max-retries (default: 10).
+- Optional L2 normalization of embeddings with normalize set to true.
+Example with performance tuning:
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+  <model>voyage-3.5</model>
+  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+  <max-idle-connections>20</max-idle-connections>
+  <normalize>true</normalize>
+</component>
+{% endhighlight %}
+
+Complete example showing document indexing and query-time embedding:
+Schema definition:
+
+schema doc {
+ document doc {
+ field text type string {
+ indexing: summary | index
+ }
+ }
+
+ field embedding type tensor<float>(x[1024]) {
+ indexing: input text | embed voyage | attribute | index
+ attribute {
+ distance-metric: angular
+ }
+ }
+
+ rank-profile semantic {
+ inputs {
+ query(q) tensor<float>(x[1024])
+ }
+ first-phase {
+ expression: closeness(field, embedding)
+ }
+ }
+}
+
+
+Query with embedding:
+{% highlight bash %}
+vespa query \
+ 'yql=select * from doc where {targetHits:10}nearestNeighbor(embedding,q)' \
+ 'input.query(q)=embed(voyage, "machine learning tutorials")'
+{% endhighlight %}
+
+When using normalize set to true, use
+distance-metric: prenormalized-angular
+for more efficient similarity computation.
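+As a sketch of that pairing (the element names follow the reference table in this patch; the
+component type attribute and id are assumptions carried over from the examples above, not confirmed by the original):
+{% highlight xml %}
+<!-- sketch: request L2-normalized vectors from the API ... -->
+<component id="voyage" type="voyage-ai-embedder">
+  <model>voyage-3.5</model>
+  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+  <normalize>true</normalize>
+</component>
+{% endhighlight %}
+<pre>
+# ... and match it with the cheaper distance metric in the schema:
+field embedding type tensor&lt;float&gt;(x[1024]) {
+    indexing: input text | embed voyage | attribute | index
+    attribute {
+        distance-metric: prenormalized-angular
+    }
+}
+</pre>
+With unit-length vectors, prenormalized-angular skips the per-comparison normalization that angular would otherwise perform.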
Embedding inference can be resource-intensive for larger embedding models. Factors that impact performance:
diff --git a/en/reference/rag/embedding.html b/en/reference/rag/embedding.html
index 0bf9d40a98..3dcfdba557 100644
--- a/en/reference/rag/embedding.html
+++ b/en/reference/rag/embedding.html
@@ -478,6 +478,199 @@
+An embedder that uses the VoyageAI API
+to generate embeddings. This is an API-based embedder that does not require local model files or ONNX inference.
+It calls the VoyageAI service to generate high-quality embeddings optimized for semantic search.
+
+ The VoyageAI embedder is configured in services.xml,
+ within the container tag:
+
{% highlight xml %}
+<container version="1.0">
+  <component id="voyage" type="voyage-ai-embedder">
+    <model>voyage-3.5</model>
+    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+  </component>
+</container>
+{% endhighlight %}
+
+The VoyageAI API key must be stored in Vespa's
+secret store for secure management:
+<pre>
+vespa secret add voyage_api_key --value "pa-xxxxx..."
+</pre>
+ The api-key-secret-ref parameter references the secret name.
+ Secrets are automatically refreshed when rotated without requiring application restart.
+
+| Name | Occurrence | Description | Type | Default |
+|---|---|---|---|---|
+| api-key-secret-ref | One | Required. Reference to the secret in Vespa's secret store containing the VoyageAI API key. | string | N/A |
+| model | One | The VoyageAI model to use. Available models include voyage-3.5, voyage-3.5-lite, voyage-code-3, and others; see the VoyageAI documentation for the full list. | string | voyage-3.5 |
+| endpoint | One | VoyageAI API endpoint URL. Can be overridden for custom proxies or regional endpoints. | string | https://api.voyageai.com/v1/embeddings |
+| timeout | One | Request timeout in milliseconds. Also serves as the bound for retry attempts - retries stop when total elapsed time would exceed this timeout. Minimum value: 1000ms. | numeric | 30000 |
+| max-retries | One | Maximum number of retry attempts for failed requests. Used as a safety limit in addition to the timeout-based retry bound. | numeric | 10 |
+| default-input-type | One | Default input type when auto-detection is disabled. Valid values: query or document. VoyageAI models use different optimizations for queries vs documents. | enum | document |
+| auto-detect-input-type | One | Whether to automatically detect input type based on context. When enabled, uses query type for query-time embeddings and document type for indexing. | boolean | true |
+| normalize | One | Whether to apply L2 normalization to embeddings. When enabled, all embedding vectors are normalized to unit length. Use with prenormalized-angular distance-metric for efficient similarity computation. | boolean | false |
+| truncate | One | Whether to truncate input text exceeding model limits. When enabled, text is automatically truncated. When disabled, requests with too-long text will fail. | boolean | true |
+| max-idle-connections | One | Maximum number of idle HTTP connections to keep in the connection pool. Higher values improve throughput for concurrent requests but use more resources. | numeric | 5 |
Basic configuration (recommended):
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+  <model>voyage-3.5</model>
+  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+</component>
+{% endhighlight %}
+
+High-performance configuration:
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+  <model>voyage-3.5</model>
+  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+  <max-idle-connections>20</max-idle-connections>
+  <timeout>60000</timeout>
+</component>
+{% endhighlight %}
+
+Fast and cost-efficient configuration:
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+  <model>voyage-3.5-lite</model>
+  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+</component>
+{% endhighlight %}
+
+Query-optimized configuration:
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+  <model>voyage-3.5</model>
+  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+  <default-input-type>query</default-input-type>
+  <auto-detect-input-type>false</auto-detect-input-type>
+  <normalize>true</normalize>
+</component>
+{% endhighlight %}
+
+Code search configuration:
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+  <model>voyage-code-3</model>
+  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+  <normalize>true</normalize> <!-- element name reconstructed; only the value survived formatting -->
+</component>
+{% endhighlight %}
+
+The VoyageAI embedder includes several features to reduce API costs and improve performance:
+- Connection pooling via max-idle-connections (default: 5).
+- Automatic retries via max-retries (default: 10).
+- Model choice: consider voyage-3.5-lite for cost-sensitive applications (512 dimensions vs 1024 dimensions reduces costs while maintaining excellent quality). For best quality, use voyage-3.5.
+For detailed performance monitoring, the embedder emits standard Vespa embedder metrics
+(see Container Metrics).
+Monitor API usage and costs through the VoyageAI dashboard.
+ + +
The Huggingface tokenizer embedder is configured in services.xml,
From a270864d7c83704c60e4be54299267b11cab58fc Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B8rn=20Christian=20Seime?= An embedder that uses the VoyageAI embedding API
to generate high-quality embeddings for semantic search. This embedder calls the VoyageAI API service
-and does not require local model files or ONNX inference.VoyageAI Embedder
{% highlight xml %}
-  <model>voyage-3.5</model>
+  <model>voyage-4</model>
   <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
@@ -514,10 +517,6 @@ VoyageAI Embedder
model specifies which VoyageAI model to use.
- Available models include voyage-3.5 (1024 dimensions, latest and best),
- voyage-3.5-lite (512 dimensions, fastest),
- voyage-code-3 (optimized for code), and others.
- See the VoyageAI documentation for the full list.
api-key-secret-ref references a secret in Vespa's
@@ -526,89 +525,26 @@ Add your VoyageAI API key to the secret store:
-<pre>
-vespa secret add voyage_api_key --value "pa-xxxxx..."
-</pre>
See the reference -for all configuration parameters including caching, retry logic, and performance tuning.
+See the reference +for all configuration parameters.
- VoyageAI offers several embedding models optimized for different use cases.
- The resulting tensor type can be float or
- bfloat16 for storage efficiency.
-
Latest general-purpose models (recommended):
+For the complete list of available models and their specifications, see:
-- voyage-3.5: tensor<float>(x[1024]) - latest and best quality, state-of-the-art for most applications
-- voyage-3.5-lite: tensor<float>(x[512]) - newest lite model, excellent quality at lower cost and faster speed
-Previous generation general-purpose models:
-- voyage-3: tensor<float>(x[1024]) - high quality (use voyage-3.5 for best results)
-- voyage-3-lite: tensor<float>(x[512]) - cost-efficient (use voyage-3.5-lite for better performance)
-Specialized models:
-- voyage-code-3: tensor<float>(x[1024]) - optimized for code search and technical content
-- voyage-finance-2: tensor<float>(x[1024]) - optimized for financial documents
-- voyage-law-2: tensor<float>(x[1024]) - optimized for legal documents
-- voyage-multilingual-2: tensor<float>(x[1024]) - supports 100+ languages
-Contextual model:
-- voyage-context-3: tensor<float>(x[1024]) (configurable: 256, 512, 1024, 2048) -
-  contextualized embeddings for document chunks with surrounding context awareness
-Multimodal model (preview):
-- voyage-multimodal-3: tensor<float>(x[1024]) (configurable: 256, 512, 1024, 2048) -
-  multimodal embeddings for text, images, and video in a shared vector space
 VoyageAI models distinguish between query and document embeddings for improved retrieval quality.
-The embedder automatically detects the context and sets the appropriate input type:
-- indexing (feed): document input type
-- query processing (via embed() in queries): query input type
-You can disable auto-detection and set a fixed input type:
-{% highlight xml %}
-<component id="voyage" type="voyage-ai-embedder">
-  <model>voyage-3.5</model>
-  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
-  <auto-detect-input-type>false</auto-detect-input-type>
-  <default-input-type>query</default-input-type>
-</component>
-{% endhighlight %}
+The embedder automatically detects the context and sets the appropriate input type based on whether
+the embedding is performed during feed (indexing) or query processing in Vespa.
-The VoyageAI embedder includes several performance optimizations:
-- HTTP connection pooling, tuned with max-idle-connections (default: 5).
-- Automatic retries of failed requests, bounded by max-retries (default: 10).
-- Optional L2 normalization of embeddings with normalize set to true.
-Example with performance tuning:
-{% highlight xml %}
-<component id="voyage" type="voyage-ai-embedder">
-  <model>voyage-3.5</model>
-  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
-  <max-idle-connections>20</max-idle-connections>
-  <normalize>true</normalize>
-</component>
-{% endhighlight %}
+For advanced use cases where you need to control the input type programmatically,
+you can use the destination property of the
+Embedder.Context
+when calling the embedder from Java code.
Complete example showing document indexing and query-time embedding:
@@ -625,7 +561,7 @@When using normalize set to true, use
-distance-metric: prenormalized-angular
-for more efficient similarity computation.
 An embedder that uses the VoyageAI API
-to generate embeddings. This is an API-based embedder that does not require local model files or ONNX inference.
-It calls the VoyageAI service to generate high-quality embeddings optimized for semantic search.
+to generate embeddings.
The VoyageAI embedder is configured in services.xml,
@@ -491,25 +490,14 @@
{% highlight xml %}
-  <model>voyage-3.5</model>
+  <model>voyage-law-2</model>
   <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+  <endpoint>https://api.voyageai.com/v1/embeddings</endpoint>
+  <truncate>true</truncate> <!-- element name reconstructed; only the value survived formatting -->
{% endhighlight %}
-The VoyageAI API key must be stored in Vespa's
-secret store for secure management:
-<pre>
-vespa secret add voyage_api_key --value "pa-xxxxx..."
-</pre>
- The api-key-secret-ref parameter references the secret name.
- Secrets are automatically refreshed when rotated without requiring application restart.
-
 | api-key-secret-ref | One | Required. Reference to the secret in Vespa's secret store containing the VoyageAI API key. | string | N/A |
-| model | One | The VoyageAI model to use. Available models include voyage-3.5, voyage-3.5-lite, voyage-code-3, and others. | string | voyage-3.5 |
+| model | One | Required. The VoyageAI model to use. See the VoyageAI embeddings documentation for the complete list of available models including general-purpose, specialized, contextualized, and multimodal models. | string | N/A |
 | endpoint | One | VoyageAI API endpoint URL. Can be overridden for custom proxies or regional endpoints. | string | https://api.voyageai.com/v1/embeddings |
@@ -556,41 +532,6 @@
-| timeout | One | Request timeout in milliseconds. Also serves as the bound for retry attempts - retries stop when total elapsed time would exceed this timeout. Minimum value: 1000ms. | numeric | 30000 |
-| max-retries | One | Maximum number of retry attempts for failed requests. Used as a safety limit in addition to the timeout-based retry bound. | numeric | 10 |
-| default-input-type | One | Default input type when auto-detection is disabled. Valid values: query or document. VoyageAI models use different optimizations for queries vs documents. | enum | document |
-| auto-detect-input-type | One | Whether to automatically detect input type based on context. When enabled, uses query type for query-time embeddings and document type for indexing. | boolean | true |
-| normalize | One | Whether to apply L2 normalization to embeddings. When enabled, all embedding vectors are normalized to unit length. Use with prenormalized-angular distance-metric for efficient similarity computation. | boolean | false |
 | truncate | One | Whether to truncate input text exceeding model limits. When enabled, text is automatically truncated. When disabled, requests with too-long text will fail. | boolean | true |
@@ -598,79 +539,9 @@
-| max-idle-connections | One | Maximum number of idle HTTP connections to keep in the connection pool. Higher values improve throughput for concurrent requests but use more resources. | numeric | 5 |
Basic configuration (recommended):
-{% highlight xml %}
-<component id="voyage" type="voyage-ai-embedder">
-  <model>voyage-3.5</model>
-  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
-</component>
-{% endhighlight %}
-
-High-performance configuration:
-{% highlight xml %}
-<component id="voyage" type="voyage-ai-embedder">
-  <model>voyage-3.5</model>
-  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
-  <max-idle-connections>20</max-idle-connections>
-  <timeout>60000</timeout>
-</component>
-{% endhighlight %}
-
-Fast and cost-efficient configuration:
-{% highlight xml %}
-<component id="voyage" type="voyage-ai-embedder">
-  <model>voyage-3.5-lite</model>
-  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
-</component>
-{% endhighlight %}
-
-Query-optimized configuration:
-{% highlight xml %}
-<component id="voyage" type="voyage-ai-embedder">
-  <model>voyage-3.5</model>
-  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
-  <default-input-type>query</default-input-type>
-  <auto-detect-input-type>false</auto-detect-input-type>
-  <normalize>true</normalize>
-</component>
-{% endhighlight %}
-
-Code search configuration:
-{% highlight xml %}
-<component id="voyage" type="voyage-ai-embedder">
-  <model>voyage-code-3</model>
-  <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
-  <normalize>true</normalize>
-</component>
-{% endhighlight %}
-
-The VoyageAI embedder includes several features to reduce API costs and improve performance:
-- Connection pooling via max-idle-connections (default: 5).
-- Automatic retries via max-retries (default: 10).
-- Model choice: consider voyage-3.5-lite for cost-sensitive applications (512 dimensions vs 1024 dimensions reduces costs while maintaining excellent quality). For best quality, use voyage-3.5.
-For detailed performance monitoring, the embedder emits standard Vespa embedder metrics
-(see Container Metrics).
-Monitor API usage and costs through the VoyageAI dashboard.
- - -
The Huggingface tokenizer embedder is configured in services.xml,
From 32cc54aff55e25628a1839f1f09199f043e9b24b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B8rn=20Christian=20Seime?= For production deployments, we recommend configuring separate embedder components for feed and search operations.
+This architectural pattern provides two key benefits - cost optimization and rate limit isolation.
+In Vespa Cloud, it's best practice to configure these embedders in separate container clusters for feed and search.
+
+The Voyage 4 model family features a shared embedding space
+across different model sizes. This enables a cost-effective strategy where you can use a more powerful (and expensive) model
+for document embeddings, while using a smaller, cheaper model for query embeddings.
+Since document embedding happens once during indexing but query embedding occurs on every search request,
+this approach can significantly reduce operational costs while maintaining quality. Separating feed and search operations is particularly important for managing VoyageAI API rate limits.
+Bursty document feeding operations can consume significant API quota, potentially causing rate limit errors
+that affect search queries. By using separate API keys for feed and search embedders,
+you ensure that feeding bursts don't negatively impact search.
 Usage example
'input.query(q)=embed(voyage, "machine learning tutorials")'
{% endhighlight %}
+Best practices
+{% highlight xml %}
+<!-- Illustrative sketch: separate embedder components for feed and search.
+     Component type, model names, and secret names here are examples; the
+     original example content was lost in formatting. -->
+<component id="voyage-feed" type="voyage-ai-embedder">
+  <model>voyage-4</model>
+  <api-key-secret-ref>voyage_feed_api_key</api-key-secret-ref>
+</component>
+<component id="voyage-search" type="voyage-ai-embedder">
+  <model>voyage-4-lite</model>
+  <api-key-secret-ref>voyage_search_api_key</api-key-secret-ref>
+</component>
+{% endhighlight %}
+Cost optimization with model variants
+Rate limit isolation
+Embedder performance
From 2ca355dfcc097117d95dc3d24aba0bb4191fcf8d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B8rn=20Christian=20Seime?= Input type detection
Embedder.Context
when calling the embedder from Java code.
Complete example showing document indexing and query-time embedding:
- -Schema definition:
-
-schema doc {
- document doc {
- field text type string {
- indexing: summary | index
- }
- }
-
- field embedding type tensor<float>(x[1024]) {
- indexing: input text | embed voyage | attribute | index
- attribute {
- distance-metric: prenormalized-angular
- }
- }
-
- rank-profile semantic {
- inputs {
- query(q) tensor<float>(x[1024])
- }
- first-phase {
- expression: closeness(field, embedding)
- }
- }
-}
-
-
-Query with embedding:
-{% highlight bash %}
-vespa query \
- 'yql=select * from doc where {targetHits:10}nearestNeighbor(embedding,q)' \
- 'input.query(q)=embed(voyage, "machine learning tutorials")'
-{% endhighlight %}
-
For production deployments, we recommend configuring separate embedder components for feed and search operations.
This architectural pattern provides two key benefits - cost optimization and rate limit isolation.
From 0328f897d858d5cdc759113b2104aaddf8434ede Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B8rn=20Christian=20Seime?= To use contextualized chunk embeddings,
+configure the VoyageAI embedder with a
+ When embedding array fields with a contextualized chunk embedding model, Vespa sends all chunks from a document in a single API request,
+ allowing Voyage to encode each chunk with context from the other chunks.
+ Be aware that the combined size of all chunks in a document must fit within the VoyageAI API's input token limit.
+ See Working with chunks for chunking strategies.
+ VoyageAI models distinguish between query and document embeddings for improved retrieval quality.
The embedder automatically detects the context and sets the appropriate input type based on whether
From 80293f6db216ab4fccdc0c86f4974c69cdb4caea Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B8rn=20Christian=20Seime?= To use contextualized chunk embeddings,
configure the VoyageAI embedder with a VoyageAI Embedder
Best practices
Best practices
VoyageAI Embedder
VoyageAI embedder reference config
-| api-key-secret-ref | One | Required. Reference to the secret in Vespa's secret store containing the VoyageAI API key. | string | N/A |
+| model | One | Required. The VoyageAI model to use. See the VoyageAI embeddings documentation for the complete list of available models including general-purpose, specialized, contextualized, and multimodal models. | string | N/A |
-| model | One | Required. The VoyageAI model to use. See the VoyageAI embeddings documentation for the complete list of available models including general-purpose, specialized, contextualized, and multimodal models. | string | N/A |
+| dimensions | One | Required. The number of dimensions for the output embedding vectors. Must match the tensor field definition in your schema. Valid values are 256, 512, 1024, 1536, or 2048. See the VoyageAI embeddings documentation for model-specific dimension support. | integer | N/A |
@@ -539,6 +547,13 @@
+| api-key-secret-ref | One | Required. Reference to the secret in Vespa's secret store containing the VoyageAI API key. | string | N/A |
VoyageAI embedder reference config
From 9a7c8ceec5e6513335c803744c111432f7d496a8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B8rn=20Christian=20Seime?= boolean
true
+| quantization | Optional | Output quantization format for embedding vectors. Valid values are auto, float, int8, or binary. When set to auto, the embedder infers the appropriate quantization from the dimensions and cell type of the destination tensor in your schema. See the VoyageAI quantization documentation for details on quantization options and their trade-offs. | string | auto |
+ VoyageAI embedder models
Contextualized chunk embeddings
+voyage-context-* model and use it to embed an
+array<string> field containing your document chunks:
+schema doc {
+ document doc {
+ field chunks type array<string> {
+ indexing: index | summary
+ }
+ }
+ field embeddings type tensor<float>(chunk{}, x[1024]) {
+ indexing: input chunks | embed voyage | attribute | index
+ attribute {
+ distance-metric: prenormalized-angular
+ }
+ }
+}
+
+
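+A query against such chunk-level embeddings can follow the same pattern as the usage example
+earlier in this patch (a sketch: it assumes a rank-profile declaring query(q) as
+tensor<float>(x[1024]) and an embedder component with id voyage, neither shown in the schema above):
+{% highlight bash %}
+vespa query \
+  'yql=select * from doc where {targetHits:10}nearestNeighbor(embeddings,q)' \
+  'input.query(q)=embed(voyage, "machine learning tutorials")'
+{% endhighlight %}
+With a mapped chunk dimension in the tensor, nearestNeighbor matches documents by their closest chunk.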
+Input type detection
VoyageAI embedder reference config
-| endpoint | One | VoyageAI API endpoint URL. Can be overridden for custom proxies or regional endpoints. | string | https://api.voyageai.com/v1/embeddings |
+| endpoint | Optional | VoyageAI API endpoint URL. | string | https://api.voyageai.com/v1/embeddings |
-| truncate | One | Whether to truncate input text exceeding model limits. When enabled, text is automatically truncated. When disabled, requests with too-long text will fail. | boolean | true |
+| truncate | Optional | Whether to truncate input text exceeding model limits. When enabled, text is automatically truncated. When disabled, requests with too-long text will fail. | boolean | true |
@@ -550,7 +550,13 @@ VoyageAI embedder reference config
From 324162eb837a075d31bbac310f12dcdf755448f2 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B8rn=20Christian=20Seime?= quantization
Optional
- Output quantization format for embedding vectors. Valid values are
+ auto, float, int8, or binary. When set to auto, the embedder infers the appropriate quantization from the dimensions and cell type of the destination tensor in your schema. See the VoyageAI quantization documentation for details on quantization options and their trade-offs.Output quantization format for embedding vectors. Valid values are
auto, float,
+ int8, or binary. When set to auto,
+ the embedder infers the appropriate quantization from the dimensions and cell type of the destination tensor in your schema.
+ When using binary quantization, the destination tensor field must use int8 cell type
+ with 1/8 of the dimensions specified in the embedder configuration (e.g., 1024 dimensions → tensor<int8>(x[128])).
+ See the VoyageAI quantization documentation
+ for details on quantization options and binarizing vectors for more on binary quantization in Vespa.string
auto
VoyageAI embedder models
Contextualized chunk embeddings
+{% include note.html content='Available since 8.637.' %}
voyage-context-* model and use it to embed an
array<string> field containing your document chunks: