diff --git a/serverless/workers/overview.mdx b/serverless/workers/overview.mdx
index 2ef0f9fa..afd38dc3 100644
--- a/serverless/workers/overview.mdx
+++ b/serverless/workers/overview.mdx
@@ -3,6 +3,8 @@ title: "Overview"
description: "Package your handler function for deployment."
---
+import { ServerlessEnvironmentVariablesTooltip, MachineTooltip } from "/snippets/tooltips.jsx";
+
Workers are the containerized environments that run your code on Runpod Serverless. After creating and testing your [handler function](/serverless/workers/handler-functions), you need to package it into a Docker image and deploy it to an endpoint.
This page provides an overview of the worker deployment process.
@@ -71,7 +73,7 @@ Workers move through different states as they handle requests and respond to cha
* **Initializing**: The worker starts up while the system downloads and prepares the Docker image. The container starts and loads your code.
* **Idle**: The worker is ready but not processing requests. No charges apply while idle.
* **Running**: The worker actively processes requests. Billing occurs per second.
-* **Throttled**: The worker is ready but temporarily unable to run due to host machine resource constraints.
+* **Throttled**: The worker is ready but temporarily unable to run due to host <MachineTooltip /> resource constraints.
* **Outdated**: The system marks the worker for replacement after endpoint updates. It continues processing current jobs during rolling updates (10% of max workers at a time).
* **Unhealthy**: The worker has crashed due to Docker image issues, incorrect start commands, or machine problems. The system automatically retries with exponential backoff for up to 7 days.
diff --git a/snippets/tooltips.jsx b/snippets/tooltips.jsx
index c2e729e3..fe75a8a4 100644
--- a/snippets/tooltips.jsx
+++ b/snippets/tooltips.jsx
@@ -95,11 +95,6 @@ export const NetworkVolumeTooltip = () => {
);
};
-export const ContainerVolumeTooltip = () => {
- return (
-
container volume
- );
-};
export const VolumeDiskTooltip = () => {
return (
@@ -123,7 +118,7 @@ export const RunpodHubTooltip = () => {
export const PublicEndpointTooltip = () => {
return (
-
Public Endpoint
+
Public Endpoint
);
};
@@ -143,19 +138,26 @@ export const RunpodCLITooltip = () => {
export const ContainerTooltip = () => {
return (
-
container
+
container
);
};
export const DataCenterTooltip = () => {
return (
-
data center
+
data center
);
};
export const MachineTooltip = () => {
return (
-
machine
+
machine
+ );
+};
+
+export const MachinesTooltip = () => {
+ return (
+
machines
);
};
From a79fb785f238c7b1deff86208342f196e40acf97 Mon Sep 17 00:00:00 2001
From: Mo King
Date: Thu, 5 Feb 2026 17:30:26 -0500
Subject: [PATCH 4/6] Use mintlify tree component
---
fine-tune.mdx | 14 +++++--
pods/manage-pods.mdx | 4 +-
pods/templates/create-custom-template.mdx | 13 ++++---
serverless/development/dual-mode-worker.mdx | 16 +++++---
serverless/endpoints/job-states.mdx | 6 ++-
serverless/endpoints/model-caching.mdx | 37 ++++++++++--------
serverless/endpoints/overview.mdx | 2 +
serverless/endpoints/send-requests.mdx | 6 ++-
serverless/load-balancing/overview.mdx | 6 ++-
serverless/load-balancing/vllm-worker.mdx | 20 +++++-----
serverless/overview.mdx | 8 ++--
serverless/workers/create-dockerfile.mdx | 27 ++++++++-----
serverless/workers/handler-functions.mdx | 8 ++--
snippets/tooltips.jsx | 42 +++++++++++++++++++++
tutorials/serverless/model-caching-text.mdx | 27 ++++++++-----
15 files changed, 162 insertions(+), 74 deletions(-)
diff --git a/fine-tune.mdx b/fine-tune.mdx
index 50045335..4bff8bb0 100644
--- a/fine-tune.mdx
+++ b/fine-tune.mdx
@@ -88,9 +88,17 @@ For a list of working configuration examples, check out the [Axolotl examples re
Your training environment is located in the `/workspace/fine-tuning/` directory and has the following structure:
-* `examples/`: Sample configurations and scripts.
-* `outputs/`: Where your training results and model outputs will be saved.
-* `config.yaml`: The main configuration file for your training parameters.
+
+
+
+
+
+
+
+
+
+
+`/examples/` contains sample configurations and scripts, `/outputs/` contains your training results and model outputs, and `config.yaml` is the main configuration file for your training parameters.
The system generates an initial `config.yaml` based on your selected base model and dataset. This is where you define all the hyperparameters for your fine-tuning job. You may need to experiment with these settings to achieve the best results.
diff --git a/pods/manage-pods.mdx b/pods/manage-pods.mdx
index 591833c3..048e5420 100644
--- a/pods/manage-pods.mdx
+++ b/pods/manage-pods.mdx
@@ -261,11 +261,11 @@ pod "wu5ekmn69oh1xr" started with $0.290 / hr
## Terminate a Pod
-
+
Terminating a Pod permanently deletes all associated data that isn't stored in a [network volume](/storage/network-volumes). Be sure to export or download any data that you'll need to access again.
-
+
diff --git a/pods/templates/create-custom-template.mdx b/pods/templates/create-custom-template.mdx
index c5ba4872..efdca25c 100644
--- a/pods/templates/create-custom-template.mdx
+++ b/pods/templates/create-custom-template.mdx
@@ -59,12 +59,13 @@ touch Dockerfile requirements.txt main.py
Your project structure should now look like this:
-```
-my-custom-pod-template/
-├── Dockerfile
-├── requirements.txt
-└── main.py
-```
+
+
+
+
+
+
+
diff --git a/serverless/development/dual-mode-worker.mdx b/serverless/development/dual-mode-worker.mdx
index 4b049c70..86579312 100644
--- a/serverless/development/dual-mode-worker.mdx
+++ b/serverless/development/dual-mode-worker.mdx
@@ -37,12 +37,16 @@ cd dual-mode-worker
touch handler.py start.sh Dockerfile requirements.txt
```
-This creates:
-
-- `handler.py`: Your Python script with the Runpod handler logic.
-- `start.sh`: A shell script that will be the entrypoint for your Docker container.
-- `Dockerfile`: Instructions to build your Docker image.
-- `requirements.txt`: A file to list Python dependencies.
+This creates the following project structure:
+
+
+
+
+
+
+
+
+
## Step 2: Create the handler
diff --git a/serverless/endpoints/job-states.mdx b/serverless/endpoints/job-states.mdx
index c0c0e499..4e607fa3 100644
--- a/serverless/endpoints/job-states.mdx
+++ b/serverless/endpoints/job-states.mdx
@@ -3,11 +3,13 @@ title: "Job states and metrics"
description: "Monitor your endpoints effectively by understanding job states and key metrics."
---
-Understanding job states and metrics is essential for effectively managing your Serverless endpoints. This documentation covers the different states your jobs can be in and the key metrics available to monitor endpoint performance and health.
+import { JobTooltip, RequestsTooltip, WorkerTooltip } from "/snippets/tooltips.jsx";
+
+Understanding <JobTooltip /> states and metrics is essential for effectively managing your Serverless endpoints. This documentation covers the different states your jobs can be in and the key metrics available to monitor endpoint performance and health.
## Request job states
-Understanding job states helps you track the progress of individual requests and identify where potential issues might occur in your workflow.
+Understanding job states helps you track the progress of individual <RequestsTooltip /> and identify where potential issues might occur in your workflow.
* `IN_QUEUE`: The job is waiting in the endpoint queue for an available worker to process it.
* `RUNNING`: A worker has picked up the job and is actively processing it.
diff --git a/serverless/endpoints/model-caching.mdx b/serverless/endpoints/model-caching.mdx
index 2cf9f230..5c11fe2b 100644
--- a/serverless/endpoints/model-caching.mdx
+++ b/serverless/endpoints/model-caching.mdx
@@ -5,9 +5,9 @@ description: "Accelerate worker cold starts and reduce costs by using cached mod
tag: "NEW"
---
-import { MachineTooltip, MachinesTooltip } from "/snippets/tooltips.jsx";
+import { MachineTooltip, MachinesTooltip, ColdStartTooltip, WorkersTooltip, HandlerFunctionTooltip } from "/snippets/tooltips.jsx";
-Enabling cached models for your workers can reduce [cold start times](/serverless/overview#cold-starts) to just a few seconds and dramatically reduce the cost for loading large models.
+Enabling cached models on your endpoints can reduce <ColdStartTooltip /> times and dramatically reduce the cost of loading large models.
## Why use cached models?
@@ -15,7 +15,7 @@ Enabling cached models for your workers can reduce [cold start times](/serverles
- **Reduced costs:** You aren't billed for worker time while your model is being downloaded. This is especially impactful for large models that can take several minutes to load.
- **Accelerated deployment:** You can deploy cached models instantly without waiting for external downloads or transfers.
- **Smaller container images:** By decoupling models from your container image, you can create smaller, more focused images that contain only your application logic.
-- **Shared across workers:** Multiple workers running on the same host can reference the same cached model, eliminating redundant downloads and saving disk space.
+- **Shared across workers:** Multiple <WorkersTooltip /> running on the same host can reference the same cached model, eliminating redundant downloads and saving disk space.
## Cached model compatibility
@@ -120,21 +120,28 @@ Cached models are available to your workers at `/runpod-volume/huggingface-cache
While cached models use the same mount path as network volumes (`/runpod-volume/`), the model loaded from the cache will load significantly faster than the same model loaded from a network volume.
-The path structure follows this pattern:
-
-```
-/runpod-volume/huggingface-cache/hub/models--HF_ORGANIZATION--MODEL_NAME/snapshots/VERSION_HASH/
-```
-
-For example, the model `gensyn/qwen2.5-0.5b-instruct` would be stored at:
-
-```
-/runpod-volume/huggingface-cache/hub/models--gensyn--qwen2.5-0.5b-instruct/snapshots/317b7eb96312eda0c431d1dab1af958a308cb35e/
-```
+For example, here is how the model `gensyn/qwen2.5-0.5b-instruct` would be stored:
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
### Programmatically locate cached models
-To dynamically locate cached models without hardcoding paths, you can add this helper function to your [handler file](/serverless/workers/handler-functions) to scan the cache directory for the model you want to use:
+To dynamically locate cached models without hardcoding paths, you can add this helper function to your <HandlerFunctionTooltip /> to scan the cache directory for the model you want to use:
```python handler.py
import os
diff --git a/serverless/endpoints/overview.mdx b/serverless/endpoints/overview.mdx
index 7e624268..72805623 100644
--- a/serverless/endpoints/overview.mdx
+++ b/serverless/endpoints/overview.mdx
@@ -4,6 +4,8 @@ sidebarTitle: "Overview"
description: "Deploy and manage Serverless endpoints using the Runpod console or REST API."
---
+import { QueueBasedEndpointsTooltip, LoadBalancingEndpointsTooltip, ServerlessEnvironmentVariablesTooltip } from "/snippets/tooltips.jsx";
+
Endpoints are the foundation of Runpod Serverless, serving as the gateway for deploying and managing your [Serverless workers](/serverless/workers/overview). They provide a consistent API interface that allows your applications to interact with powerful compute resources on demand.
Endpoints are RESTful APIs that accept [HTTP requests](/serverless/endpoints/send-requests), processing the input using your [handler function](/serverless/workers/handler-functions), and returning the result via HTTP response. Each endpoint provides a unique URL and abstracts away the complexity of managing individual GPUs/CPUs.
diff --git a/serverless/endpoints/send-requests.mdx b/serverless/endpoints/send-requests.mdx
index 870ae206..85c1f698 100644
--- a/serverless/endpoints/send-requests.mdx
+++ b/serverless/endpoints/send-requests.mdx
@@ -4,11 +4,13 @@ sidebarTitle: "Send API requests"
description: "Submit and manage jobs for your queue-based endpoints by sending HTTP requests."
---
+import { JobTooltip, JobsTooltip, RequestsTooltip, WorkersTooltip, HandlerFunctionTooltip, QueueBasedEndpointsTooltip, LoadBalancingEndpointTooltip } from "/snippets/tooltips.jsx";
+
-This guide is for **queue-based endpoints**. If you're building a [load balancing endpoint](/serverless/load-balancing/overview), the request structure and endpoints will depend on how you define your HTTP servers.
+This guide is for <QueueBasedEndpointsTooltip />. If you're building a <LoadBalancingEndpointTooltip />, the request structure and endpoints will depend on how you define your HTTP servers.
-After creating a [Severless endpoint](/serverless/endpoints/overview), you can start sending it **requests** to submit jobs and retrieve results. This page covers everything from basic input structure and job submission, to monitoring, troubleshooting, and advanced options for queue-based endpoints.
+After creating a [Serverless endpoint](/serverless/endpoints/overview), you can start sending it requests to submit <JobsTooltip /> and retrieve results. This page covers everything from basic input structure and job submission, to monitoring, troubleshooting, and advanced options for queue-based endpoints.
Unlike queue-based endpoints that process requests sequentially, load balancing endpoints route incoming traffic directly to available workers, distributing requests across the worker pool.
When building a load balancer, you're no longer limited to the standard `/run` or `/runsync` endpoints. Instead, you can create custom REST endpoints that are accessible via a unique URL:
@@ -35,7 +37,7 @@ Here are the key differences between the two endpoint types:
### Queue-based endpoints (traditional)
-With queue-based endpoints, requests are placed in a queue and processed in order. They use the standard handler pattern (`def handler(job)`) and are accessed through fixed endpoints like `/run` and `/runsync`.
+With queue-based endpoints, <RequestsTooltip /> are placed in a queue and processed in order. They use the standard handler pattern (`def handler(job)`) and are accessed through fixed endpoints like `/run` and `/runsync`.
These endpoints are better for tasks that can be processed asynchronously and guarantee request processing, similar to how TCP guarantees packet delivery in networking.
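As a point of reference for the queue-based flow described above, here is a minimal Python sketch that submits a job to the standard `/run` endpoint and polls `/status`; the endpoint ID, input payload, and polling interval are illustrative assumptions, so check them against the send-requests documentation itself.

```python
import os
import time

import requests

ENDPOINT_ID = "YOUR_ENDPOINT_ID"  # assumption: replace with your endpoint ID
API_KEY = os.environ["RUNPOD_API_KEY"]  # assumption: API key stored in an environment variable
BASE_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Submit a job to the queue-based /run endpoint.
job = requests.post(f"{BASE_URL}/run", json={"input": {"prompt": "Hello, world!"}}, headers=HEADERS).json()

# Poll /status until the job leaves the IN_QUEUE/RUNNING states described in this guide.
while True:
    status = requests.get(f"{BASE_URL}/status/{job['id']}", headers=HEADERS).json()
    if status["status"] not in ("IN_QUEUE", "RUNNING"):
        break
    time.sleep(2)

print(status)
```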
diff --git a/serverless/load-balancing/vllm-worker.mdx b/serverless/load-balancing/vllm-worker.mdx
index d393e5c2..3bac2c57 100644
--- a/serverless/load-balancing/vllm-worker.mdx
+++ b/serverless/load-balancing/vllm-worker.mdx
@@ -57,15 +57,17 @@ touch src/utils.py
Your project structure should now look like this:
-```
-vllm_worker/
-├── Dockerfile
-├── requirements.txt
-├── src/
- ├── handler.py
- ├── models.py
- └── utils.py
-```
+
+
+
+
+
+
+
+
+
+
+
## Step 2: Define data models
diff --git a/serverless/overview.mdx b/serverless/overview.mdx
index 5b2fdd4d..5a0bf221 100644
--- a/serverless/overview.mdx
+++ b/serverless/overview.mdx
@@ -3,7 +3,7 @@ title: "Overview"
description: "Pay-as-you-go compute for AI models and compute-intensive workloads."
---
-import { EndpointTooltip, WorkersTooltip, WorkerTooltip, HandlerFunctionTooltip, RequestTooltip, ColdStartTooltip, CachedModelsTooltip, PodTooltip, RunpodHubTooltip, PublicEndpointTooltip } from "/snippets/tooltips.jsx";
+import { EndpointTooltip, WorkersTooltip, WorkerTooltip, HandlerFunctionTooltip, RequestTooltip, ColdStartTooltip, CachedModelsTooltip, PodTooltip, RunpodHubTooltip, PublicEndpointTooltip, JobTooltip, LoadBalancingEndpointTooltip, QueueBasedEndpointsTooltip } from "/snippets/tooltips.jsx";
Runpod Serverless is a cloud computing platform that lets you run AI models and compute-intensive workloads without managing servers. You only pay for the actual compute time you use, with no idle costs when your application isn't processing requests.
@@ -72,12 +72,12 @@ runpod.serverless.start({"handler": handler}) # Required
```
-Handler functions are only used for queue-based (i.e. traditional) endpoints. If you're using a [load balancing endpoint](#load-balancing-endpoints), the request structure and endpoints will depend on how you define your HTTP servers.
+Handler functions are only used for <QueueBasedEndpointsTooltip /> (i.e. traditional endpoints). If you're using a <LoadBalancingEndpointTooltip />, the request structure and endpoints will depend on how you define your HTTP servers.
### [Requests](/serverless/endpoints/send-requests)
-An HTTP request that you send to an endpoint, which can include parameters, payloads, and headers that define what the endpoint should process. For example, you can send a `POST` request to submit a job, or a `GET` request to check status of a job, retrieve results, or check endpoint health.
+An HTTP request that you send to an endpoint, which can include parameters, payloads, and headers that define what the endpoint should process. For example, you can send a `POST` request to submit a <JobTooltip />, or a `GET` request to check status of a job, retrieve results, or check endpoint health.
When a user/client sends a request to your endpoint:
@@ -130,7 +130,7 @@ Minimizing s is key to creating a responsive and cost-effect
### [Load balancing endpoints](/serverless/load-balancing/overview)
-These endpoints route incoming traffic directly to available workers, distributing requests across the worker pool. Unlike traditional queue-based endpoints, they provide no queuing mechanism for request backlog.
+These endpoints route incoming traffic directly to available workers, distributing requests across the worker pool. Unlike traditional <QueueBasedEndpointsTooltip />, they provide no queuing mechanism for request backlog.
When using load balancing endpoints, you can define your own custom API endpoints without a handler function, using any HTTP framework of your choice (like FastAPI or Flask).
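To make the FastAPI/Flask point concrete, here is a minimal sketch of the kind of custom HTTP server a load balancing worker could expose; the route names, request model, and port are arbitrary illustrative choices rather than a required Runpod convention.

```python
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn

app = FastAPI()

class GenerateRequest(BaseModel):
    prompt: str

@app.get("/ping")
def ping():
    # Lightweight health check route.
    return {"status": "ok"}

@app.post("/generate")
def generate(request: GenerateRequest):
    # Stand-in for real model inference logic.
    return {"output": request.prompt.upper()}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```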
diff --git a/serverless/workers/create-dockerfile.mdx b/serverless/workers/create-dockerfile.mdx
index ff3ab197..e9f4d981 100644
--- a/serverless/workers/create-dockerfile.mdx
+++ b/serverless/workers/create-dockerfile.mdx
@@ -3,22 +3,29 @@ title: "Create a Dockerfile"
description: "Package your handler function for deployment."
---
-A Dockerfile defines the build process for a Docker image containing your handler function and all its dependencies. This page explains how to organize your project files and create a Dockerfile for your Serverless worker.
+import { HandlerFunctionTooltip } from "/snippets/tooltips.jsx";
+
+A Dockerfile defines the build process for a Docker image containing your <HandlerFunctionTooltip /> and all its dependencies. This page explains how to organize your project files and create a Dockerfile for your Serverless worker.
## Project organization
Organize your project files in a clear directory structure:
-```
-project_directory
-├── Dockerfile # Instructions for building the Docker image
-├── src
-│ └── handler.py # Your handler function
-└── builder
- └── requirements.txt # Dependencies required by your handler
-```
+
+
+
+
+
+
+
+
+
+
+`Dockerfile` contains the instructions for building your worker image.
+
+`src/handler.py` is your <HandlerFunctionTooltip />.
-Your `requirements.txt` file should list all Python packages your handler needs:
+`requirements.txt` lists the Python dependencies required by your handler. For example:
```txt title="requirements.txt"
# Example requirements.txt
diff --git a/serverless/workers/handler-functions.mdx b/serverless/workers/handler-functions.mdx
index 61795671..052d9653 100644
--- a/serverless/workers/handler-functions.mdx
+++ b/serverless/workers/handler-functions.mdx
@@ -3,12 +3,12 @@ title: "Overview"
description: "Write custom handler functions to process incoming requests to your queue-based endpoints."
---
+import { JobTooltip, RequestsTooltip, WorkersTooltip, QueueBasedEndpointsTooltip, LoadBalancingEndpointTooltip } from "/snippets/tooltips.jsx";
-Handler functions form the core of your Runpod Serverless applications. They define how your workers process [incoming requests](/serverless/endpoints/send-requests) and return results. This section covers everything you need to know about creating effective handler functions.
-
+Handler functions form the core of your Runpod Serverless applications. They define how your workers process <RequestsTooltip /> and return results. This section covers everything you need to know about creating effective handler functions.
-Handler functions are only required for **queue-based endpoints**. If you're building a [load balancing endpoint](/serverless/load-balancing/overview), you can define your own custom API endpoints using any HTTP framework of your choice (like FastAPI or Flask).
+Handler functions are only required for <QueueBasedEndpointsTooltip />. If you're building a <LoadBalancingEndpointTooltip />, you can define your own custom API endpoints using any HTTP framework of your choice (like FastAPI or Flask).
## Understanding job input
@@ -24,7 +24,7 @@ Before writing a handler function, make sure you understand the structure of the
}
```
-`id` is a unique identifier for the job randomly generated by Runpod, while `input` contains data sent by the client for your handler function to process.
+`id` is a unique identifier for the <JobTooltip /> randomly generated by Runpod, while `input` contains data sent by the client for your handler function to process.
To learn how to structure requests to your endpoint, see [Send API requests](/serverless/endpoints/send-requests).
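To tie the `id`/`input` structure above back to code, here is a minimal sketch of a handler that reads the `input` field, following the `runpod.serverless.start` pattern used elsewhere in this patch series; the echo logic is only a placeholder.

```python
import runpod

def handler(job):
    # `job` carries the "id" and "input" fields described above.
    job_input = job["input"]
    prompt = job_input.get("prompt", "")
    # Placeholder logic: replace with your own processing.
    return {"echo": prompt}

runpod.serverless.start({"handler": handler})  # Required
```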
diff --git a/snippets/tooltips.jsx b/snippets/tooltips.jsx
index fe75a8a4..731299e8 100644
--- a/snippets/tooltips.jsx
+++ b/snippets/tooltips.jsx
@@ -56,6 +56,24 @@ export const RequestTooltip = () => {
);
};
+export const RequestsTooltip = () => {
+ return (
+ requests
+ );
+};
+
+export const JobTooltip = () => {
+ return (
+ job
+ );
+};
+
+export const JobsTooltip = () => {
+ return (
+ jobs
+ );
+};
+
export const WorkerTooltip = () => {
return (
@@ -75,6 +93,30 @@ export const EndpointTooltip = () => {
);
};
+export const QueueBasedEndpointTooltip = () => {
+ return (
+ queue-based endpoint
+ );
+};
+
+export const QueueBasedEndpointsTooltip = () => {
+ return (
+ queue-based endpoints
+ );
+};
+
+export const LoadBalancingEndpointTooltip = () => {
+ return (
+ load balancing endpoint
+ );
+};
+
+export const LoadBalancingEndpointsTooltip = () => {
+ return (
+ load balancing endpoints
+ );
+};
+
export const VLLMTooltip = () => {
return (
vLLM
diff --git a/tutorials/serverless/model-caching-text.mdx b/tutorials/serverless/model-caching-text.mdx
index de5920ac..566e951c 100644
--- a/tutorials/serverless/model-caching-text.mdx
+++ b/tutorials/serverless/model-caching-text.mdx
@@ -198,15 +198,24 @@ def resolve_snapshot_path(model_id: str) -> str:
snapshots_dir = os.path.join(model_root, "snapshots")
```
-Cached models use a specific directory structure. A model like `microsoft/Phi-3-mini-4k-instruct` gets stored at:
-
-```
-/runpod-volume/huggingface-cache/hub/models--microsoft--Phi-3-mini-4k-instruct/
-├── refs/
-│ └── main # Contains the commit hash of the "main" branch
-└── snapshots/
- └── abc123def.../ # Actual model files, named by commit hash
-```
+Cached models use a specific directory structure. A model like `microsoft/Phi-3-mini-4k-instruct` gets stored at `/runpod-volume/huggingface-cache/hub/`. For example:
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
The `resolve_snapshot_path()` function navigates this structure to find the actual model files. It first tries to read the `refs/main` file, which contains the commit hash that the "main" branch points to. This is the most reliable method because it matches exactly what Hugging Face would load if you called `from_pretrained()` with network access.
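As an illustration of the lookup logic described here, a sketch of a resolver that reads `refs/main` first and falls back to the most recently modified snapshot directory; the tutorial's own `resolve_snapshot_path()` implementation is the authoritative version, and this is only an assumed approximation of it.

```python
import os

CACHE_HUB = "/runpod-volume/huggingface-cache/hub"

def resolve_snapshot_path(model_id: str) -> str:
    # "microsoft/Phi-3-mini-4k-instruct" -> "models--microsoft--Phi-3-mini-4k-instruct"
    model_root = os.path.join(CACHE_HUB, "models--" + model_id.replace("/", "--"))
    refs_main = os.path.join(model_root, "refs", "main")
    snapshots_dir = os.path.join(model_root, "snapshots")

    # Preferred path: refs/main stores the commit hash that the "main" branch points to.
    if os.path.isfile(refs_main):
        with open(refs_main) as f:
            commit_hash = f.read().strip()
        candidate = os.path.join(snapshots_dir, commit_hash)
        if os.path.isdir(candidate):
            return candidate

    # Fallback: use the most recently modified snapshot directory.
    snapshots = [os.path.join(snapshots_dir, name) for name in os.listdir(snapshots_dir)]
    return max(snapshots, key=os.path.getmtime)
```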
From efb8f5918267d59c34c172c250eb1d0e26adb64d Mon Sep 17 00:00:00 2001
From: Mo King
Date: Thu, 5 Feb 2026 17:40:07 -0500
Subject: [PATCH 5/6] Add "check" components
---
get-started.mdx | 4 +++-
instant-clusters/axolotl.mdx | 4 +++-
pods/templates/create-custom-template.mdx | 2 ++
serverless/development/dual-mode-worker.mdx | 6 +++++-
serverless/load-balancing/build-a-worker.mdx | 4 +++-
serverless/load-balancing/vllm-worker.mdx | 4 +++-
serverless/quickstart.mdx | 2 ++
serverless/vllm/get-started.mdx | 2 ++
tutorials/migrations/cog/overview.mdx | 6 +++++-
tutorials/migrations/openai/overview.mdx | 4 +++-
tutorials/pods/comfyui.mdx | 2 ++
tutorials/pods/run-ollama.mdx | 2 ++
tutorials/serverless/comfyui.mdx | 2 ++
tutorials/serverless/model-caching-text.mdx | 2 ++
tutorials/serverless/run-your-first.mdx | 2 ++
15 files changed, 41 insertions(+), 7 deletions(-)
diff --git a/get-started.mdx b/get-started.mdx
index d8660b93..0c2f6054 100644
--- a/get-started.mdx
+++ b/get-started.mdx
@@ -57,7 +57,9 @@ Take a minute to explore the other tabs:
3. Type `print("Hello, world!")` in the first line of the notebook.
4. Click the play button to run your code.
-And that's it—congrats! You just ran your first line of code on Runpod.
+<Check>
+Congratulations! You just ran your first line of code on Runpod.
+</Check>
## Step 5: Clean up
diff --git a/instant-clusters/axolotl.mdx b/instant-clusters/axolotl.mdx
index a4075e24..89f89ea5 100644
--- a/instant-clusters/axolotl.mdx
+++ b/instant-clusters/axolotl.mdx
@@ -86,7 +86,9 @@ After running the command on the last Pod, you should see output similar to this
[2025-04-01 19:24:22,603] [INFO] [axolotl.train.save_trained_model:211] [PID:1009] [RANK:0] Training completed! Saving pre-trained model to ./outputs/lora-out.
```
-Congrats! You've successfully trained a model using Axolotl on an Instant Cluster. Your fine-tuned model has been saved to the `./outputs/lora-out` directory. You can now use this model for inference or continue training with different parameters.
+<Check>
+Congratulations! You've successfully trained a model using Axolotl on an Instant Cluster. Your fine-tuned model has been saved to the `./outputs/lora-out` directory. You can now use this model for inference or continue training with different parameters.
+</Check>
## Step 4: Clean up
diff --git a/pods/templates/create-custom-template.mdx b/pods/templates/create-custom-template.mdx
index efdca25c..80864771 100644
--- a/pods/templates/create-custom-template.mdx
+++ b/pods/templates/create-custom-template.mdx
@@ -518,7 +518,9 @@ To avoid incurring unnecessary charges, make sure to stop and then terminate you
## Next steps
+<Check>
Congratulations! You've built a custom Pod template and deployed it to Runpod.
+</Check>
You can use this as a jumping off point to build your own custom templates with your own applications, dependencies, and models.
diff --git a/serverless/development/dual-mode-worker.mdx b/serverless/development/dual-mode-worker.mdx
index 86579312..0bbeb1fb 100644
--- a/serverless/development/dual-mode-worker.mdx
+++ b/serverless/development/dual-mode-worker.mdx
@@ -380,7 +380,11 @@ After a few moments for initialization and processing, you should see output sim
## Explore the Pod-first development workflow
-Congratulations! You've successfully built, deployed, and tested a dual-mode Serverless worker. Now, let's explore the recommended iteration process for a Pod-first development workflow:
+<Check>
+Congratulations! You've successfully built, deployed, and tested a dual-mode Serverless worker.
+</Check>
+
+Now, let's explore the recommended iteration process for a Pod-first development workflow:
diff --git a/serverless/load-balancing/build-a-worker.mdx b/serverless/load-balancing/build-a-worker.mdx
index 40d3a2e5..4157688d 100644
--- a/serverless/load-balancing/build-a-worker.mdx
+++ b/serverless/load-balancing/build-a-worker.mdx
@@ -201,7 +201,9 @@ If you see: `{"error":"no workers available"}%` after running the request, this
For production applications, implement a health check with retries before sending requests. See [Handling cold start errors](/serverless/load-balancing/overview#handling-cold-start-errors) for a complete code example.
-Congratulations! You've now successfully deployed and tested a load balancing endpoint. If you want to use a real model, you can follow the [vLLM worker](/serverless/load-balancing/vllm-worker) tutorial.
+<Check>
+Congratulations! You've successfully deployed and tested a load balancing endpoint. If you want to use a real model, you can follow the [vLLM worker](/serverless/load-balancing/vllm-worker) tutorial.
+</Check>
## (Optional) Advanced endpoint definitions
diff --git a/serverless/load-balancing/vllm-worker.mdx b/serverless/load-balancing/vllm-worker.mdx
index 3bac2c57..c6055f93 100644
--- a/serverless/load-balancing/vllm-worker.mdx
+++ b/serverless/load-balancing/vllm-worker.mdx
@@ -605,7 +605,9 @@ If you see: `{"error":"no workers available"}%` after running the request, this
For production applications, implement a health check with retries before sending requests. See [Handling cold start errors](/serverless/load-balancing/overview#handling-cold-start-errors) for a complete code example.
-Congrats! You've created a load balancing vLLM endpoint and used it to serve a large language model.
+<Check>
+Congratulations! You've created a load balancing vLLM endpoint and used it to serve a large language model.
+</Check>
## Next steps
diff --git a/serverless/quickstart.mdx b/serverless/quickstart.mdx
index fc256acb..c102f504 100644
--- a/serverless/quickstart.mdx
+++ b/serverless/quickstart.mdx
@@ -234,7 +234,9 @@ When the workers finish processing your request, you should see output on the ri
}
```
+<Check>
Congratulations! You've successfully deployed and tested your first Serverless endpoint.
+</Check>
## Next steps
diff --git a/serverless/vllm/get-started.mdx b/serverless/vllm/get-started.mdx
index af05ca4c..19a98a84 100644
--- a/serverless/vllm/get-started.mdx
+++ b/serverless/vllm/get-started.mdx
@@ -147,7 +147,9 @@ If you encounter issues with your deployment:
## Next steps
+<Check>
Congratulations! You've successfully deployed a vLLM worker on Runpod Serverless. You now have a powerful, scalable LLM inference API that's compatible with both the OpenAI client and Runpod's native API.
+</Check>
Next you can try:
diff --git a/tutorials/migrations/cog/overview.mdx b/tutorials/migrations/cog/overview.mdx
index e7eb396e..e92596c0 100644
--- a/tutorials/migrations/cog/overview.mdx
+++ b/tutorials/migrations/cog/overview.mdx
@@ -110,7 +110,11 @@ Once your endpoint is set up and deployed, you'll be able to start receiving req
## Conclusion
-Congratulations, you have successfully migrated your Cog model from Replicate to Runpod and set up a Serverless endpoint. As you continue to develop your models and applications, consider exploring additional features and capabilities offered by Runpod to further enhance your projects.
+<Check>
+Congratulations! You've successfully migrated your Cog model from Replicate to Runpod and set up a Serverless endpoint.
+</Check>
+
+As you continue to develop your models and applications, consider exploring additional features and capabilities offered by Runpod to further enhance your projects.
Here are some resources to help you continue your journey:
diff --git a/tutorials/migrations/openai/overview.mdx b/tutorials/migrations/openai/overview.mdx
index a3e0ecc4..838ccaba 100644
--- a/tutorials/migrations/openai/overview.mdx
+++ b/tutorials/migrations/openai/overview.mdx
@@ -62,7 +62,9 @@ const chatCompletion = await openai.chat.completions.create({
-Congratulations on successfully modifying your OpenAI Codebase for use with your deployed on Runpod! This tutorial has equipped you with the knowledge to update your code for compatibility with OpenAI's API and to utilize the full spectrum of features available on the Runpod platform.
+<Check>
+Congratulations! You've successfully modified your OpenAI codebase for use with your deployed vLLM worker on Runpod. You now know how to update your code for compatibility with OpenAI's API and utilize the full spectrum of features available on the Runpod platform.
+</Check>
## Next Steps
diff --git a/tutorials/pods/comfyui.mdx b/tutorials/pods/comfyui.mdx
index 277fb9c1..e259aedd 100644
--- a/tutorials/pods/comfyui.mdx
+++ b/tutorials/pods/comfyui.mdx
@@ -169,7 +169,9 @@ Your workflow is now ready! Follow these steps to generate an image:
+<Check>
Congratulations! You've just generated your first image with ComfyUI on Runpod.
+</Check>
## Troubleshooting
diff --git a/tutorials/pods/run-ollama.mdx b/tutorials/pods/run-ollama.mdx
index fd5e3324..d706a374 100644
--- a/tutorials/pods/run-ollama.mdx
+++ b/tutorials/pods/run-ollama.mdx
@@ -85,7 +85,9 @@ curl -X POST https://OLLAMA_POD_ID-11434.proxy.runpod.net/api/generate -d '{
}'
```
+<Check>
Congratulations! You've set up Ollama on a Runpod Pod and made HTTP API requests to it.
+</Check>
For more API options, see the [Ollama API documentation](https://github.com/ollama/ollama/blob/main/docs/api.md).
diff --git a/tutorials/serverless/comfyui.mdx b/tutorials/serverless/comfyui.mdx
index e68abecb..7caff0ce 100644
--- a/tutorials/serverless/comfyui.mdx
+++ b/tutorials/serverless/comfyui.mdx
@@ -353,7 +353,9 @@ ComfyUI image successfully saved as 'comfyui_generated_image.png'
Image path: /Users/path/to/your/project/comfyui_generated_image.png
```
+<Check>
Congratulations! You've successfully used Runpod's Serverless platform to generate an AI image using ComfyUI with the FLUX.1-dev-fp8 model. You now understand the complete workflow for submitting ComfyUI jobs, monitoring their progress, and retrieving results.
+</Check>
## Understanding ComfyUI workflows
diff --git a/tutorials/serverless/model-caching-text.mdx b/tutorials/serverless/model-caching-text.mdx
index 566e951c..f9e4d7be 100644
--- a/tutorials/serverless/model-caching-text.mdx
+++ b/tutorials/serverless/model-caching-text.mdx
@@ -437,7 +437,9 @@ Expected response:
}
```
+<Check>
Congratulations! You've successfully deployed a Serverless endpoint that uses model caching to serve Phi-3.
+</Check>
## Benefits of using cached models
diff --git a/tutorials/serverless/run-your-first.mdx b/tutorials/serverless/run-your-first.mdx
index 847f57a2..134fa285 100644
--- a/tutorials/serverless/run-your-first.mdx
+++ b/tutorials/serverless/run-your-first.mdx
@@ -205,7 +205,9 @@ Image successfully saved as 'generated_image.png'
Image path: /Users/path/to/your/project/generated_image.png
```
+<Check>
Congratulations! You've successfully used Runpod's Serverless platform to generate an AI image using SDXL. You now understand the complete workflow of submitting asynchronous jobs, monitoring their progress, and retrieving results.
+</Check>
## Next steps
From f40479ce39753865b560acdc2f81cf37102d40ed Mon Sep 17 00:00:00 2001
From: Mo King
Date: Thu, 5 Feb 2026 17:57:20 -0500
Subject: [PATCH 6/6] add CUDA and PyTorch templates
---
instant-clusters.mdx | 4 ++--
instant-clusters/pytorch.mdx | 4 +++-
pods/choose-a-pod.mdx | 4 +++-
pods/templates/create-custom-template.mdx | 6 +++---
serverless/workers/create-dockerfile.mdx | 6 +++---
snippets/tooltips.jsx | 12 ++++++++++++
6 files changed, 26 insertions(+), 10 deletions(-)
diff --git a/instant-clusters.mdx b/instant-clusters.mdx
index 7e727db9..8411979c 100644
--- a/instant-clusters.mdx
+++ b/instant-clusters.mdx
@@ -4,7 +4,7 @@ sidebarTitle: "Overview"
description: "Fully managed compute clusters for multi-node training and AI inference."
---
-import { DataCenterTooltip } from "/snippets/tooltips.jsx";
+import { DataCenterTooltip, PyTorchTooltip } from "/snippets/tooltips.jsx";
@@ -47,7 +47,7 @@ Runpod automates cluster setup so you can focus on your workloads:
* Clusters are pre-configured with static IP address management.
* All necessary [environment variables](#environment-variables) for distributed training are pre-configured.
-* Supports popular frameworks like PyTorch, TensorFlow, and Slurm.
+* Supports popular frameworks like <PyTorchTooltip />, TensorFlow, and Slurm.
## Get started
diff --git a/instant-clusters/pytorch.mdx b/instant-clusters/pytorch.mdx
index 3b460092..ad50b01d 100644
--- a/instant-clusters/pytorch.mdx
+++ b/instant-clusters/pytorch.mdx
@@ -3,7 +3,9 @@ title: "Deploy an Instant Cluster with PyTorch"
sidebarTitle: "PyTorch"
---
-This tutorial demonstrates how to use Instant Clusters with [PyTorch](http://pytorch.org) to run distributed workloads across multiple GPUs. By leveraging PyTorch's distributed processing capabilities and Runpod's high-speed networking infrastructure, you can significantly accelerate your training process compared to single-GPU setups.
+import { PyTorchTooltip } from "/snippets/tooltips.jsx";
+
+This tutorial demonstrates how to use Instant Clusters with <PyTorchTooltip /> to run distributed workloads across multiple GPUs. By leveraging PyTorch's distributed processing capabilities and Runpod's high-speed networking infrastructure, you can significantly accelerate your training process compared to single-GPU setups.
Follow the steps below to deploy a cluster and start running distributed PyTorch workloads efficiently.
diff --git a/pods/choose-a-pod.mdx b/pods/choose-a-pod.mdx
index 02b8c47f..66db930f 100644
--- a/pods/choose-a-pod.mdx
+++ b/pods/choose-a-pod.mdx
@@ -4,6 +4,8 @@ description: "Select the right Pod by evaluating your resource requirements."
sidebar_position: 3
---
+import { CUDATooltip } from "/snippets/tooltips.jsx";
+
Selecting the appropriate Pod configuration is a crucial step in maximizing performance and efficiency for your specific workloads. This guide will help you understand the key factors to consider when choosing a Pod that meets your requirements.
## Understanding your workload needs
@@ -28,7 +30,7 @@ There are several online tools that can help you estimate your resource requirem
### GPU selection
-The GPU is the cornerstone of computational performance for many workloads. When selecting your GPU, consider the architecture that best suits your software requirements. NVIDIA GPUs with CUDA support are essential for most machine learning frameworks, while some applications might perform better on specific GPU generations. Evaluate both the raw computing power (CUDA cores, tensor cores) and the memory bandwidth to ensure optimal performance for your specific tasks.
+The GPU is the cornerstone of computational performance for many workloads. When selecting your GPU, consider the architecture that best suits your software requirements. NVIDIA GPUs with <CUDATooltip /> support are essential for most machine learning frameworks, while some applications might perform better on specific GPU generations. Evaluate both the raw computing power (CUDA cores, tensor cores) and the memory bandwidth to ensure optimal performance for your specific tasks.
For machine learning inference, a mid-range GPU might be sufficient, while training large models requires more powerful options. Check framework-specific recommendations, as PyTorch, TensorFlow, and other frameworks may perform differently across GPU types.
diff --git a/pods/templates/create-custom-template.mdx b/pods/templates/create-custom-template.mdx
index 80864771..dfbf96e9 100644
--- a/pods/templates/create-custom-template.mdx
+++ b/pods/templates/create-custom-template.mdx
@@ -5,13 +5,13 @@ description: "A step-by-step guide to extending Runpod's official templates."
tag: "NEW"
---
-import { PodTooltip, PodsTooltip } from "/snippets/tooltips.jsx";
+import { PodTooltip, PodsTooltip, PyTorchTooltip, CUDATooltip, TemplateTooltip } from "/snippets/tooltips.jsx";
You can find the complete code for this tutorial, including automated build options with GitHub Actions, in the [runpod-workers/pod-template](https://github.com/runpod-workers/pod-template) repository.
-This tutorial shows how to build a custom template from the ground up. You'll extend an official Runpod template, add your own dependencies, configure how your container starts, and pre-load machine learning models. This approach saves time during Pod initialization and ensures consistent environments across deployments.
+This tutorial shows how to build a custom <TemplateTooltip /> from the ground up. You'll extend an official Runpod template, add your own dependencies, configure how your container starts, and pre-load machine learning models. This approach saves time during Pod initialization and ensures consistent environments across deployments.
By creating custom templates, you can package everything your project needs into a reusable Docker image. Once built, you can deploy your workload in seconds instead of reinstalling dependencies every time you start a new Pod. You can also share your template with members of your team and the wider Runpod community.
@@ -71,7 +71,7 @@ Your project structure should now look like this:
## Step 2: Choose a base image and create your Dockerfile
-Runpod offers base images with PyTorch, CUDA, and common dependencies pre-installed. You'll extend one of these images to build your custom template.
+Runpod offers base images with <PyTorchTooltip />, <CUDATooltip />, and common dependencies pre-installed. You'll extend one of these images to build your custom template.
diff --git a/serverless/workers/create-dockerfile.mdx b/serverless/workers/create-dockerfile.mdx
index e9f4d981..698cb816 100644
--- a/serverless/workers/create-dockerfile.mdx
+++ b/serverless/workers/create-dockerfile.mdx
@@ -3,7 +3,7 @@ title: "Create a Dockerfile"
description: "Package your handler function for deployment."
---
-import { HandlerFunctionTooltip } from "/snippets/tooltips.jsx";
+import { HandlerFunctionTooltip, CUDATooltip } from "/snippets/tooltips.jsx";
A Dockerfile defines the build process for a Docker image containing your <HandlerFunctionTooltip /> and all its dependencies. This page explains how to organize your project files and create a Dockerfile for your Serverless worker.
@@ -83,9 +83,9 @@ Include more system tools and libraries but are larger:
FROM python:3.11.1
```
-### CUDA images
+### <CUDATooltip /> images
-Required if you need CUDA libraries for GPU-accelerated workloads:
+Required if you need <CUDATooltip /> libraries for GPU-accelerated workloads:
```dockerfile
FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04
diff --git a/snippets/tooltips.jsx b/snippets/tooltips.jsx
index 731299e8..6b45ab71 100644
--- a/snippets/tooltips.jsx
+++ b/snippets/tooltips.jsx
@@ -123,6 +123,18 @@ export const VLLMTooltip = () => {
);
};
+export const PyTorchTooltip = () => {
+ return (
+ PyTorch
+ );
+};
+
+export const CUDATooltip = () => {
+ return (
+ CUDA
+ );
+};
+
export const CachedModelsTooltip = () => {
return (
cached models