Welcome to the Llama Stack with MCP Server Quickstart!
Use this to quickly deploy Llama 3.2-3B on vLLM with Llama Stack and MCP servers in your OpenShift AI environment.
To see how it's done, jump straight to installation.
- Description
- Custom MCP Server
- How MCP Servers Work with Llama Stack
- Architecture diagrams
- References
- Prerequisites
- Install
- Test
- Cleanup
This quickstart provides a complete setup for deploying:
- Llama 3.2-3B model using vLLM on OpenShift AI
- Llama Stack for agent-based interactions
- Sample HR application providing RESTful services over HR data (e.g., vacation booking)
- MCP Weather Server for real-time weather data access
- Custom MCP server providing access to the sample HR application
The custom MCP server (custom-mcp-server/) demonstrates how to build a Model Context Protocol server that integrates with enterprise APIs. This server provides the following tools to the LLM:
- Vacation Management: Check vacation balances and create vacation requests
The MCP server acts as a bridge between the Llama Stack and the HR Enterprise API, translating LLM tool calls into REST API requests.
Source Code & Build Instructions: If you want to modify the custom MCP server, see the complete source code and build instructions in the custom-mcp-server/ directory. The server is built using Python and can be customized to integrate with your own enterprise APIs.
MCP servers are registered with Llama Stack through configuration. The Llama Stack server automatically discovers and connects to configured MCP servers at startup. Here's an example of how MCP servers are configured:
# Llama Stack MCP server configuration
mcpServers:
  - name: "mcp-weather"
    uri: "http://mcp-weather:3001"
    description: "Weather data MCP server"
  - name: "hr-api-tools"
    uri: "http://custom-mcp-server:8000/sse"
    description: "HR API MCP server with employee, vacation, job, and performance tools"

In this example, the configuration is maintained in the llama-stack-config ConfigMap, which is part of the llama-stack Helm chart.
When Llama Stack starts, it:
- Connects to each MCP server via Server-Sent Events (SSE) or WebSocket
- Discovers available tools by querying each server's capabilities
- Registers tool schemas that describe what each tool does and its parameters
- Makes tools available to the LLM for use in conversations
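MCP servers can also be registered with a running Llama Stack instance at runtime through the client API rather than static configuration. The sketch below assumes the llama-stack-client Python package, the default port 8321, and the in-cluster service URL from the configuration above; the toolgroup ID is an illustrative choice.

# register_mcp_sketch.py - illustrative runtime registration of an MCP server
from llama_stack_client import LlamaStackClient

# Assumes the Llama Stack service is reachable at this in-cluster address
client = LlamaStackClient(base_url="http://llama-stack:8321")

# Register the HR MCP server so its tools become discoverable by agents
client.toolgroups.register(
    toolgroup_id="mcp::hr-api-tools",      # illustrative name
    provider_id="model-context-protocol",
    mcp_endpoint={"uri": "http://custom-mcp-server:8000/sse"},
)

# List the tool schemas Llama Stack discovered from the server
for tool in client.tools.list(toolgroup_id="mcp::hr-api-tools"):
    print(tool.identifier, "-", tool.description)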
When a user requests to use a tool, here's the complete flow:
- User Request: User asks a question in the Llama Stack Playground (e.g., "What's the weather in New York?")
- LLM Context: Llama Stack includes the available tool definitions in the system message sent to the LLM
- LLM Response: The LLM decides to use a tool and responds with a structured tool call (e.g., getforecast with a location parameter)
- Tool Execution: Llama Stack intercepts the tool call and routes it to the appropriate MCP server
- MCP Processing: The MCP server executes the tool (e.g., calls the weather API or HR database)
- Result Return: The MCP server returns structured results back to Llama Stack
- LLM Integration: Llama Stack provides the tool results to the LLM as context
- Final Response: The LLM incorporates the tool results into a natural language response for the user
This seamless integration allows the LLM to access real-time data and perform actions while maintaining a natural conversational interface.
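The same flow can also be driven programmatically instead of through the Playground. Below is a minimal sketch assuming the llama-stack-client Python package; the in-cluster URL, model identifier, and toolgroup name are illustrative and should match what your deployment actually registers.

# agent_flow_sketch.py - illustrative end-to-end tool-calling turn
from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.agent import Agent

client = LlamaStackClient(base_url="http://llama-stack:8321")  # assumed in-cluster URL

# The agent is given the weather toolgroup; Llama Stack injects the tool
# definitions into the LLM context and routes any tool calls to the MCP server.
agent = Agent(
    client,
    model="meta-llama/Llama-3.2-3B-Instruct",  # illustrative model id
    instructions="You are a helpful assistant. Use tools when needed.",
    tools=["mcp::mcp-weather"],                # illustrative toolgroup name
)

session_id = agent.create_session("weather-demo")
response = agent.create_turn(
    messages=[{"role": "user", "content": "What's the weather in New York?"}],
    session_id=session_id,
    stream=False,
)
print(response.output_message.content)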
- Llama Stack Documentation
- Model Context Protocol (MCP) Quick Start
- vLLM Documentation
- Red Hat OpenShift AI Documentation
- 8+ vCPUs
- 24+ GiB RAM
- 1 GPU (NVIDIA L40, A10, or similar), or
- 1 Intel® Gaudi® AI Accelerator
- Red Hat OpenShift
- Red Hat OpenShift AI 2.16+
- OpenShift CLI (oc) - Download here
- Helm CLI (helm) - Download here
- Standard user. No elevated cluster permissions required
Please note before you start: this example was tested on Red Hat OpenShift 4.17.30 and Red Hat OpenShift AI v2.19.0.
All components are deployed using Helm charts located in the helm/ directory:
- helm/llama3.2-3b/ - Llama 3.2-3B model on vLLM
- helm/llama-stack/ - Llama Stack server
- helm/mcp-weather/ - Weather MCP server
- helm/llama-stack-playground/ - Playground UI
- helm/custom-mcp-server/ - Custom HR API MCP server
- helm/hr-api/ - HR Enterprise API
- helm/llama-stack-mcp/ - Umbrella chart for single-command deployment
git clone https://github.com/rh-ai-quickstart/llama-stack-mcp-server.git && \
cd llama-stack-mcp-server/

oc new-project llama-stack-mcp-demo

Deploy the complete Llama Stack with MCP servers using the umbrella chart:
# Set device, options: [gpu, hpu]
export DEVICE="gpu"
# Build dependencies (downloads and packages all required charts)
helm dependency build ./helm/llama-stack-mcp
# Deploy everything onto the target device with a single command
helm install llama-stack-mcp ./helm/llama-stack-mcp --set device=$DEVICE \
--set llama-stack.device=$DEVICE \
--set llama3-2-3b.device=$DEVICE

Note: The llama-stack pod will be in CrashLoopBackOff status until the Llama model is fully loaded and being served. This is normal behavior, as the Llama Stack server requires the model endpoint to be available before it can start successfully.
This will deploy all components including:
- Llama 3.2-3B model on vLLM
- Llama Stack server with automatic configuration
- MCP Weather Server
- HR Enterprise API
- HR MCP Server
- Llama Stack Playground
Once the deployment is complete, you should see:
To get the playground URL:
export PLAYGROUND_URL=$(oc get route llama-stack-playground -o jsonpath='{.spec.host}' 2>/dev/null || echo "Route not found")
echo "Playground: https://$PLAYGROUND_URL"To check the status of all components:
helm status llama-stack-mcp
oc get pods

For troubleshooting:
oc get pods
oc logs -l app.kubernetes.io/name=llama-stack
oc logs -l app.kubernetes.io/name=llama3-2-3b
oc logs -l app.kubernetes.io/name=custom-mcp-server
oc logs -l app.kubernetes.io/name=hr-enterprise-api
oc logs -l app.kubernetes.io/name=mcp-weather

When the deployment is complete, you should see all pods running in your OpenShift console:
- Get the Llama Stack playground route:
oc get route llama-stack-playground -n llama-stack-mcp-demo

- Open the playground URL in your browser (it will look something like https://llama-stack-playground-llama-stack-mcp-demo.apps.openshift-cluster.company.com)
- In the playground:
- Click on the "Tools" tab
- Select "Weather" MCP Server from the available tools
- In the chat interface, type: "What's the weather in New York?"
- You should receive a response similar to:
🛠 Using "getforecast" tool:
The current weather in New York is mostly sunny with a temperature of 75°F and a gentle breeze coming from the southwest at 7 mph. There is a chance of showers and thunderstorms this afternoon. Tonight, the temperature will drop to 66°F with a wind coming from the west at 9 mph. The forecast for the rest of the week is mostly sunny with temperatures ranging from 69°F to 85°F. There is a slight chance of showers and thunderstorms on Thursday and Friday nights.
This confirms that the Llama Stack is successfully communicating with the MCP Weather Server and can process weather-related queries.
- Test HR API MCP server tools:
In the playground interface:
- Navigate to Tools: Click on the "Tools" tab
- Verify availability: Look for your Internal HR tools:
  - get_vacation_balance - Check employee vacation balances
  - create_vacation_request - Submit new vacation requests
- Select the "internal-hr" MCP Server
- Test with sample queries:
Test vacation balance:
What is the vacation balance for employee EMP001?
Test vacation request:
book some annual vacation time off for EMP001 for June 8th and 9th
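If you prefer to verify these tools outside the chat UI, they can also be invoked directly through Llama Stack's tool runtime, bypassing the LLM. This is a rough sketch assuming the llama-stack-client Python package and that the tool names match what appears under the Tools tab.

# hr_tools_check_sketch.py - illustrative direct tool invocation
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://llama-stack:8321")  # assumed in-cluster URL

# Call the vacation-balance tool directly to confirm the MCP server
# and HR Enterprise API round-trip works end to end.
result = client.tool_runtime.invoke_tool(
    tool_name="get_vacation_balance",
    kwargs={"employee_id": "EMP001"},
)
print(result.content)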
Verify that your custom MCP server is working correctly:
# Check all pods are running
oc get pods
# Check the custom MCP server logs
oc logs -l app.kubernetes.io/name=custom-mcp-server
# Test the service connectivity
oc exec -it deployment/llama-stack -- curl http://custom-mcp-server/health

To remove all components from OpenShift:
# Remove the complete deployment
helm uninstall llama-stack-mcp

# Delete the project and all its resources
oc delete project llama-stack-mcp-demo

This will remove:
- Llama 3.2-3B vLLM deployment
- Llama Stack services and playground
- MCP Weather Server
- Custom MCP Server (if deployed)
- HR Enterprise API (if deployed)
- All associated ConfigMaps, Services, Routes, and Secrets


