AI-Inference-On-HPC

A comprehensive framework for scalable multi-node, multi-GPU LLM inference on HPC systems using vLLM and Ollama. It includes distributed deployment templates, benchmarking workflows, and chatbot/RAG pipelines for high-throughput, production-grade AI services.

This repository focuses on:

  • Multi-node vLLM inference on HPC clusters
  • GPU-accelerated LLM inference
  • Kubernetes/Slurm-based distributed inference
  • HPC-scale RAG/chatbot pipelines
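As a minimal illustration of the GPU-accelerated vLLM workflow listed above, the sketch below loads a model with vLLM's offline LLM API and shards it across four GPUs on one node via tensor parallelism. The model ID, GPU count, and prompts are placeholder assumptions, not values taken from this repository's templates.

    # Hypothetical sketch: single-node, multi-GPU inference with vLLM tensor parallelism.
    # The model name and GPU count are assumptions, not this repository's defaults.
    from vllm import LLM, SamplingParams

    prompts = ["Summarize the benefits of multi-node LLM inference on HPC systems."]
    sampling_params = SamplingParams(temperature=0.7, max_tokens=256)

    # tensor_parallel_size=4 shards the model weights across four GPUs on one node.
    llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", tensor_parallel_size=4)

    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        print(output.outputs[0].text)

Multi-node runs would additionally need a job launcher such as Slurm and a distributed backend (vLLM uses Ray for cross-node parallelism); refer to the repository's deployment templates for the actual scripts.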
