Building bots that nearly pass the Turing test - Go panics, GPU jobs, and hotfixes included.
Backend mayhem or agents hallucinating? I’ve got caffeine, let’s cook.
Org I'm building: HyperKuvid-Labs
- Backend & Distributed Systems: Building robust APIs (FastAPI/Go), container orchestration, and microservices that don't crash under load.
- High-Performance Compute: Optimizing inference engines, managing GPU scheduling, and squeezing performance out of hardware.
- Fullstack Engineering: Connecting the dots from the database (Postgres/Mongo) to the client (React/TS), handling the chaos in between.
Python • Docker • Nosana • SGLang
- Backend for decentralized GPU compute
- GPU job scheduling and container orchestration
- 30% lower latency, 40% higher GPU utilization
TypeScript • Go • MCP Protocol
- Infra middleware for AI agents
- Programmatic job and resource control
- 40% faster jobs, 25% lower infra cost
FastAPI • C++ • React • Prisma • Docker
- High-performance stowage optimization system
- Python backend driving C++ compute
- 10M+ items processed in under 5 seconds
IEEE ICICIS 2025 (Accepted)
- Architected edge-native NLP routing on Raspberry Pi 5 using NER and syntactic analysis, slashing inference time by 21.34% with <2.7% relevance loss.
- Devised adaptive thresholding (EMA, α=0.2) with cosine-similarity routing to dynamically dispatch requests across tiered models for low-latency on-device execution.
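The adaptive-thresholding idea above can be sketched roughly as follows. This is an illustrative assumption, not the paper's implementation: the `AdaptiveRouter` class, its comparison against a single reference embedding, and the tier names are all hypothetical stand-ins.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two plain-list vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class AdaptiveRouter:
    """Sketch of EMA-based adaptive routing (hypothetical interface).

    A request whose embedding is similar enough to a reference is kept on
    the small on-device model; otherwise it escalates to a larger tier.
    The threshold drifts toward recent similarity scores via an EMA with
    smoothing factor alpha (0.2 in the paper's setup).
    """

    def __init__(self, alpha=0.2, initial_threshold=0.5):
        self.alpha = alpha
        self.threshold = initial_threshold

    def route(self, embedding, reference):
        score = cosine_similarity(embedding, reference)
        # Decide against the current threshold, then update it:
        # threshold <- alpha * score + (1 - alpha) * threshold
        decision = "small-model" if score >= self.threshold else "large-model"
        self.threshold = self.alpha * score + (1 - self.alpha) * self.threshold
        return decision
```

The EMA keeps the cutoff responsive to the recent request distribution without storing a history, which is what makes it cheap enough for a Raspberry Pi-class device.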
IEEE ICPC2T 2026 (Accepted)
- Engineered adaptive speculative decoding with multi-parent quantization (FP16/Q8/Q4), achieving 22.6% faster inference via hardware-aware draft–target verification pipelines.
- Implemented complexity-based prompt classification (syntactic/semantic metrics) for dynamic precision routing, maintaining <2% accuracy loss across diverse benchmarks.
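Complexity-based precision routing can be sketched as below, assuming it works like this. The specific metrics (word count, vocabulary diversity, average word length), their weights, and the thresholds are hypothetical stand-ins for the paper's syntactic/semantic metrics, not the actual method.

```python
def prompt_complexity(prompt: str) -> float:
    """Crude syntactic complexity score in [0, 1] (illustrative metrics:
    length, vocabulary diversity, average word length)."""
    words = prompt.split()
    if not words:
        return 0.0
    diversity = len(set(words)) / len(words)
    avg_word_len = sum(len(w) for w in words) / len(words)
    score = (
        (len(words) / 100) * 0.5       # longer prompts read as harder
        + diversity * 0.3              # varied vocabulary reads as harder
        + (avg_word_len / 10) * 0.2    # longer words read as harder
    )
    return min(1.0, score)


def select_precision(prompt: str, low: float = 0.4, high: float = 0.7) -> str:
    """Route simple prompts to aggressively quantized draft models and
    complex ones to higher precision (hypothetical thresholds)."""
    c = prompt_complexity(prompt)
    if c < low:
        return "Q4"
    if c < high:
        return "Q8"
    return "FP16"
```

The point of the classification step is that the quantization level becomes a per-request decision rather than a deployment-time constant, so easy prompts get the cheapest draft model while hard ones keep full-precision verification.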
- Inference Engines: vLLM, SGLang, Ollama, ONNX
- Optimization Techniques: EAGLE, Speculative Decoding, Quantization, LoRA/PEFT
- Compute & Systems: CUDA, C/C++, Distributed Training
- Vision & Architectures: PyTorch, OpenCV, YOLO, CNN, ViT
- Applied AI: Reinforcement Learning (RL), Diffusion Models, Speech and Summarization Systems
- AI Orchestration: LangChain, LangGraph, Model Context Protocol (MCP), CrewAI
- Core Stack: Python (FastAPI/Django), Go, C++
- Data Layer: PostgreSQL, MongoDB, Redis, Prisma
- Frontend: React.js, Tailwind CSS, TypeScript
- Infra & Deployment: Docker, NGINX, AWS (EC2/S3), CI/CD, Nosana (GPU Compute)
AI Engineer Intern — Hooman Digital:
- Built the backend infrastructure for decentralized GPU inference.
- Managed container lifecycles and job scheduling systems.
Generative AI Intern — TITAN Company:
- Refactored legacy Flask services to FastAPI, boosting throughput by 2.5x.
- Built workflows with LangGraph to slash manual cataloging time by 60%.
AI R&D Intern — eBramha Techworks:
- Built research-focused backends for NLP summarization and speech-to-text systems.
- Email: harishkb20205@gmail.com
- Portfolio: harish-kb.web.app
- LinkedIn: harish-kb-9417ba252
- X / Twitter: @harish20205
- Resume: View PDF
Open to collaborations on efficient LLM inference, model optimization, and distributed systems.


