Building bots that nearly pass the Turing test - Go panics, GPU jobs, and hotfixes included.
Backend mayhem or agents hallucinating? I’ve got caffeine, let’s cook.
Org I'm building: HyperKuvid-Labs
- Backend & Distributed Systems: Building robust APIs (FastAPI/Go), container orchestration, and microservices that don't crash under load.
- High-Performance Compute: Optimizing inference engines, managing GPU scheduling, and squeezing performance out of hardware.
- Fullstack Engineering: Connecting the dots from the database (Postgres/Mongo) to the client (React/TS), handling the chaos in between.
Python • Docker • Nosana • SGLang
- Backend for decentralized GPU compute
- GPU job scheduling and container orchestration
- 30% lower latency, 40% higher GPU utilization
TypeScript • Go • MCP Protocol
- Infra middleware for AI agents
- Programmatic job and resource control
- 40% faster jobs, 25% lower infra cost
FastAPI • C++ • React • Prisma • Docker
- High-performance stowage optimization system
- Python backend driving C++ compute
- 10M+ items processed in under 5 seconds
IEEE ICICIS 2025 (Accepted)
- Architected edge-native NLP routing on Raspberry Pi 5 using NER and syntactic analysis, slashing inference time by 21.34% with <2.7% relevance loss.
- Devised adaptive thresholding (EMA, α=0.2) with cosine-similarity routing to dynamically dispatch requests across tiered models for low-latency on-device execution.
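The adaptive-thresholding idea above can be sketched roughly as follows. This is an illustrative assumption, not the paper's implementation: the `AdaptiveRouter` class, its comparison against a single reference embedding, and the tier names are all hypothetical stand-ins.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two plain-list vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class AdaptiveRouter:
    """Sketch of EMA-based adaptive routing (hypothetical interface).

    A request whose embedding is similar enough to a reference is kept on
    the small on-device model; otherwise it escalates to a larger tier.
    The threshold drifts toward recent similarity scores via an EMA with
    smoothing factor alpha (0.2 in the paper's setup).
    """

    def __init__(self, alpha=0.2, initial_threshold=0.5):
        self.alpha = alpha
        self.threshold = initial_threshold

    def route(self, embedding, reference):
        score = cosine_similarity(embedding, reference)
        # Decide against the current threshold, then update it:
        # threshold <- alpha * score + (1 - alpha) * threshold
        decision = "small-model" if score >= self.threshold else "large-model"
        self.threshold = self.alpha * score + (1 - self.alpha) * self.threshold
        return decision
```

The EMA keeps the cutoff responsive to the recent request distribution without storing a history, which is what makes it cheap enough for a Raspberry Pi-class device.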
IEEE ICPC2T 2026 (Accepted)
- Engineered adaptive speculative decoding with multi-parent quantization (FP16/Q8/Q4), achieving 22.6% faster inference via hardware-aware draft–target verification pipelines.
- Implemented complexity-based prompt classification (syntactic/semantic metrics) for dynamic precision routing, maintaining <2% accuracy loss across diverse benchmarks.
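Complexity-based precision routing can be sketched as below, assuming it works like this. The specific metrics (word count, vocabulary diversity, average word length), their weights, and the thresholds are hypothetical stand-ins for the paper's syntactic/semantic metrics, not the actual method.

```python
def prompt_complexity(prompt: str) -> float:
    """Crude syntactic complexity score in [0, 1] (illustrative metrics:
    length, vocabulary diversity, average word length)."""
    words = prompt.split()
    if not words:
        return 0.0
    diversity = len(set(words)) / len(words)
    avg_word_len = sum(len(w) for w in words) / len(words)
    score = (
        (len(words) / 100) * 0.5       # longer prompts read as harder
        + diversity * 0.3              # varied vocabulary reads as harder
        + (avg_word_len / 10) * 0.2    # longer words read as harder
    )
    return min(1.0, score)


def select_precision(prompt: str, low: float = 0.4, high: float = 0.7) -> str:
    """Route simple prompts to aggressively quantized draft models and
    complex ones to higher precision (hypothetical thresholds)."""
    c = prompt_complexity(prompt)
    if c < low:
        return "Q4"
    if c < high:
        return "Q8"
    return "FP16"
```

The point of the classification step is that the quantization level becomes a per-request decision rather than a deployment-time constant, so easy prompts get the cheapest draft model while hard ones keep full-precision verification.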
- Inference Engines: vLLM, SGLang, Ollama, ONNX
- Optimization Techniques: EAGLE, Speculative Decoding, Quantization, LoRA/PEFT
- Compute & Systems: CUDA, C/C++, Distributed Training
- Vision & Architectures: PyTorch, OpenCV, YOLO, CNN, ViT
- Applied AI: Reinforcement Learning (RL), Diffusion Models, Speech and Summarization Systems
- AI Orchestration: LangChain, LangGraph, Model Context Protocol (MCP), CrewAI
- Core Stack: Python (FastAPI/Django), Go, C++
- Data Layer: PostgreSQL, MongoDB, Redis, Prisma
- Frontend: React.js, Tailwind CSS, TypeScript
- Infra & Deployment: Docker, NGINX, AWS (EC2/S3), CI/CD, Nosana (GPU Compute)
AI Engineer Intern — Hooman Digital:
- Built the backend infrastructure for decentralized GPU inference.
- Managed container lifecycles and job scheduling systems.
Generative AI Intern — TITAN Company:
- Refactored legacy Flask services to FastAPI, boosting throughput by 2.5x.
- Built workflows with LangGraph to slash manual cataloging time by 60%.
AI R&D Intern — eBramha Techworks:
- Built research-focused backends for NLP summarization and speech-to-text systems.
- Email: harishkb20205@gmail.com
- Portfolio: harish-kb.web.app
- LinkedIn: harish-kb-9417ba252
- X / Twitter: @harish20205
- Resume: View PDF
Open to collaborations on efficient LLM inference, model optimization, and distributed systems.


