Hi, I'm Harish KB

AI & Backend Developer | LLM Systems

Building bots that nearly pass the Turing test - Go panics, GPU jobs, and hotfixes included.
Backend mayhem or agents hallucinating? I’ve got caffeine, let’s cook.

Org I'm building: HyperKuvid-Labs

What I Actually Do

  • Backend & Distributed Systems: Building robust APIs (FastAPI/Go), container orchestration, and microservices that don't crash under load.
  • High-Performance Compute: Optimizing inference engines, managing GPU scheduling, and squeezing performance out of hardware.
  • Fullstack Engineering: Connecting the dots from the database (Postgres/Mongo) to the client (React/TS), handling the chaos in between.

Projects

Inferia ChatHub — Distributed GPU Backend

Python • Docker • Nosana • SGLang

  • Backend for decentralized GPU compute
  • GPU job scheduling and container orchestration
  • 30% lower latency, 40% higher GPU utilization

Nosana MCP Server — Infra Tooling

TypeScript • Go • MCP Protocol

  • Infra middleware for AI agents
  • Programmatic job and resource control
  • 40% faster job completion, 25% lower infrastructure cost

PHYDRA — ISS Stowage System

FastAPI • C++ • React • Prisma • Docker

  • High-performance stowage optimization system
  • Python backend driving C++ compute
  • 10M+ items processed in under 5 seconds
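The heart of a stowage optimizer like this is a packing heuristic. As a toy illustration only (not PHYDRA's actual C++ algorithm, which handles full 3D packing), a 1-D first-fit-decreasing sketch:

```python
def first_fit_decreasing(items, capacity):
    """Toy 1-D first-fit-decreasing packing: place each item (largest
    first) into the first bin with enough room, opening a new bin when
    none fits. Real stowage packing works in 3-D, but the greedy
    skeleton is the same."""
    bins = []  # each bin: {"free": remaining capacity, "items": [...]}
    for size in sorted(items, reverse=True):
        for b in bins:
            if b["free"] >= size:
                b["free"] -= size
                b["items"].append(size)
                break
        else:  # no existing bin had room
            bins.append({"free": capacity - size, "items": [size]})
    return [b["items"] for b in bins]
```

First-fit-decreasing is a classic approximation; a production packer layers rotation, stability, and retrieval-order constraints on top of this skeleton.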

Research

FrugalSOT: Frugal Search Over The Models

IEEE ICICIS 2025 (Accepted)

  • Architected edge-native NLP routing on Raspberry Pi 5 using NER/syntactic analysis, slashing inference time by 21.34% with <2.7% relevance loss.

  • Devised adaptive thresholding (EMA α=0.2) and cosine-similarity logic to dynamically route requests across tiered models, keeping execution fully on-device with no network round-trips.
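A rough sketch of that thresholding idea (the α=0.2 comes from the bullet above; everything else here is illustrative, not the paper's exact logic):

```python
import math

ALPHA = 0.2  # EMA smoothing factor (the α from the bullet above)

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class AdaptiveThreshold:
    """EMA-tracked acceptance bar: a small model's answer is kept only
    if its relevance score clears the running threshold; otherwise the
    request escalates to a larger tier."""
    def __init__(self, init=0.5):
        self.value = init

    def accept(self, score):
        ok = score >= self.value
        # EMA update so the bar adapts to recent score levels
        self.value = ALPHA * score + (1 - ALPHA) * self.value
        return ok
```

The EMA keeps the bar tracking recent score levels, so an easy run of queries raises the standard and a hard run lowers it instead of pinning everything to one fixed cutoff.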

SpecQuant: Speculative Decoding & Quantization

IEEE ICPC2T 2026 (Accepted)

  • Engineered adaptive speculative decoding with multi-parent quantization (FP16/Q8/Q4), achieving 22.6% faster inference via hardware-aware draft–target verification pipelines.
  • Implemented complexity-based prompt classification (syntactic/semantic metrics) for dynamic precision routing, maintaining <2% accuracy loss across diverse benchmarks.
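A minimal greedy version of that draft-then-verify loop (plain Python, with callables standing in for the quantized draft and the FP16 target; the real pipeline uses probabilistic acceptance over logits, not exact token match):

```python
def speculative_step(draft_next, target_next, prefix, k=4):
    """One draft-then-verify step of greedy speculative decoding.

    draft_next / target_next: callables mapping a token sequence to the
    next token id (stand-ins for a cheap quantized draft model and an
    FP16 target model). Returns the tokens accepted this step,
    including exactly one target-produced token."""
    # 1. Draft proposes k tokens autoregressively (cheap model)
    proposal, seq = [], list(prefix)
    for _ in range(k):
        t = draft_next(seq)
        proposal.append(t)
        seq.append(t)

    # 2. Target verifies: accept the longest agreeing prefix, then
    #    emit the target's own token at the first disagreement
    accepted, seq = [], list(prefix)
    for t in proposal:
        tgt = target_next(seq)
        if tgt != t:
            accepted.append(tgt)  # correction from the target
            return accepted
        accepted.append(t)
        seq.append(t)
    # all k draft tokens accepted; bonus token from the target
    accepted.append(target_next(seq))
    return accepted
```

The speedup comes from the target verifying k proposed tokens in one batched forward pass instead of k sequential ones; when the draft agrees often, each step emits up to k+1 tokens for roughly one target-model cost.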

Stack

AI Systems & High-Performance Compute

  • Inference Engines: vLLM, SGLang, Ollama, ONNX
  • Optimization Techniques: EAGLE, Speculative Decoding, Quantization, LoRA/PEFT
  • Compute & Systems: CUDA, C/C++, Distributed Training
  • Vision & Architectures: PyTorch, OpenCV, YOLO, CNN, ViT
  • Applied AI: Reinforcement Learning (RL), Diffusion Models, Speech and Summarization Systems
  • AI Orchestration: LangChain, LangGraph, Model Context Protocol (MCP), CrewAI

Systems Engineering & Fullstack

  • Core Stack: Python (FastAPI/Django), Go, C++
  • Data Layer: PostgreSQL, MongoDB, Redis, Prisma
  • Frontend: React.js, Tailwind CSS, TypeScript
  • Infra & Deployment: Docker, NGINX, AWS (EC2/S3), CI/CD, Nosana (GPU Compute)

Experience

  • AI Engineer Intern — Hooman Digital
    • Built the backend infrastructure for decentralized GPU inference.
    • Managed container lifecycles and job scheduling systems.
  • Generative AI Intern — TITAN Company
    • Refactored legacy Flask services to FastAPI, boosting throughput by 2.5x.
    • Built LangGraph workflows that cut manual cataloging time by 60%.
  • AI R&D Intern — eBramha Techworks
    • Research-focused backend work on NLP summarization and speech-to-text systems.

Contact


Open to collaborations on efficient LLM inference, model optimization, and distributed systems.

Pinned Repositories

  1. HyperKuvid-Labs/AlphaDesign (Python) — Hybrid AI framework combining reinforcement learning and genetic algorithms to optimize Formula 1 front-wing aerodynamic designs. Features neural-network-guided optimization, CFD analysis, structur…

  2. HyperKuvid-Labs/FrugalSOT (TypeScript) — An adaptive model selection system for efficient on-device NLP inference, enhancing speed, privacy, and resource use on edge devices.

  3. HyperKuvid-Labs/SpecQuant (Python) — Scalable framework for adaptive LLM serving: classify prompt complexity → select quantized drafts → verify with FP16 target, no model retraining required.

  4. HyperKuvid-Labs/PHYDRA (C++) — End-to-end cargo management system with advanced 3D bin-packing algorithms and A*/Dijkstra pathfinding implemented in C++ for ISS applications.

  5. Fine-Tuning-Resume (Jupyter Notebook) — Résumé model fine-tuned with LoRA on my RTX 4060.

  6. cuda-chaos (CUDA) — A collection of my CUDA code experiments.