I bridge the gap between research-grade ML models and production hardware. My focus is on air-gapped multimodal agents, medical diagnostic pipelines, and high-concurrency streaming systems.
- Nebulai Ecosystem: Architected a local LLM cluster (vLLM) and a "Router Agent" system, cutting token costs by 30% and operational costs by 60%.
- OpTomo (Medical): Designed end-to-end inference for breast cancer detection, moving performance-critical Python logic into C++ bindings to cut latency by 40%.
- Enterprise RAG: Built a secure "Talk-to-your-Data" tool using hybrid search (BM25 + vector) and Apache NiFi pipelines.
- Edge-AI Engine: Engineered a hot-swappable mobile inference engine (Kotlin/ONNX) that switches neural architectures at runtime without app updates.
Edge AI & Hardware Optimization
Data Engineering, RAG & Databases
Computer Vision, Analysis & Visualization

