Hi there, I'm Vedik Agarwal
GitHub • LinkedIn • Email • Featured Project
Master's in Computer Science (Machine Learning Track) at Columbia University
ML & Systems Engineer working on LLM inference, federated learning, and ML infrastructure
Interested in memory-efficient attention, distributed optimization, and production ML systems
IEEE-published researcher in federated intrusion detection
- Long-context & memory-efficient LLM inference
- Federated learning under non-IID distributions
- Distributed ML systems & microservices
- Privacy-preserving ML (Healthcare & IoT)
- High-performance model serving
Featured Projects & Publications
PagedFlexAttention: https://github.com/vedik2002/PagedFlexAttention
- Integrated paged KV caching with PyTorch FlexAttention
- Supports long-context (4K tokens) & high-concurrency decoding
- <1% GPU memory overhead with accuracy parity on MMLU & TruthfulQA
IEEE publication: https://ieeexplore.ieee.org/document/10857281
- Federated MLP with physics-based hyperparameter optimization
- Robust under non-IID distributions
- Achieved 98% packet classification accuracy
Industry Experience
Software Developer Intern - Trademarkia
- Distributed Golang microservices scaling to 50K+ requests/day
- Low-latency BERT inference service (~200 ms)
- Vector search backend using Qdrant + AWS (98% accuracy)
Software Developer Intern - Hi Rapid Lab
- AI healthcare platforms for 10K+ rural patients
- DPDP-compliant secure medical data systems
Machine Learning Intern - Innova Point Infotech
- Siamese-network-based quality inspection (PyTorch, ResNet)
- Reduced defect detection time by 45%
- LinkedIn: https://linkedin.com/in/vedik-agarwal
- Email: va2565@columbia.edu
Interested in ML systems, LLM inference, or research collaboration?
Feel free to reach out.


