Machine Learning Engineer | Data Scientist | Particle Physicist
Turning cutting‑edge research into scalable, production‑grade AI.
- Data scientist at Groupe Dynamite with a background in computational particle physics, specializing in ML pipeline design, model deployment, and production systems.
- PhD in High Energy Physics with collider phenomenology specialization, applying advanced ML algorithms to Beyond‑Standard‑Model (BSM) searches at present and future colliders.
- Specialized in AI development and LLM evaluation & fine‑tuning (SFT & RLHF) across text, image, video & audio.
- Proficient in building MLOps pipelines with Docker · Kubernetes · Airflow · MLflow · GitHub Actions/Jenkins on AWS SageMaker & GCP Vertex AI.
- Passionate about bridging HPC ↔ Cloud and Research ↔ Production.
Find my peer‑reviewed HEP papers on Inspire‑HEP & Google Scholar.
| Category | Tooling |
|---|---|
| Languages | Python · SQL · Bash · C++/Fortran (HPC) |
| Frameworks | PyTorch · TensorFlow · scikit‑learn · Hugging Face 🤗 · LangChain |
| MLOps / Infra | Docker · Kubernetes · Airflow · MLflow · DVC · GitHub Actions · Jenkins · Prometheus + Grafana |
| Cloud | AWS (SageMaker, Lambda, S3) · GCP (Vertex AI, Cloud Run, BigQuery) |
| Data Processing | PySpark · Dask · Pandas · CUDA |
| Project | What it does | Stack | Quick Links |
|---|---|---|---|
| scientific-multi-agent-HEP | Building and deploying a multi-agent AI system on AWS to automate scientific literature reviews using a fine-tuned LLM. | AWS SageMaker, S3, Docker, GitHub Actions, Hugging Face (QLoRA), RAG, PyTorch, CrewAI, FastAPI, Vector DB | Repo, Ongoing project |
| merchant-sales-forecast | Production-ready Machine Learning system for forecasting merchant sales revenue and determining cash advance eligibility, built with PySpark and deployed on Google Cloud Run. | PySpark, Docker, FastAPI, Google Cloud Run, PyTest | Repo, API |
| churn-prediction-mlops-pipeline | Scalable MLOps churn-prediction system deployed on GCP, trained on the KKBox dataset. | DVC, MLflow, Jenkins CI/CD pipeline, Airflow, Google Kubernetes Engine (GKE), PySpark, Docker | Repo, API |
| wanderLust-recommender-system | Hybrid hotel recommender system on GCP that ranks hotels by the semantic content of the input query. | Google Cloud Run, Docker, PyTorch, Fine-tuning, SVD, FastAPI | Repo, API |
| solar-flux-forecasting | Forecasting the daily solar F10.7 index over a 7-day horizon. | Model optimization, XGBoost, Streamlit, Unit tests, Scientific report | Repo, Report |
| detoxification‑rl | RL‑based detoxification for LLM outputs using PPO & LoRA. | PyTorch, PEFT, Hugging Face Transformers, Reinforcement Learning | Repo |
| WebApp_DisasterResponse | Multi‑label crisis classifier with TF‑IDF, wrapped in a Flask web‑app. | ML-Pipeline, Flask, SVM, SQL | Repo |
| Recommendation Engine | Personalized recommendation engine using collaborative filtering and matrix factorization. | Pandas, SVD | Repo |
| CovidDetectionXRay | Detects COVID‑19 from chest X‑rays. | DenseNet201, PyTorch | Repo |
✨ See more on my Portfolio.
Objective: To design, build, and deploy an end-to-end multi-agent AI system capable of automating scientific literature reviews in HEP. This project leverages a fine-tuned LLM as the "brain" for an analytical agent, all hosted within the AWS ecosystem to demonstrate production-level MLOps practices.
- Goal: Establish a robust data pipeline and development environment on AWS.
- Key Activities:
- Develop a Python script to programmatically download research papers (metadata and PDFs) from the arXiv API.
- Set up an Amazon S3 bucket for raw data storage and for the final, processed fine-tuning dataset.
- Configure Amazon SageMaker Studio as the primary IDE for data processing, experimentation, and script development.
- Create a high-quality, instruction-based dataset for fine-tuning by processing raw text and generating Q&A pairs, summaries, and keyword extractions.
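The ingestion step above can be sketched as follows. This is a minimal illustration, not the project's actual script: the bucket name, S3 key, and query category are hypothetical placeholders, and real runs need network access plus configured AWS credentials.

```python
"""Sketch of the Phase-1 ingestion step: query the arXiv API for recent
papers and stage the raw metadata in S3 (bucket/key names are illustrative)."""
import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace used by the feed


def build_query_url(category: str = "hep-ph", max_results: int = 10) -> str:
    """Build the arXiv API request URL for a category search."""
    params = urllib.parse.urlencode({
        "search_query": f"cat:{category}",
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    })
    return f"{ARXIV_API}?{params}"


def fetch_metadata(url: str) -> list[dict]:
    """Parse the Atom feed into a list of {id, title, summary} records."""
    with urllib.request.urlopen(url) as resp:
        root = ET.fromstring(resp.read())
    return [
        {
            "id": entry.findtext(f"{ATOM}id"),
            "title": " ".join(entry.findtext(f"{ATOM}title").split()),
            "summary": entry.findtext(f"{ATOM}summary").strip(),
        }
        for entry in root.iter(f"{ATOM}entry")
    ]


def stage_to_s3(records: list[dict], bucket: str, key: str) -> None:
    """Upload the raw metadata as JSON to the landing bucket."""
    import boto3  # imported lazily; requires AWS credentials to be configured
    boto3.client("s3").put_object(
        Bucket=bucket, Key=key,
        Body=json.dumps(records).encode("utf-8"),
    )


# Example usage (needs network + AWS access):
# records = fetch_metadata(build_query_url("hep-ph", max_results=5))
# stage_to_s3(records, bucket="hep-raw-papers", key="arxiv/hep-ph.json")
```

The PDFs themselves would be downloaded and staged the same way; only the metadata path is shown here.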
- Goal: Fine-tune an open-source LLM (e.g., Llama 3 8B or Mistral 7B) to specialize in understanding scientific language and concepts.
- Key Activities:
- Implement a parameter-efficient fine-tuning (PEFT) script using QLoRA via the Hugging Face `transformers` and `peft` libraries.
- Package the script and execute it as a SageMaker Training Job on a suitable GPU instance.
- Evaluate the fine-tuned model against the base model on a hold-out set to measure performance improvements in scientific comprehension tasks.
- Version and store the final model artifacts in S3.
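The core of the QLoRA script can be sketched as below. The base checkpoint, LoRA hyperparameters, and prompt template are illustrative assumptions rather than the project's exact settings; the heavy `torch`/`peft`/`transformers` imports are kept inside the loader so the prompt formatter stays dependency-free.

```python
"""Sketch of the Phase-2 QLoRA fine-tuning entry point, as packaged for a
SageMaker Training Job (model name and hyperparameters are illustrative)."""


def format_example(question: str, answer: str) -> str:
    """Render one Q&A record from the instruction dataset as a training prompt
    (assumed Alpaca-style template)."""
    return f"### Instruction:\n{question}\n\n### Response:\n{answer}"


def load_qlora_model(base_model: str = "mistralai/Mistral-7B-v0.1"):
    """Load the base LLM in 4-bit NF4 quantization and attach LoRA adapters."""
    import torch
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              BitsAndBytesConfig)

    # 4-bit NF4 quantization keeps the frozen base weights small enough
    # for a single-GPU training instance.
    bnb = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        base_model, quantization_config=bnb, device_map="auto"
    )
    model = prepare_model_for_kbit_training(model)

    # Only the low-rank adapters on the attention projections are trained.
    lora = LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )
    return get_peft_model(model, lora), AutoTokenizer.from_pretrained(base_model)
```

After training, only the small adapter weights need to be versioned in S3 alongside a pointer to the base checkpoint.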
- Goal: Architect a collaborative team of AI agents to handle the research workflow.
- Key Activities:
- Utilize the CrewAI framework to define three distinct agent roles:
  - `Literature_Researcher`: Queries the arXiv API to find relevant papers.
  - `Research_Analyst`: Uses the fine-tuned LLM and a RAG (Retrieval-Augmented Generation) pipeline with a vector database (ChromaDB/FAISS) to analyze the full text of the papers.
  - `Report_Synthesizer`: Compiles the findings into a coherent, human-readable summary.
- Develop custom tools for the agents, such as the arXiv search function and PDF text extractor.
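The three roles can be wired into a crew roughly as follows. This is a sketch, not the project's actual wiring: the goal strings and task prompts are illustrative, and the custom tools (arXiv search, PDF extractor, RAG retriever) are assumed to be defined elsewhere.

```python
"""Sketch of the Phase-3 agent team built with CrewAI (role descriptions
and task prompts are illustrative placeholders)."""

# Plain role specs, kept separate from CrewAI so they are easy to test and tune.
ROLES = {
    "Literature_Researcher": "Query the arXiv API and shortlist relevant papers.",
    "Research_Analyst": (
        "Analyze the full text of each paper with the fine-tuned LLM via a "
        "RAG pipeline backed by a vector database (ChromaDB/FAISS)."
    ),
    "Report_Synthesizer": "Compile the findings into a coherent literature review.",
}


def build_crew(tools_by_role: dict):
    """Instantiate the three agents and a crew (requires the `crewai` package)."""
    from crewai import Agent, Crew, Task  # imported lazily; heavy dependency

    agents = {
        name: Agent(
            role=name,
            goal=goal,
            backstory=f"You are the {name} on a HEP literature-review team.",
            tools=tools_by_role.get(name, []),
        )
        for name, goal in ROLES.items()
    }
    tasks = [
        Task(description=goal, expected_output="Markdown notes", agent=agents[name])
        for name, goal in ROLES.items()
    ]
    return Crew(agents=list(agents.values()), tasks=tasks)


# Example usage (tools such as `arxiv_search_tool` are defined elsewhere):
# crew = build_crew({"Literature_Researcher": [arxiv_search_tool]})
# result = crew.kickoff()
```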
- Goal: Deploy the system components to create a functional, end-to-end application.
- Key Activities:
- Deploy the fine-tuned model to a SageMaker Endpoint, creating a scalable, real-time inference API.
- Integrate the agent system with the live SageMaker Endpoint.
- Containerize the entire CrewAI application using Docker to ensure portability and reproducibility.
- Create a CI/CD workflow using GitHub Actions to automate the building of the Docker container.
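The integration step, where the containerized agent app calls the fine-tuned model through its live endpoint, can be sketched with `boto3`. The endpoint name and the request/response schema (the Hugging Face text-generation format) are assumptions for illustration.

```python
"""Sketch of the Phase-4 integration: the agent system queries the
fine-tuned model via its SageMaker real-time endpoint (endpoint name
and payload schema are illustrative assumptions)."""
import json


def build_payload(prompt: str, max_new_tokens: int = 512) -> bytes:
    """Serialize the request in the (assumed) HF text-generation schema."""
    return json.dumps({
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "temperature": 0.2},
    }).encode("utf-8")


def query_endpoint(prompt: str, endpoint_name: str = "hep-llm-endpoint") -> str:
    """Invoke the live endpoint (requires AWS credentials and boto3)."""
    import boto3  # imported lazily; not needed for payload construction
    client = boto3.client("sagemaker-runtime")
    resp = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_payload(prompt),
    )
    return json.loads(resp["Body"].read())[0]["generated_text"]
```

Inside the Dockerized CrewAI app, the `Research_Analyst` agent would call `query_endpoint` instead of a local model, keeping the container itself GPU-free.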