Machine Learning Engineer | Data Scientist | Particle Physicist
Turning cutting‑edge research into scalable, production‑grade AI.
- Data scientist at Groupe Dynamite with a background in computational particle physics, specializing in ML pipeline design, model deployment, and production systems.
- PhD in High Energy Physics with collider phenomenology specialization, applying advanced ML algorithms to Beyond‑Standard‑Model (BSM) searches at present and future colliders.
- Specialized in AI development and LLM evaluation & fine‑tuning (SFT & RLHF) across text, image, video & audio.
- Proficient in building MLOps pipelines with Docker · Kubernetes · Airflow · MLflow · GitHub Actions/Jenkins on AWS SageMaker & GCP Vertex AI.
- Passionate about bridging HPC ↔ Cloud and Research ↔ Production.
Find my peer‑reviewed HEP papers on Inspire‑HEP & Google Scholar.
| Category | Tooling |
|---|---|
| Languages | Python · SQL · Bash · C++/Fortran (HPC) |
| Frameworks | PyTorch · TensorFlow · scikit‑learn · Hugging Face 🤗 · LangChain |
| MLOps / Infra | Docker · Kubernetes · Airflow · MLflow · DVC · GitHub Actions · Jenkins · Prometheus + Grafana |
| Cloud | AWS (SageMaker, Lambda, S3) · GCP (Vertex AI, Cloud Run, BigQuery) |
| Data Processing | PySpark · Dask · Pandas · CUDA |
| Project | What it does | Stack | Quick Links |
|---|---|---|---|
| scientific-multi-agent-HEP | Building and deploying a multi-agent AI system on AWS to automate scientific literature reviews using a fine-tuned LLM. | AWS SageMaker, S3, Docker, GitHub Actions, Hugging Face (QLoRA), RAG, PyTorch, CrewAI, FastAPI, Vector DB | Repo, Ongoing project |
| merchant-sales-forecast | Production-ready Machine Learning system for forecasting merchant sales revenue and determining cash advance eligibility, built with PySpark and deployed on Google Cloud Run. | PySpark, Docker, FastAPI, Google Cloud Run, PyTest | Repo, API |
| churn-prediction-mlops-pipeline | Scalable MLOps churn-prediction system deployed on GCP, trained on the KKBox dataset. | DVC, MLflow, Jenkins CI/CD pipeline, Airflow, Google Kubernetes Engine (GKE), PySpark, Docker | Repo, API |
| wanderLust-recommender-system | Hybrid hotel recommender system on GCP that ranks hotels by the semantic content of the input query. | Google Cloud Run, Docker, PyTorch, Fine-tuning, SVD, FastAPI | Repo, API |
| solar-flux-forecasting | Forecasting the daily solar F10.7 index over a 7-day horizon. | Model optimization, XGBoost, Streamlit, Unit tests, Scientific report | Repo, Report |
| detoxification‑rl | RL‑based detoxification for LLM outputs using PPO & LoRA. | PyTorch, PEFT, Hugging Face Transformers, Reinforcement Learning | Repo |
| WebApp_DisasterResponse | Multi‑label crisis classifier with TF‑IDF, wrapped in a Flask web‑app. | ML-Pipeline, Flask, SVM, SQL | Repo |
| Recommendation Engine | Personalized recommendation engine using collaborative filtering and matrix factorization. | Pandas, SVD | Repo |
| CovidDetectionXRay | Detects COVID‑19 from chest X‑rays. | DenseNet201, PyTorch | Repo |
✨ See more on my Portfolio.
Objective: To design, build, and deploy an end-to-end multi-agent AI system capable of automating scientific literature reviews in HEP. This project leverages a fine-tuned LLM as the "brain" for an analytical agent, all hosted within the AWS ecosystem to demonstrate production-level MLOps practices.
- Goal: Establish a robust data pipeline and development environment on AWS.
- Key Activities:
- Develop a Python script to programmatically download research papers (metadata and PDFs) from the arXiv API.
- Set up an Amazon S3 bucket for raw data storage and for the final, processed fine-tuning dataset.
- Configure Amazon SageMaker Studio as the primary IDE for data processing, experimentation, and script development.
- Create a high-quality, instruction-based dataset for fine-tuning by processing raw text and generating Q&A pairs, summaries, and keyword extractions.
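The ingestion step above can be sketched as follows. This is a minimal illustration, not the project's actual script: the bucket name, S3 key, and query category are hypothetical placeholders, and real runs need network access plus configured AWS credentials.

```python
"""Sketch of the Phase-1 ingestion step: query the arXiv API for recent
papers and stage the raw metadata in S3 (bucket/key names are illustrative)."""
import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace used by the feed


def build_query_url(category: str = "hep-ph", max_results: int = 10) -> str:
    """Build the arXiv API request URL for a category search."""
    params = urllib.parse.urlencode({
        "search_query": f"cat:{category}",
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    })
    return f"{ARXIV_API}?{params}"


def fetch_metadata(url: str) -> list[dict]:
    """Parse the Atom feed into a list of {id, title, summary} records."""
    with urllib.request.urlopen(url) as resp:
        root = ET.fromstring(resp.read())
    return [
        {
            "id": entry.findtext(f"{ATOM}id"),
            "title": " ".join(entry.findtext(f"{ATOM}title").split()),
            "summary": entry.findtext(f"{ATOM}summary").strip(),
        }
        for entry in root.iter(f"{ATOM}entry")
    ]


def stage_to_s3(records: list[dict], bucket: str, key: str) -> None:
    """Upload the raw metadata as JSON to the landing bucket."""
    import boto3  # imported lazily; requires AWS credentials to be configured
    boto3.client("s3").put_object(
        Bucket=bucket, Key=key,
        Body=json.dumps(records).encode("utf-8"),
    )


# Example usage (needs network + AWS access):
# records = fetch_metadata(build_query_url("hep-ph", max_results=5))
# stage_to_s3(records, bucket="hep-raw-papers", key="arxiv/hep-ph.json")
```

The PDFs themselves would be downloaded and staged the same way; only the metadata path is shown here.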
- Goal: Fine-tune an open-source LLM (e.g., Llama 3 8B or Mistral 7B) to specialize in understanding scientific language and concepts.
- Key Activities:
- Implement a parameter-efficient fine-tuning (PEFT) script using QLoRA via the Hugging Face `transformers` and `peft` libraries.
- Package the script and execute it as a SageMaker Training Job on a suitable GPU instance.
- Evaluate the fine-tuned model against the base model on a hold-out set to measure performance improvements in scientific comprehension tasks.
- Version and store the final model artifacts in S3.
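The core of the QLoRA script can be sketched as below. The base checkpoint, LoRA hyperparameters, and prompt template are illustrative assumptions rather than the project's exact settings; the heavy `torch`/`peft`/`transformers` imports are kept inside the loader so the prompt formatter stays dependency-free.

```python
"""Sketch of the Phase-2 QLoRA fine-tuning entry point, as packaged for a
SageMaker Training Job (model name and hyperparameters are illustrative)."""


def format_example(question: str, answer: str) -> str:
    """Render one Q&A record from the instruction dataset as a training prompt
    (assumed Alpaca-style template)."""
    return f"### Instruction:\n{question}\n\n### Response:\n{answer}"


def load_qlora_model(base_model: str = "mistralai/Mistral-7B-v0.1"):
    """Load the base LLM in 4-bit NF4 quantization and attach LoRA adapters."""
    import torch
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              BitsAndBytesConfig)

    # 4-bit NF4 quantization keeps the frozen base weights small enough
    # for a single-GPU training instance.
    bnb = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        base_model, quantization_config=bnb, device_map="auto"
    )
    model = prepare_model_for_kbit_training(model)

    # Only the low-rank adapters on the attention projections are trained.
    lora = LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )
    return get_peft_model(model, lora), AutoTokenizer.from_pretrained(base_model)
```

After training, only the small adapter weights need to be versioned in S3 alongside a pointer to the base checkpoint.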
- Goal: Architect a collaborative team of AI agents to handle the research workflow.
- Key Activities:
- Utilize the CrewAI framework to define three distinct agent roles:
  - `Literature_Researcher`: Queries the arXiv API to find relevant papers.
  - `Research_Analyst`: Uses the fine-tuned LLM and a RAG (Retrieval-Augmented Generation) pipeline with a vector database (ChromaDB/FAISS) to analyze the full text of the papers.
  - `Report_Synthesizer`: Compiles the findings into a coherent, human-readable summary.
- Develop custom tools for the agents, such as the arXiv search function and PDF text extractor.
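The three roles can be wired into a crew roughly as follows. This is a sketch, not the project's actual wiring: the goal strings and task prompts are illustrative, and the custom tools (arXiv search, PDF extractor, RAG retriever) are assumed to be defined elsewhere.

```python
"""Sketch of the Phase-3 agent team built with CrewAI (role descriptions
and task prompts are illustrative placeholders)."""

# Plain role specs, kept separate from CrewAI so they are easy to test and tune.
ROLES = {
    "Literature_Researcher": "Query the arXiv API and shortlist relevant papers.",
    "Research_Analyst": (
        "Analyze the full text of each paper with the fine-tuned LLM via a "
        "RAG pipeline backed by a vector database (ChromaDB/FAISS)."
    ),
    "Report_Synthesizer": "Compile the findings into a coherent literature review.",
}


def build_crew(tools_by_role: dict):
    """Instantiate the three agents and a crew (requires the `crewai` package)."""
    from crewai import Agent, Crew, Task  # imported lazily; heavy dependency

    agents = {
        name: Agent(
            role=name,
            goal=goal,
            backstory=f"You are the {name} on a HEP literature-review team.",
            tools=tools_by_role.get(name, []),
        )
        for name, goal in ROLES.items()
    }
    tasks = [
        Task(description=goal, expected_output="Markdown notes", agent=agents[name])
        for name, goal in ROLES.items()
    ]
    return Crew(agents=list(agents.values()), tasks=tasks)


# Example usage (tools such as `arxiv_search_tool` are defined elsewhere):
# crew = build_crew({"Literature_Researcher": [arxiv_search_tool]})
# result = crew.kickoff()
```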
- Goal: Deploy the system components to create a functional, end-to-end application.
- Key Activities:
- Deploy the fine-tuned model to a SageMaker Endpoint, creating a scalable, real-time inference API.
- Integrate the agent system with the live SageMaker Endpoint.
- Containerize the entire CrewAI application using Docker to ensure portability and reproducibility.
- Create a CI/CD workflow using GitHub Actions to automate the building of the Docker container.
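The integration step, where the containerized agent app calls the fine-tuned model through its live endpoint, can be sketched with `boto3`. The endpoint name and the request/response schema (the Hugging Face text-generation format) are assumptions for illustration.

```python
"""Sketch of the Phase-4 integration: the agent system queries the
fine-tuned model via its SageMaker real-time endpoint (endpoint name
and payload schema are illustrative assumptions)."""
import json


def build_payload(prompt: str, max_new_tokens: int = 512) -> bytes:
    """Serialize the request in the (assumed) HF text-generation schema."""
    return json.dumps({
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "temperature": 0.2},
    }).encode("utf-8")


def query_endpoint(prompt: str, endpoint_name: str = "hep-llm-endpoint") -> str:
    """Invoke the live endpoint (requires AWS credentials and boto3)."""
    import boto3  # imported lazily; not needed for payload construction
    client = boto3.client("sagemaker-runtime")
    resp = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_payload(prompt),
    )
    return json.loads(resp["Body"].read())[0]["generated_text"]
```

Inside the Dockerized CrewAI app, the `Research_Analyst` agent would call `query_endpoint` instead of a local model, keeping the container itself GPU-free.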