Hi there 👋 I'm Parham Dehghani

Machine Learning Engineer | Data Scientist | Particle Physicist
Turning cutting‑edge research into scalable, production‑grade AI.

Portfolio LinkedIn Udacity ML Nanodegree Email

🚀 About Me

  • Data scientist at Groupe Dynamite with a background in computational particle physics, specializing in building ML pipelines, deployment, and production systems.
  • PhD in High Energy Physics with collider phenomenology specialization, applying advanced ML algorithms to Beyond‑Standard‑Model (BSM) searches at present and future colliders.
  • Specialized in AI development and LLM evaluation & fine‑tuning (SFT & RLHF) across text, image, video & audio.
  • Proficient in building MLOps pipelines with Docker · Kubernetes · Airflow · MLflow · GitHub Actions/Jenkins on AWS SageMaker & GCP Vertex AI.
  • Passionate about bridging HPC ↔ Cloud and Research ↔ Production.

📚 Publications

Find my peer‑reviewed HEP papers on Inspire‑HEP & Google Scholar.

🛠️ Tech Stack

| Category | Tooling |
| --- | --- |
| Languages | Python · SQL · Bash · C++/Fortran (HPC) |
| Frameworks | PyTorch · TensorFlow · scikit‑learn · Hugging Face 🤗 · LangChain |
| MLOps / Infra | Docker · Kubernetes · Airflow · MLflow · DVC · GitHub Actions · Jenkins · Prometheus + Grafana |
| Cloud | AWS (SageMaker, Lambda, S3) · GCP (Vertex AI, Cloud Run, BigQuery) |
| Data Processing | PySpark · Dask · Pandas · CUDA |


🌐 Featured Projects

| Project | What it does | Stack | Quick Links |
| --- | --- | --- | --- |
| scientific-multi-agent-HEP | Building and deploying a multi-agent AI system on AWS to automate scientific literature reviews using a fine-tuned LLM. | AWS SageMaker, S3, Docker, GitHub Actions, Hugging Face (QLoRA), RAG, PyTorch, CrewAI, FastAPI, Vector DB | Repo, Ongoing project |
| merchant-sales-forecast | Production-ready ML system for forecasting merchant sales revenue and determining cash-advance eligibility, built with PySpark and deployed on Google Cloud Run. | PySpark, Docker, FastAPI, Google Cloud Run, PyTest | Repo, API |
| churn-prediction-mlops-pipeline | Scalable MLOps churn-prediction system deployed on GCP, based on the KKBox dataset. | DVC, MLflow, Jenkins CI/CD, Airflow, Google Kubernetes Engine (GKE), PySpark, Docker | Repo, API |
| wanderLust-recommender-system | Hybrid hotel recommender system on GCP that recommends hotels based on the semantic content of the input query. | Google Cloud Run, Docker, PyTorch, Fine-tuning, SVD, FastAPI | Repo, API |
| solar-flux-forecasting | Forecasting the daily solar F10.7 index over a 7-day horizon. | Model optimization, XGBoost, Streamlit, Unit tests, Scientific report | Repo, Report |
| detoxification‑rl | RL‑based detoxification of LLM outputs using PPO & LoRA. | PyTorch, PEFT, Hugging Face Transformers, Reinforcement Learning | Repo |
| WebApp_DisasterResponse | Multi‑label crisis classifier with TF‑IDF, wrapped in a Flask web app. | ML pipeline, Flask, SVM, SQL | Repo |
| Recommendation Engine | Personalized recommendation engine using collaborative filtering and matrix factorization. | Pandas, SVD | Repo |
| CovidDetectionXRay | Detects COVID‑19 from chest X‑rays. | DenseNet201, PyTorch | Repo |

✨ See more on my Portfolio.

🔭 Currently Working On

scientific-multi-agent-HEP: A Multi-Agent System for Scientific Research (High Energy Physics)

Objective: Design, build, and deploy an end-to-end multi-agent AI system capable of automating scientific literature reviews in HEP. The project leverages a fine-tuned LLM as the "brain" of an analytical agent, all hosted within the AWS ecosystem to demonstrate production-level MLOps practices.

Phase 1: Data Curation & Foundation (AWS Setup)

  • Goal: Establish a robust data pipeline and development environment on AWS.
  • Key Activities:
    • Develop a Python script to programmatically download research papers (metadata and PDFs) from the arXiv API.
    • Set up an Amazon S3 bucket for raw data storage and for the final, processed fine-tuning dataset.
    • Configure Amazon SageMaker Studio as the primary IDE for data processing, experimentation, and script development.
    • Create a high-quality, instruction-based dataset for fine-tuning by processing raw text and generating Q&A pairs, summaries, and keyword extractions.
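The arXiv-download step above could be sketched as follows. The query parameters follow the public arXiv Atom API; the function names and defaults are illustrative, not the project's actual script:

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

def build_arxiv_query(search_query: str, start: int = 0, max_results: int = 100) -> str:
    """Build an arXiv API query URL (the API returns an Atom XML feed)."""
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"

def fetch_metadata(url: str) -> list[dict]:
    """Fetch an arXiv query URL and parse each entry's core metadata."""
    with urllib.request.urlopen(url) as resp:
        root = ET.fromstring(resp.read())
    return [
        {
            "id": entry.findtext("atom:id", namespaces=ATOM_NS),
            "title": entry.findtext("atom:title", namespaces=ATOM_NS),
            "summary": entry.findtext("atom:summary", namespaces=ATOM_NS),
        }
        for entry in root.findall("atom:entry", ATOM_NS)
    ]
```

For example, `build_arxiv_query('cat:hep-ph', max_results=50)` targets recent hep-ph submissions; the parsed metadata (and the PDFs) would then be uploaded to the S3 bucket for downstream processing.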

Phase 2: Specialized LLM Fine-Tuning (SageMaker)

  • Goal: Fine-tune an open-source LLM (e.g., Llama 3 8B or Mistral 7B) to specialize in understanding scientific language and concepts.
  • Key Activities:
    • Implement a parameter-efficient fine-tuning (PEFT) script using QLoRA via the Hugging Face transformers and peft libraries.
    • Package the script and execute it as a SageMaker Training Job on a suitable GPU instance.
    • Evaluate the fine-tuned model against the base model on a hold-out set to measure performance improvements in scientific comprehension tasks.
    • Version and store the final model artifacts in S3.
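The QLoRA setup for this phase might look like the configuration sketch below (using the Hugging Face `transformers` and `peft` APIs). The hyperparameter values and target-module names are illustrative assumptions for a Llama/Mistral-style model, not the project's actual configuration:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization so a 7B-8B model fits on a single GPU instance.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Low-rank adapters on the attention projections; only these are trained.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

These two configs would be passed to `AutoModelForCausalLM.from_pretrained(..., quantization_config=bnb_config)` and `peft.get_peft_model(...)` inside the SageMaker training script.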

Phase 3: Multi-Agent System Development (CrewAI)

  • Goal: Architect a collaborative team of AI agents to handle the research workflow.
  • Key Activities:
    • Utilize the CrewAI framework to define three distinct agent roles:
      1. Literature_Researcher: Queries the arXiv API to find relevant papers.
      2. Research_Analyst: Uses the fine-tuned LLM and a RAG (Retrieval-Augmented Generation) pipeline with a vector database (ChromaDB/FAISS) to analyze the full text of the papers.
      3. Report_Synthesizer: Compiles the findings into a coherent, human-readable summary.
    • Develop custom tools for the agents, such as the arXiv search function and PDF text extractor.
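The Research_Analyst's retrieval step can be illustrated with a deliberately minimal in-memory sketch. A real pipeline would embed chunks with a sentence transformer and store them in ChromaDB or FAISS; the toy bag-of-words scoring below only shows the shape of the idea:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real RAG pipeline would use dense
    # sentence-transformer vectors held in a vector database.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

The retrieved chunks would then be stuffed into the fine-tuned LLM's prompt so the analysis is grounded in the papers' actual text.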

Phase 4: Integration & Productionization (Docker & SageMaker Endpoints)

  • Goal: Deploy the system components to create a functional, end-to-end application.
  • Key Activities:
    • Deploy the fine-tuned model to a SageMaker Endpoint, creating a scalable, real-time inference API.
    • Integrate the agent system with the live SageMaker Endpoint.
    • Containerize the entire CrewAI application using Docker to ensure portability and reproducibility.
    • Create a CI/CD workflow using GitHub Actions to automate the building of the Docker container.
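Once the endpoint is live, calling it from the agent system could look like the sketch below. The request schema (`inputs`/`parameters`) assumes a Hugging Face text-generation serving container and must match whatever image the endpoint actually runs; the endpoint name is hypothetical:

```python
import json

def build_inference_payload(prompt: str, max_new_tokens: int = 512,
                            temperature: float = 0.2) -> str:
    """Build the JSON body for a Hugging Face-style text-generation endpoint.
    (Parameter names are an assumption; match your container's schema.)"""
    return json.dumps({
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
        },
    })

def query_endpoint(endpoint_name: str, prompt: str) -> dict:
    """Invoke a SageMaker real-time endpoint with the payload above."""
    import boto3  # imported lazily so the payload builder stays dependency-free
    client = boto3.client("sagemaker-runtime")
    resp = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_inference_payload(prompt),
    )
    return json.loads(resp["Body"].read())
```

In the agent system, `query_endpoint("hep-analyst-llm", prompt)` (a hypothetical endpoint name) would back the Research_Analyst's LLM calls.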
