NBA Player Predictions — End-to-End (R + Python)

Link to Shiny App for final product: https://jmotta31.shinyapps.io/NBA_Prediction_Tool/

Portfolio project that builds a daily NBA player-level prediction dataset and a Shiny app to explore it.
Data is gathered and engineered in R (hoopR) and Python (nba_api + custom pipeline), modeled with a Bayesian-style neural network (MC Dropout, and scheduled on Windows via Task Scheduler.

Status: designed to run daily during the NBA season (Oct 21 – Jun 22) and skip the offseason.

Highlights

Automated data refresh (R): pulls player box scores, schedule, and builds lineup/on-off features; guarded by a month/day season window.
Feature & model pipeline (Python): scraping/assembly → feature engineering → preprocessing → BNN training.
Daily predictions: generates predictions/nba_predictions_YYYY-MM-DD.parquet for upcoming games; Shiny app hot-reloads.
Launchers for production: small wrapper scripts handle season windows, logs, and lock files to avoid overlapping runs.
Shiny UI: dark theme, sortable/filterable tables, and a lightweight implied-probability calculator.

Repository layout

.
├─ R_Scripts/
│  ├─ update_nba_data.R              # Pulls hoopR data; builds lineup/on-off; writes parquet datasets/
│  ├─ launch_update_nba_data.R       # Season guard + logs + lock; calls update_nba_data.R (Task Scheduler)
|  ├─ www                            # DALL-E generated images for the Shiny App
│  └─ app.R                          # Shiny app; reactive polling for predictions + metrics (parquet)
│
├─ Python_Scripts/                   # (or project root, depending on your layout)
│  ├─ scraping_loading.py            # Scrapes advanced data, merges with hoopR gamelogs + injuries, light FE
│  ├─ feature_engineering.py         # Main feature engineering block
│  ├─ preprocessing.py               # Train/test split, imputers, encoding, pipelines, scalers
│  ├─ bnn.py                         # Bayesian-like NN (MC dropout for quantiles)
│  ├─ run_pipeline.py                # Orchestrates training end-to-end
│  ├─ nba_predictions.py             # Loads artifacts; produces daily predictions parquet
│  └─ launch_nba_predictions.py      # Season guard + logs + lock; calls nba_predictions.py (Task Scheduler)
│
├─ predictions/                      # Daily predictions parquet (nba_predictions_YYYY-MM-DD.parquet)
└─ README.md

How the pipeline runs (daily)

Data refresh (R)
- R_Scripts/update_nba_data.R
  - Pulls player gamelogs via hoopR.
  - Builds upcoming schedule.
  - Derives lineup stints and on/off summaries (parallelized).
  - Writes parquet files in datasets/.
- R_Scripts/launch_update_nba_data.R
  - Season window (month/day only), timestamped logs in logs/, and a lightweight lock file (prevents overlap).
  - Called by Windows Task Scheduler every morning.
Modeling & predictions (Python)
- scraping_loading.py assembles a unified dataframe (nba_api + hoopR + injury flags).
- feature_engineering.py derives features
- preprocessing.py builds pipelines (imputers/encoders/scalers); bnn.py defines a Bayesian-style NN (MC dropout).
- run_pipeline.py trains the model; artifacts saved under models/ and pipelines/.
- nba_predictions.py loads artifacts, scores upcoming games, filters by available props, and saves predictions/nba_predictions_YYYY-MM-DD.parquet locally and pushes a separate nba_predictions.parquet to this repo to update the Shiny App's predictions reactable.
- launch_nba_predictions.py wraps the run for scheduling (season window, logs, lock).
Shiny app
- app.R reads:
  - predictions/nba_predictions.parquet via GitHub URL.
  - datasets/Evaluation_Metrics.parquet via GitHub URL.
- No republish needed day-to-day; the app picks up the latest parquet.

Quickstart

This project uses both R and Python. Any recent R (≥4.3) and Python (≥3.9) is fine.

R setup

# From an R console:
install.packages(c("dplyr","arrow","hoopR","lubridate","tidyr","data.table","purrr","stringr","tibble","parallel"))

Run once at the project root to populate datasets/:

Rscript R_Scripts/update_nba_data.R

Python setup

Create/activate an environment, then:

pip install numpy pandas pyarrow joblib scikit-learn torch unidecode nba_api
# (plus any others used in your scripts)

Train models:

python Python_Scripts/run_pipeline.py

Generate daily predictions (writes to predictions/):

python Python_Scripts/nba_predictions.py

Shiny app

Run locally:

# from project root
shiny::runApp("app.R")

Environment overrides (optional):

PREDICTIONS_DIR (default: predictions)
METRICS_PATH (default: datasets/Evaluation_Metrics.parquet)

Windows Task Scheduler (daily, in-season only)

Create two daily tasks (one for R data refresh, one for Python predictions). Each task:

Trigger: Daily at a morning time you prefer.
Action → Start a program:
- Program/script: "<path to Rscript.exe>" or "<path to python.exe>"
- Add arguments: "<full path to launcher>"
  - R: "...\R_Scripts\launch_update_nba_data.R"
  - Py: "...\Python_Scripts\launch_nba_predictions.py"
- Start in: "<project root>" (e.g., C:\Users\...\NBA_Prediction_Tool)
Run whether user is logged on or not
Run with highest privileges (if needed for paths)

The launchers handle:

Season window: Oct 21–Dec 31 and Jan 1–Jun 22 (month/day only).
Lock file: avoids overlapping runs.
Logs: timestamped files under logs/ plus a data_refresh_latest.log or similar copy.

Modeling notes (BNN)

The network in bnn.py uses MC dropout at inference to approximate predictive uncertainty.
Quantile outputs (e.g., 10th/50th/90th) enable prediction intervals per target (Points, Assists, Rebounds, 3PTM, Steals, Blocks).

Shiny App

Link: https://jmotta31.shinyapps.io/NBA_Prediction_Tool/
Predictions table is driven by a reactivePoll keyed on path + mtime + size so the UI refreshes when a new-day parquet lands or the current file is rewritten.
Metrics use reactiveFileReader and refresh daily.
No redeploy needed; app reads the freshest files from disk.

Disclaimers

This project is for portfolio and educational purposes.
The predictions and any betting-related outputs are not financial advice.
Use responsibly and in accordance with local laws.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

NBA Player Predictions — End-to-End (R + Python)

Highlights

Repository layout

How the pipeline runs (daily)

Quickstart

R setup

Python setup

Shiny app

Windows Task Scheduler (daily, in-season only)

Modeling notes (BNN)

Shiny App

Disclaimers

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 133 Commits
Python_Scripts		Python_Scripts
R_Scripts		R_Scripts
Evaluation_Metrics.parquet		Evaluation_Metrics.parquet
LICENSE		LICENSE
README.md		README.md
predictions.parquet		predictions.parquet

License

jgmotta731/NBA_Stat_Pred

Folders and files

Latest commit

History

Repository files navigation

NBA Player Predictions — End-to-End (R + Python)

Highlights

Repository layout

How the pipeline runs (daily)

Quickstart

R setup

Python setup

Shiny app

Windows Task Scheduler (daily, in-season only)

Modeling notes (BNN)

Shiny App

Disclaimers

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages