Link to Shiny App for final product: https://jmotta31.shinyapps.io/NBA_Prediction_Tool/
Portfolio project that builds a daily NBA player-level prediction dataset and a Shiny app to explore it.
Data is gathered and engineered in R (hoopR) and Python (nba_api + custom pipeline), modeled with a Bayesian-style neural network (MC Dropout), and scheduled on Windows via Task Scheduler.
Status: designed to run daily during the NBA season (Oct 21 – Jun 22) and skip the offseason.
- Automated data refresh (R): pulls player box scores, schedule, and builds lineup/on-off features; guarded by a month/day season window.
- Feature & model pipeline (Python): scraping/assembly → feature engineering → preprocessing → BNN training.
- Daily predictions: generates predictions/nba_predictions_YYYY-MM-DD.parquet for upcoming games; Shiny app hot-reloads.
- Launchers for production: small wrapper scripts handle season windows, logs, and lock files to avoid overlapping runs.
- Shiny UI: dark theme, sortable/filterable tables, and a lightweight implied-probability calculator.
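
As a rough illustration of what an implied-probability calculator computes, here is a minimal Python sketch that converts American (moneyline) odds to an implied probability. The function name and the choice of American odds are assumptions for illustration; the calculator in app.R is written in R and may accept other odds formats.

```python
def implied_probability(american_odds: int) -> float:
    """Convert American (moneyline) odds to an implied probability in [0, 1]."""
    if american_odds == 0:
        raise ValueError("American odds cannot be 0")
    if american_odds < 0:
        # Favorite: risk |odds| to win 100.
        return -american_odds / (-american_odds + 100)
    # Underdog: risk 100 to win `odds`.
    return 100 / (american_odds + 100)


print(round(implied_probability(-110), 4))  # 0.5238
print(round(implied_probability(150), 4))   # 0.4
```

The result includes the bookmaker's margin, so the two sides of a market usually sum to more than 1.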
.
├─ R_Scripts/
│ ├─ update_nba_data.R # Pulls hoopR data; builds lineup/on-off; writes parquet to datasets/
│ ├─ launch_update_nba_data.R # Season guard + logs + lock; calls update_nba_data.R (Task Scheduler)
│ ├─ www/ # DALL-E generated images for the Shiny App
│ └─ app.R # Shiny app; reactive polling for predictions + metrics (parquet)
│
├─ Python_Scripts/ # (or project root, depending on your layout)
│ ├─ scraping_loading.py # Scrapes advanced data, merges with hoopR gamelogs + injuries, light FE
│ ├─ feature_engineering.py # Main feature engineering block
│ ├─ preprocessing.py # Train/test split, imputers, encoding, pipelines, scalers
│ ├─ bnn.py # Bayesian-like NN (MC dropout for quantiles)
│ ├─ run_pipeline.py # Orchestrates training end-to-end
│ ├─ nba_predictions.py # Loads artifacts; produces daily predictions parquet
│ └─ launch_nba_predictions.py # Season guard + logs + lock; calls nba_predictions.py (Task Scheduler)
│
├─ predictions/ # Daily predictions parquet (nba_predictions_YYYY-MM-DD.parquet)
└─ README.md
- Data refresh (R)
  - R_Scripts/update_nba_data.R
    - Pulls player gamelogs via hoopR.
    - Builds upcoming schedule.
    - Derives lineup stints and on/off summaries (parallelized).
    - Writes parquet files in datasets/.
  - R_Scripts/launch_update_nba_data.R
    - Season window (month/day only), timestamped logs in logs/, and a lightweight lock file (prevents overlap).
    - Called by Windows Task Scheduler every morning.
- Modeling & predictions (Python)
  - scraping_loading.py assembles a unified dataframe (nba_api + hoopR + injury flags).
  - feature_engineering.py derives features.
  - preprocessing.py builds pipelines (imputers/encoders/scalers); bnn.py defines a Bayesian-style NN (MC dropout).
  - run_pipeline.py trains the model; artifacts are saved under models/ and pipelines/.
  - nba_predictions.py loads artifacts, scores upcoming games, filters by available props, saves predictions/nba_predictions_YYYY-MM-DD.parquet locally, and pushes a separate nba_predictions.parquet to this repo to update the Shiny App's predictions reactable.
  - launch_nba_predictions.py wraps the run for scheduling (season window, logs, lock).
- Shiny app
  - app.R reads:
    - predictions/nba_predictions.parquet via GitHub URL.
    - datasets/Evaluation_Metrics.parquet via GitHub URL.
  - No republish needed day-to-day; the app picks up the latest parquet.
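
The dated file naming used for local predictions can be sketched as follows. This is a minimal, standard-library-only illustration of building a nba_predictions_YYYY-MM-DD.parquet path and picking the newest one in a folder; the helper names predictions_path and latest_predictions are hypothetical and are not taken from nba_predictions.py.

```python
from datetime import date, datetime
from pathlib import Path
from typing import Optional

PREFIX = "nba_predictions_"
SUFFIX = ".parquet"


def predictions_path(day: date, root: Path = Path("predictions")) -> Path:
    """Build the dated file path for a given day."""
    return root / f"{PREFIX}{day.isoformat()}{SUFFIX}"


def latest_predictions(root: Path = Path("predictions")) -> Optional[Path]:
    """Return the most recent dated parquet in `root`, or None if there is none."""
    dated = []
    for path in root.glob(f"{PREFIX}*{SUFFIX}"):
        stamp = path.name[len(PREFIX):-len(SUFFIX)]
        try:
            dated.append((datetime.strptime(stamp, "%Y-%m-%d").date(), path))
        except ValueError:
            continue  # skip files whose name does not carry a valid date
    if not dated:
        return None
    return max(dated, key=lambda item: item[0])[1]
```

Parsing the date from the file name, rather than trusting modification times, keeps the choice stable if an older file is rewritten.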
This project uses both R and Python. Recent versions of R (≥4.3) and Python (≥3.9) are fine.
From an R console (parallel ships with base R, so it does not need to be installed):

    install.packages(c("dplyr","arrow","hoopR","lubridate","tidyr","data.table","purrr","stringr","tibble"))

Run once at the project root to populate datasets/:

    Rscript R_Scripts/update_nba_data.R

Create/activate an environment, then:

    pip install numpy pandas pyarrow joblib scikit-learn torch unidecode nba_api
    # (plus any others used in your scripts)

Train models:

    python Python_Scripts/run_pipeline.py

Generate daily predictions (writes to predictions/):

    python Python_Scripts/nba_predictions.py

Run locally:

    # from project root (use "R_Scripts/app.R" if app.R sits under R_Scripts/, as in the tree above)
    shiny::runApp("app.R")

Environment overrides (optional):
- PREDICTIONS_DIR (default: predictions)
- METRICS_PATH (default: datasets/Evaluation_Metrics.parquet)

Create two daily tasks (one for R data refresh, one for Python predictions). Each task:
- Trigger: Daily at a morning time you prefer.
- Action → Start a program:
  - Program/script: "<path to Rscript.exe>" or "<path to python.exe>"
  - Add arguments: "<full path to launcher>"
    - R: "...\R_Scripts\launch_update_nba_data.R"
    - Py: "...\Python_Scripts\launch_nba_predictions.py"
  - Start in: "<project root>" (e.g., C:\Users\...\NBA_Prediction_Tool)
- Run whether user is logged on or not
- Run with highest privileges (if needed for paths)
The launchers handle:
- Season window: Oct 21–Dec 31 and Jan 1–Jun 22 (month/day only).
- Lock file: avoids overlapping runs.
- Logs: timestamped files under logs/ plus a data_refresh_latest.log or similar copy.
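
The season window and lock file described above can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in launch_nba_predictions.py: the names in_season, acquire_lock, and run_guarded are hypothetical, and the real launchers also write logs.

```python
import os
from datetime import date
from pathlib import Path

SEASON_START = (10, 21)  # Oct 21
SEASON_END = (6, 22)     # Jun 22


def in_season(today: date) -> bool:
    """True for Oct 21-Dec 31 or Jan 1-Jun 22, comparing month/day only."""
    month_day = (today.month, today.day)
    return month_day >= SEASON_START or month_day <= SEASON_END


def acquire_lock(lock_path: Path) -> bool:
    """Atomically create the lock file; return False if it already exists."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))  # record the owner for debugging
    return True


def run_guarded(job, today: date, lock_path: Path) -> str:
    """Run `job` only in season and only if no other run holds the lock."""
    if not in_season(today):
        return "skipped: offseason"
    if not acquire_lock(lock_path):
        return "skipped: another run is active"
    try:
        job()
        return "ran"
    finally:
        lock_path.unlink(missing_ok=True)  # release the lock even if the job fails
```

A call such as run_guarded(main, date.today(), Path("logs/run.lock")) would then be the body of a launcher, where main and the lock path are placeholders. If a run is killed before the finally block executes, the lock file stays behind and has to be removed by hand or expired by age.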
- The network in bnn.py uses MC dropout at inference to approximate predictive uncertainty.
- Quantile outputs (e.g., 10th/50th/90th) enable prediction intervals per target (Points, Assists, Rebounds, 3PTM, Steals, Blocks).
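
To make the MC dropout idea concrete, here is a dependency-free toy sketch: dropout stays active at inference, the same input is passed through the network many times, and the spread of the outputs is summarized with percentiles. The weights, layer sizes, dropout rate, and function names are all made up for illustration; bnn.py itself is not reproduced here and presumably uses torch rather than plain Python.

```python
import random
import statistics

# Toy fixed weights for a 2-input, 4-hidden-unit, 1-output network (made up).
W1 = [[0.8, -0.3], [0.5, 0.9], [-0.6, 0.4], [0.2, 0.7]]
B1 = [0.1, -0.2, 0.3, 0.0]
W2 = [1.2, 0.7, -0.5, 0.9]
B2 = 2.0


def forward(x, rng, p_drop=0.2):
    """One stochastic pass: dropout stays ON, as MC dropout requires at inference."""
    out = B2
    for weights, bias, w_out in zip(W1, B1, W2):
        hidden = max(0.0, sum(w * xi for w, xi in zip(weights, x)) + bias)  # ReLU
        if rng.random() < p_drop:
            continue  # this hidden unit is dropped for the current pass
        out += w_out * hidden / (1.0 - p_drop)  # inverted-dropout rescaling
    return out


def mc_dropout_quantiles(x, n_passes=500, seed=0):
    """Repeat stochastic passes and summarize them with 10th/50th/90th percentiles."""
    rng = random.Random(seed)
    samples = [forward(x, rng) for _ in range(n_passes)]
    deciles = statistics.quantiles(samples, n=10)  # 9 cut points: 10%, 20%, ..., 90%
    return {"q10": deciles[0], "q50": deciles[4], "q90": deciles[8]}


print(mc_dropout_quantiles([1.0, 2.0]))
```

The q10 to q90 range is what a per-target prediction interval would be read from; more passes give smoother quantiles at the cost of inference time.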
- Link: https://jmotta31.shinyapps.io/NBA_Prediction_Tool/
- Predictions table is driven by a reactivePoll keyed on path + mtime + size so the UI refreshes when a new-day parquet lands or the current file is rewritten.
- Metrics use reactiveFileReader and refresh daily.
- No redeploy needed; app reads the freshest files from disk.
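
The path + mtime + size check can be illustrated outside Shiny. The sketch below is a Python analogue of the idea, not the R code in app.R: a cheap signature is computed on every poll, and the expensive load only reruns when that signature changes. The names file_signature and PolledFile are hypothetical.

```python
import os
from pathlib import Path
from typing import Callable, Optional, Tuple

Signature = Tuple[str, int, int]


def file_signature(path: Path) -> Optional[Signature]:
    """Cheap change check: (path, mtime in ns, size), or None if the file is missing."""
    try:
        info = os.stat(path)
    except FileNotFoundError:
        return None
    return (str(path), info.st_mtime_ns, info.st_size)


class PolledFile:
    """Re-run the expensive `load` only when the file signature changes."""

    def __init__(self, resolve: Callable[[], Optional[Path]], load: Callable):
        self.resolve = resolve  # returns the current target path (may change daily)
        self.load = load        # expensive reader, e.g. a parquet loader
        self._signature = None
        self._value = None

    def get(self):
        path = self.resolve()
        signature = file_signature(path) if path is not None else None
        if signature != self._signature:
            self._signature = signature
            self._value = self.load(path) if signature is not None else None
        return self._value
```

Including the path in the signature covers the new-day case, while mtime and size cover a rewrite of the same file.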
- This project is for portfolio and educational purposes.
- The predictions and any betting-related outputs are not financial advice.
- Use responsibly and in accordance with local laws.