A hands-on tutorial on designing, training, and evaluating probabilistic forecasts of dengue incidence using the DARTS library and Temporal Convolutional Network (TCN) models. Students work through data retrieval, cleaning, univariate and covariate-based forecasting for Rio de Janeiro, and visualisation and evaluation of forecast outputs.
Authors: Ciara Judge, Cathal Mills, Moritz Kraemer; February 2026
- Target: Forecast log(cases + 1) for Rio de Janeiro on a six‑month horizon, using 23 prescribed quantiles to summarise the predictive distribution.
- Data: Train on cleaned monthly dengue case data from 2015 to 2022/2023, with optional climate, socioeconomic, and demographic covariates.
- Workflow: Clean the data → train a TCN model → generate predictions → visualise and evaluate (e.g. RMSE, CRPS, WIS).
Follow-up exercises extend the code to new locations, new or extra covariates, and single time-point forecasts.
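The target transform can be sketched in a few lines. This is an illustration only: the case counts and the quantile values below are made up, and the helper names `to_log_scale` and `to_case_scale` are hypothetical rather than taken from the notebook. It shows that log(cases + 1) is defined for zero counts, and that quantiles forecast on the log scale map back to the same quantile levels on the case scale because the transform is monotonic.

```python
import math

def to_log_scale(cases):
    """Apply the tutorial's target transform, log(cases + 1)."""
    return [math.log1p(c) for c in cases]

def to_case_scale(values):
    """Invert the transform: exp(value) - 1."""
    return [math.expm1(v) for v in values]

# Made-up monthly counts, for illustration only.
monthly_cases = [0, 12, 250, 4100]
y = to_log_scale(monthly_cases)

# Hypothetical forecast quantiles on the log scale (level -> value).
log_quantiles = {0.025: 4.0, 0.5: 5.5, 0.975: 7.0}

# A monotonic transform preserves quantile levels, so each value can be
# back-transformed on its own.
case_quantiles = {level: math.expm1(v) for level, v in log_quantiles.items()}

print(y)
print(case_quantiles)
```

Scores computed on the log scale and on the case scale are not comparable, so it is worth stating which scale an evaluation uses.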
| Item | Location |
|---|---|
| Introductory presentation (slide content) | ML Forecasting in Python Tutorial.txt — narrative and key points for the lecture that precedes the practical. |
| Colab notebook (main practical) | ML_forecasting_tutorial.ipynb (static copy in this repo); open it in Google Colab, save a copy, and run it step by step. |
| Epidemiological training data (Brazil) | data/BRA_epi_training.csv — monthly case incidence by municipality (2015–2023). |
| Covariate data & descriptors | data/README.md — full list of climate, socioeconomic, and demographic datasets; citations and licenses. Covariate files are obtained from the Global.health portal (see Data: where to get it). |
By the end of the tutorial you will be able to:
- Design and evaluate forecasting models for infectious diseases.
- Use the Python library DARTS for time-series forecasting (TCN and related models).
- Integrate covariate data (e.g. temperature, humidity) into infectious-disease forecasting models.
- Adapt the code to new locations, horizons, and predictor sets.
- Data: cleaning and preparation
  - Obtaining data from Global.health and loading it in Colab.
  - Merging the epidemiological target with historical covariates and seasonal forecasts into a single table.
  - Filtering to Rio de Janeiro and exploring the series.
- Model: TCN (6‑month horizon, univariate)
  - Building training/test splits.
  - Defining and fitting a Temporal Convolutional Network.
  - Generating 6‑month-ahead probabilistic forecasts (quantiles).
- Visualisation and evaluation
  - Plotting history and forecasts (including fan charts).
  - Point metrics (e.g. RMSE) and probabilistic scores: CRPS, WIS (Weighted Interval Score).
  - Concepts: calibration, sharpness, bias.
- Introducing covariates
  - Adding past covariates (e.g. temperature) to the TCN.
  - Fitting and predicting with the covariate model.
- Exercises
  - Visualise the temperature-model results.
  - Change forecast horizon (e.g. to 3 months).
  - Fit models with other variables (e.g. relative humidity).
  - Compare several environmental predictors and build a multivariate model.
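The modelling steps in this outline could look roughly like the sketch below. It is not the notebook's code. The 23 quantile levels are an assumption (the set commonly used by forecast hubs), the hyperparameters are arbitrary placeholders, and `fit_and_forecast` is a hypothetical helper. The DARTS calls (`TCNModel`, `QuantileRegression`, `fit`, `predict`) follow the API as documented for recent releases; the import path for likelihoods may differ between versions, so check it against your installed `darts`.

```python
# Assumed quantile levels: 0.01, 0.025, 0.05, 0.10, ..., 0.95, 0.975, 0.99 (23 in total).
QUANTILES = [0.01, 0.025] + [round(0.05 * k, 2) for k in range(1, 20)] + [0.975, 0.99]


def fit_and_forecast(target, past_covariates=None, horizon=6):
    """Fit a quantile-regression TCN and forecast `horizon` months ahead.

    `target` and `past_covariates` are darts TimeSeries on a monthly index;
    `target` is assumed to already be on the log(cases + 1) scale.
    """
    # Imported inside the function so the file can be read without darts installed.
    from darts.models import TCNModel
    from darts.utils.likelihood_models import QuantileRegression

    # Hold out the last `horizon` months for evaluation.
    train = target[:-horizon]

    model = TCNModel(
        input_chunk_length=24,        # placeholder: months of history per training window
        output_chunk_length=horizon,  # must be smaller than input_chunk_length
        kernel_size=3,
        num_filters=16,
        dropout=0.1,
        likelihood=QuantileRegression(quantiles=QUANTILES),
        n_epochs=100,
        random_state=42,
    )
    model.fit(train, past_covariates=past_covariates)

    # Sampling from the fitted likelihood yields a probabilistic forecast.
    forecast = model.predict(
        n=horizon, past_covariates=past_covariates, num_samples=500
    )
    return train, forecast
```

Calling `fit_and_forecast(target)` gives the univariate model, and passing a temperature series as `past_covariates` gives the covariate model. Changing `horizon` to 3 corresponds to the horizon exercise.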
- Open this notebook in Google Colab.
- Go to File → Save a Copy in Google Drive and use this copy for the tutorial.
OR
- Go to Google Colab.
- Upload or open `ML_forecasting_tutorial.ipynb` (e.g. File → Upload notebook, or clone this repo and open the file from Colab).
- Create a folder in your Google Drive (e.g. `forecasting_tutorial`).
- In the notebook, set `DATA_DIR` to that folder (e.g. `'/content/drive/MyDrive/forecasting_tutorial'`) after mounting Drive.
- Epidemiological data: You can use the copy in this repo (`data/BRA_epi_training.csv`) or download it from the Global.health portal (see below). Place `BRA_epi_training.csv` in your `DATA_DIR`.
- Covariate NetCDF files: Download the Brazil (BRA) covariate files from the Global.health data portal and place them in the same folder. The notebook lists the exact filenames (e.g. `BRA-reanalysis_monthly.zs.nc`, `BRA-spe06.zs.nc`, `BRA-population.zs.nc`, `BRA-seasonal_forecast_monthly.zs.nc`, etc.). One covariate file is too large to host in this repository; all covariate data for the tutorial is provided via the portal.
Detailed dataset descriptions, citations, and licenses are in data/README.md.
- Mount Google Drive in the first cells, set `DATA_DIR`, install `darts`, then run the rest of the notebook in order.
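Before running the rest of the notebook, a quick check that the data folder is complete can save time. The sketch below is one way to do it and is not taken from the notebook: `missing_files` is a hypothetical helper, and `EXPECTED_FILES` contains only the filenames named in this README, so extend it with the full list from the notebook. The Drive mount uses Colab's `google.colab.drive.mount` and is skipped when the code runs elsewhere.

```python
import os

# Example path from the setup steps; adjust to your own Drive folder.
DATA_DIR = "/content/drive/MyDrive/forecasting_tutorial"

# Only the files named in this README; the notebook lists the full set.
EXPECTED_FILES = [
    "BRA_epi_training.csv",
    "BRA-reanalysis_monthly.zs.nc",
    "BRA-spe06.zs.nc",
    "BRA-population.zs.nc",
    "BRA-seasonal_forecast_monthly.zs.nc",
]


def missing_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected filenames that are not present in data_dir."""
    return [
        name for name in expected
        if not os.path.isfile(os.path.join(data_dir, name))
    ]


try:
    # google.colab is only importable inside a Colab runtime.
    from google.colab import drive
    drive.mount("/content/drive")
except ImportError:
    print("Not running in Colab; skipping Drive mount.")

print("Missing files:", missing_files(DATA_DIR))
```

An empty list means every file named above was found in `DATA_DIR`.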
- Global.health data repository: https://sim-dev-data.covid-19.global.health/data-downloads
  Sign in with a Google account; access may need to be granted by an admin. Brazil files use the prefix `BRA` (e.g. `BRA_epi_training.csv`, `BRA-reanalysis_monthly.zs.nc`).
- This repository contains:
  - Epidemiological: `data/BRA_epi_training.csv`.
  - Documentation: data/README.md describes all covariate datasets (climate, socioeconomic, demographic), file formats (CSV, NetCDF), and how they are used. The actual covariate NetCDF files (including the one that is too large to host here) are downloaded from the Global.health portal.
- Dengue is a mosquito-borne viral disease of growing public health concern (e.g. record-breaking case counts and deaths in recent years). It is endemic in the Americas, Africa, and Australasia, and is expanding into Europe.
- Dynamics are climate-sensitive and seasonal, so environmental and demographic variables (temperature, precipitation, urbanisation, etc.) can inform forecasts.
- Probabilistic forecasting (e.g. quantiles) supports decision-making under uncertainty; evaluation uses scoring rules that capture both calibration and sharpness (e.g. CRPS, WIS).
- Temporal Convolutional Network (TCN): a deep-learning architecture that uses convolutional layers so predictions at time t depend only on current and past information; well-suited to time series with long-range structure.
- Evaluation: the tutorial covers point metrics (e.g. RMSE on the median), CRPS (Continuous Ranked Probability Score), and WIS (Weighted Interval Score), and links them to calibration and sharpness.
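To make the WIS concrete, here is a minimal pure-Python sketch of the usual interval-based definition (as in Bracher et al., 2021) for a single observation. It is illustrative only; the function names are hypothetical and the notebook may compute the score with a library routine instead. The input is assumed to be an odd number of quantile levels that are symmetric around the median, such as the tutorial's 23 quantiles.

```python
def interval_score(lower, upper, y, alpha):
    """Score of a central (1 - alpha) prediction interval for observation y."""
    score = upper - lower                      # sharpness: wide intervals cost more
    if y < lower:
        score += (2.0 / alpha) * (lower - y)   # penalty when y falls below the interval
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)   # penalty when y falls above the interval
    return score


def weighted_interval_score(levels, values, y):
    """WIS for one observation from symmetric quantile levels and their values."""
    pairs = sorted(zip(levels, values))
    n = len(pairs)
    if n % 2 == 0:
        raise ValueError("expected an odd number of quantiles including the median")
    k = n // 2                                 # number of central intervals
    median = pairs[k][1]
    total = 0.5 * abs(y - median)
    for i in range(k):
        alpha = 2.0 * pairs[i][0]              # e.g. level 0.025 -> 95% interval
        lower, upper = pairs[i][1], pairs[n - 1 - i][1]
        total += (alpha / 2.0) * interval_score(lower, upper, y, alpha)
    return total / (k + 0.5)


# Toy forecast with quantiles at 0.25, 0.5 and 0.75.
print(weighted_interval_score([0.25, 0.5, 0.75], [4.0, 5.0, 6.0], y=5.0))
print(weighted_interval_score([0.25, 0.5, 0.75], [4.0, 5.0, 6.0], y=8.0))
```

The first call scores an observation at the median, so only interval width contributes. The second scores an observation outside the interval, where the miss penalties dominate. Lower is better in both cases.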
- WHO Dengue situation reports
- DARTS documentation and API
- Tutorial data descriptors (this repo)
- Lim et al., The overlapping global distribution of dengue, chikungunya, zika and yellow fever, Nature Communications (2025)
- Colon-Gonzalez et al., Climate-based modelling and forecasting of dengue in 3 endemic departments of Peru (and related climate–dengue literature)
This project is licensed under the MIT License — see the LICENSE file for details. Individual datasets in the data folder or obtained from Global.health have their own licenses and citation requirements; see data/README.md.