HamSCI/hamsci_madrigal_loader
HamSCI Madrigal Loader

A Python pipeline for downloading and processing HamSCI HF propagation spot data from the Madrigal database's daily HDF5 files. It loads one or more days of data, applies filters (time, region, frequency, distance, dataset/source), converts the result to a Polars dataframe, and can generate 2D histograms (time × distance). Optional Parquet caching speeds up repeated runs.

  • Downloads daily HDF5 files from Madrigal (via madrigalWeb / globalDownload.py)
  • Loads daily HDF5 files named like rsdYYYY-MM-DD.01.hdf5
  • Filters by:
    • Date/time range
    • Geographic bounds (lat/lon midpoint)
    • Frequency range (single or multiple ranges)
    • Distance range (km)
    • Dataset/source (RBN, WSPR, PSK)
  • Produces:
    • Filtered Polars dataframe (Parquet/CSV/HDF5)
    • 2D histogram (time vs distance) with metadata (Parquet/NetCDF/HDF5)
  • Caches:
    • Dataframes and histograms as Parquet for faster iterative runs

Repo layout

  • run_loader.py
    CLI entrypoint. Reads a JSON config and runs the pipeline day-by-day.
  • scripts/madrigal_loader.py
    MadrigalHamSpotLoader implementation.
  • scripts/json_loader.py
    Config loading utilities.
  • scripts/regions.py
    Named region bounding boxes.
  • scripts/utils_freq.py
    Named frequency ranges and labels.
  • config/
    Example config(s).
  • download_madrigal_daily_hdf5.sh
    Helper script to download daily Madrigal HDF5 files via globalDownload.py.

Requirements

  • Python 3.10+ recommended
  • Dependencies are listed in requirements.txt

Install

pip install -r requirements.txt

Download Madrigal data (HDF5)

This project expects daily Madrigal HDF5 files named like:

rsdYYYY-MM-DD.01.hdf5

We download these using the Madrigal remote Python API package madrigalWeb (included in requirements.txt), which installs command-line tools such as globalDownload.py.
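
The expected filenames for a date range can be enumerated with a short helper. The function name here is illustrative, not part of the project's API:

```python
# Sketch: list the expected daily Madrigal filenames for a date range.
from datetime import date, timedelta

def daily_filenames(start: date, end: date) -> list[str]:
    """Return rsdYYYY-MM-DD.01.hdf5 names for each day in [start, end]."""
    out = []
    d = start
    while d <= end:
        out.append(f"rsd{d.isoformat()}.01.hdf5")
        d += timedelta(days=1)
    return out

names = daily_filenames(date(2019, 12, 1), date(2019, 12, 3))
# → ["rsd2019-12-01.01.hdf5", "rsd2019-12-02.01.hdf5", "rsd2019-12-03.01.hdf5"]
```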

Download script (daily HDF5)

download_madrigal_daily_hdf5.sh loops day-by-day and calls globalDownload.py (installed by madrigalWeb) to fetch daily HDF5 files.

In the script, you typically only need to set:

  • startDate / endDate
  • --outputDir
  • --user_fullname
  • --user_email
  • --user_affiliation
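
The day-by-day loop might look roughly like the dry-run sketch below. The leading `echo` prints the command instead of running it; drop it to download for real. The flag names mirror the variables above, but the server URL and date format should be checked against your madrigalWeb install:

```shell
# Dry-run sketch of the day loop; requires GNU date. Drop the leading
# "echo" to invoke globalDownload.py for real. URL and date format are
# assumptions -- verify them against your madrigalWeb documentation.
startDate="2019-12-01"
endDate="2019-12-03"
d="$startDate"
while [ "$(date -d "$d" +%s)" -le "$(date -d "$endDate" +%s)" ]; do
  echo globalDownload.py \
    --url=http://cedar.openmadrigal.org \
    --outputDir=data/madrigal \
    --user_fullname="Jane Doe" \
    --user_email=jane@example.com \
    --user_affiliation=HamSCI \
    --startDate="$d" --endDate="$d"
  d="$(date -d "$d + 1 day" +%F)"
done
```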

Run:

chmod +x download_madrigal_daily_hdf5.sh
./download_madrigal_daily_hdf5.sh

Quick start (processing)

  1. Put your HDF5 files in a directory, for example:

    data/madrigal/rsd2019-12-01.01.hdf5
    data/madrigal/rsd2019-12-02.01.hdf5

  2. Create and edit a JSON config file in the config/ folder, for example:

    config/example.json

  3. Run:

    python3 run_loader.py -p config/example.json

Config format

Example config/example.json:

{
  "data_dir": "data/madrigal",
  "cache_dir": "cache",
  "use_cache": true,
  "chunk_size": 100000,

  "sDate": "2019-12-01T00:00:00",
  "eDate": "2019-12-03T23:59:59",

  "filters": {
    "region_name": "CONUS",
    "freq": ["7MHz", "14MHz"],
    "distance_range": { "min_dist": 0, "max_dist": 3000 },
    "datasets": ["RBN", "WSPR"]
  },

  "output": {
    "output_dir": "output",
    "dataframe": { "generate": true, "formats": ["csv"] },
    "histogram": { "generate": true, "formats": ["csv"] }
  }
}

Config notes

  • filters.freq can be:
    • a single key string (example: "7MHz")
    • a list of key strings (example: ["7MHz","14MHz"])
  • filters.datasets is optional:
    • If omitted or null, all datasets are included.
    • Valid values: ["RBN","WSPR","PSK"]
  • Region and frequency keys must exist in scripts/regions.py and scripts/utils_freq.py.
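
Resolving filters.freq might work roughly as below. The FREQ_RANGES dict and its MHz bounds are stand-ins for illustration; the real mapping lives in scripts/utils_freq.py:

```python
# Sketch: normalize filters.freq (single key or list of keys) and resolve
# each key to a frequency range. FREQ_RANGES and its values are
# illustrative stand-ins, not the contents of scripts/utils_freq.py.
import json

FREQ_RANGES = {"7MHz": (6.9, 7.4), "14MHz": (13.9, 14.4)}  # hypothetical bounds

config = json.loads('{"filters": {"freq": ["7MHz", "14MHz"]}}')
freq = config["filters"]["freq"]
keys = [freq] if isinstance(freq, str) else freq   # accept a key or a list of keys

unknown = [k for k in keys if k not in FREQ_RANGES]
if unknown:
    raise KeyError(f"Unknown frequency keys: {unknown}")
bounds = [FREQ_RANGES[k] for k in keys]
```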

Outputs

The pipeline runs day-by-day across the requested date range.

Outputs are written under your configured output.output_dir (default: output/), typically into:

  • output/dataframes/
  • output/histograms/

Histogram Parquet outputs include metadata stored in the Parquet schema under heatmap_meta.

Caching

When use_cache is true:

  • Dataframes are cached in: cache/dataframes/
  • Histograms are cached in: cache/heatmaps/

Cache filenames incorporate:

  • date range
  • region bounds
  • frequency range(s)
  • distance range
  • dataset selection

If you change filters and rerun, a new cache entry is created automatically.
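
One way to derive such a cache name is to hash the filter settings; the scheme below is illustrative, not the loader's actual naming rule:

```python
# Sketch: a deterministic cache key derived from the filter settings.
# The hashing scheme and path layout are illustrative assumptions.
import hashlib
import json

filters = {
    "date_range": ["2019-12-01", "2019-12-03"],
    "region": "CONUS",
    "freq": ["7MHz", "14MHz"],
    "distance": [0, 3000],
    "datasets": ["RBN", "WSPR"],
}

# sort_keys makes the JSON, and hence the hash, independent of dict ordering
key = hashlib.sha256(json.dumps(filters, sort_keys=True).encode()).hexdigest()[:12]
cache_path = f"cache/dataframes/df_{key}.parquet"
```

Because the key is a pure function of the filters, any change to the filters yields a different filename, which is why reruns with new settings never clobber old cache entries.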

Contributing

PRs are welcome. Please keep large data and generated outputs out of git (see .gitignore).
