diff --git a/.github/workflows/post-coverage.yml b/.github/workflows/post-coverage.yml
new file mode 100644
index 0000000..6aab4b6
--- /dev/null
+++ b/.github/workflows/post-coverage.yml
@@ -0,0 +1,35 @@
+name: Post coverage report to PR
+
+on:
+ workflow_run:
+ workflows: ["Python test"]
+ types:
+ - completed
+
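+# This workflow runs in the context of the base repository, which grants it the
+# write permissions that a `pull_request` workflow triggered from a fork lacks.
+# The coverage report and PR number are handed over from the test workflow via an artifact.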
+permissions:
+ pull-requests: write
+ actions: read
+
+jobs:
+ comment:
+ runs-on: ubuntu-latest
+ if: >
+ github.event.workflow_run.event == 'pull_request' &&
+ github.event.workflow_run.conclusion == 'success'
+ steps:
+ - name: Download coverage artifact
+ uses: actions/download-artifact@v4
+ with:
+ name: coverage-report
+ run-id: ${{ github.event.workflow_run.id }}
+ github-token: ${{ secrets.GITHUB_TOKEN }}
+
+ - name: Get PR number
+ id: pr_number
+ run: echo "number=$(cat pr_number.txt)" >> $GITHUB_OUTPUT
+
+ - name: Post coverage report to PR
+ uses: marocchino/sticky-pull-request-comment@v2
+ with:
+ path: cov_report.txt
+ number: ${{ steps.pr_number.outputs.number }}
diff --git a/.github/workflows/python-test.yml b/.github/workflows/python-test.yml
index 3c0bbdc..3d3805f 100644
--- a/.github/workflows/python-test.yml
+++ b/.github/workflows/python-test.yml
@@ -20,7 +20,6 @@ jobs:
- name: Checkout repository
uses: actions/checkout@v4
with:
- ref: ${{ github.head_ref }}
fetch-depth: 0
- name: Install uv, set the python version, and enable cache
@@ -40,11 +39,19 @@ jobs:
coverage report -m --format markdown > cov_report.txt
coverage xml
- - name: Post coverage report to PR
+ - name: Save PR number
if: matrix.python-version == '3.11' && matrix.os == 'ubuntu-latest'
- uses: marocchino/sticky-pull-request-comment@v2
+ run: echo ${{ github.event.number }} > pr_number.txt
+
+ - name: Save coverage report and PR number
+ if: matrix.python-version == '3.11' && matrix.os == 'ubuntu-latest'
+ uses: actions/upload-artifact@v4
with:
- path: cov_report.txt
+ name: coverage-report
+ path: |
+ cov_report.txt
+ pr_number.txt
+ retention-days: 1
- name: Upload coverage reports to Codecov
uses: codecov/codecov-action@v4.0.1
diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md
index 6de56c5..4c8e3bc 100644
--- a/CODE_OF_CONDUCT.md
+++ b/CODE_OF_CONDUCT.md
@@ -60,7 +60,7 @@ representative at an online or offline event.
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
-max.pargmann@dlr.de.
+artist@lists.kit.edu.
All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the
diff --git a/README.md b/README.md
index 5887090..82460be 100644
--- a/README.md
+++ b/README.md
@@ -71,23 +71,27 @@ The ``PAINT`` repository is structured as shown below:
.
├── html # Code for the paint-database.org website
├── markers # Saved markers for the WRI1030197 power plant in Jülich
-├── paint # Python package
+├── paint # Python package/
│ ├── data
│ ├── preprocessing
│ └── util
├── plots # Scripts used to generate plots found in our paper
├── preprocessing-scripts # Scripts used for preprocessing and STAC generation
├── scripts # Scripts highlighting example usage of the data
-└── test # Tests for the python package
- ├── data
- ├── preprocessing
- └── util
+├── test # Tests for the python package/
+│ ├── data
+│ ├── preprocessing
+│ └── util
+└── tutorials # Interactive notebooks showcasing how to get started with PAINT
```
### Example usage:
In the ``scripts`` folder there are multiple scripts highlighting how ``PAINT`` can be used. Detailed
descriptions of these scripts are available via our [Documentation](http://paint.readthedocs.io).
+Furthermore, an interactive notebook is available in the ``tutorials`` folder; it is the perfect starting
+point to dive into ``PAINT``!
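+
+As a small taste of the API (the download directory is just an example path), downloading heliostat
+metadata takes only a few lines:
+
+```python
+from pathlib import Path
+
+from paint.data import StacClient
+
+# Initialize the STAC client and download metadata for selected heliostats.
+client = StacClient(output_dir=Path("./paint_data"))
+client.get_heliostat_metadata(heliostats=["AA23", "AA24"])
+```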
+
## How to contribute
Check out our [contribution guidelines](CONTRIBUTING.md) if you are interested in contributing to the `PAINT` project :fire:.
Please also carefully check our [code of conduct](CODE_OF_CONDUCT.md) :blue_heart:.
diff --git a/SECURITY.md b/SECURITY.md
index 55a3284..529845b 100644
--- a/SECURITY.md
+++ b/SECURITY.md
@@ -2,10 +2,15 @@
## Supported Versions
-We are currently supporting ``PAINT 1.0.0``
+We are currently supporting ``PAINT 2.0.1`` and the earlier versions listed below.
| Version | Supported |
-| ------- | ------------------ |
+|---------|--------------------|
+| 2.0.1 | :white_check_mark: |
+| 2.0.0 | :white_check_mark: |
+| 1.0.3 | :white_check_mark: |
+| 1.0.2 | :white_check_mark: |
+| 1.0.1 | :white_check_mark: |
| 1.0.0 | :white_check_mark: |
## Reporting a Vulnerability
diff --git a/docs/conf.py b/docs/conf.py
index 5c7ae1d..10181d5 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -10,7 +10,7 @@
project = "PAINT"
copyright = f"{datetime.now().year}, ARTIST consortium"
author = "ARTIST Consortium"
-release = "2.0.0"
+release = "2.0.1"
# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
diff --git a/docs/dataset.rst b/docs/dataset.rst
index f145c33..1324bd2 100644
--- a/docs/dataset.rst
+++ b/docs/dataset.rst
@@ -32,7 +32,7 @@ There are three ways of creating a ``PaintCalibrationDataset``:
2. **From a benchmark file**
- You can also create the dataset from a benchmark file (see above). In this case, the ``benchmark_file`` must be provided:
+   You can also create the dataset from a benchmark file (see :doc:`splitter` for information on dataset splits). In this case, the ``benchmark_file`` containing information on the train, validation, and test splits must be provided:
.. code-block:: python
diff --git a/docs/splitter.rst b/docs/splitter.rst
index 27bb5e8..5190fbc 100644
--- a/docs/splitter.rst
+++ b/docs/splitter.rst
@@ -34,7 +34,7 @@ Supported Splits
Again, the goal is to create diverse and challenging training and validation datasets.
- **Balanced Split:**
- This method uses KMeans clustering on azimuth and elevation features to ensure a stratified selection. The process includes:
+ This method uses k-means clustering on azimuth and elevation features to ensure a stratified selection. The process includes:
- Clustering the data into ``validation_size`` clusters.
- Selecting one data point per cluster for the validation split.
@@ -76,3 +76,5 @@ To generate the splits, simply call the ``get_dataset_splits()`` function:
azimuth_splits = splitter.get_dataset_splits(
split_type="azimuth", training_size=10, validation_size=30
)
+
+This returns a ``pd.DataFrame`` containing information on the splits, i.e., which samples belong to which split, and also saves this information as a CSV file.
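+
+For example, to see how many samples landed in each split (the ``Split`` column is part of the
+benchmark schema, alongside ``Id`` and ``HeliostatId``):
+
+.. code-block:: python
+
+    print(azimuth_splits["Split"].value_counts())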
diff --git a/docs/usage.rst b/docs/usage.rst
index 77f8b42..873699e 100644
--- a/docs/usage.rst
+++ b/docs/usage.rst
@@ -2,7 +2,23 @@
How To Use
==========
-Here, you can find an overview of how to use ``PAINT``.
+
+To get started with ``PAINT``, we have included an interactive notebook, which is available here: https://github.com/ARTIST-Association/PAINT/blob/main/tutorials/paint_data_tutorial.ipynb.
+
+This tutorial provides an interactive introduction to the ``PAINT`` database, demonstrating how to:
+- Initialize the STAC client.
+- Download and inspect metadata.
+- Generate calibration data splits.
+- Load calibration data using a dataloader.
+- Download and inspect other types of PAINT data.
+
+To run the tutorial, make sure you install the tutorial dependencies:
+
+.. code-block:: console
+
+ $ pip install "paint-csp[tutorial]"
+
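+If you just want a quick impression before opening the notebook, the following minimal sketch mirrors
+its first steps (the download directory is just an example path):
+
+.. code-block:: python
+
+    from pathlib import Path
+
+    from paint.data import StacClient
+
+    # Initialize the STAC client with a directory in which to store downloaded data.
+    client = StacClient(output_dir=Path("./PAINT_tutorial_data"))
+
+    # Download metadata for a small selection of heliostats.
+    client.get_heliostat_metadata(heliostats=[f"AA{i}" for i in range(23, 52)])
+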
+Most of the concepts covered in the interactive tutorial are also covered in the documentation and associated scripts listed below:
.. toctree::
:maxdepth: 1
diff --git a/paint/__init__.py b/paint/__init__.py
index a9d1c74..33c0184 100644
--- a/paint/__init__.py
+++ b/paint/__init__.py
@@ -1,6 +1,12 @@
import os
+from importlib.metadata import PackageNotFoundError, version
PAINT_ROOT = f"{os.sep}".join(__file__.split(os.sep)[:-2])
"""Reference to the root directory of ARTIST."""
+try:
+ __version__ = version("paint-csp")
+except PackageNotFoundError:
+ # Allows running from source without installation.
+ __version__ = "0.0.0"
-__all__ = ["PAINT_ROOT", "preprocessing", "util"]
+__all__ = ["PAINT_ROOT", "preprocessing", "util", "__version__"]
diff --git a/paint/data/dataset.py b/paint/data/dataset.py
index 964dcdb..fe11e6a 100644
--- a/paint/data/dataset.py
+++ b/paint/data/dataset.py
@@ -142,7 +142,7 @@ def _check_accepted_keys(key: str) -> None:
@classmethod
def from_benchmark(
cls,
- benchmark_file: str | Path,
+ benchmark_file: str | Path | pd.DataFrame,
root_dir: str | Path,
item_type: str,
download: bool = False,
@@ -157,8 +157,8 @@ def from_benchmark(
Parameters
----------
- benchmark_file : str | Path
- Path to the file containing the benchmark information.
+ benchmark_file : str | Path | pd.DataFrame
+        Path to the file containing the benchmark information, or a dataframe containing this information.
root_dir : str | Path
Directory where the dataset will be stored.
item_type : str
@@ -182,12 +182,29 @@ def from_benchmark(
Validation dataset.
"""
root_dir = Path(root_dir)
- log.info(
- f"Begining the process of generating benchmark datasets. The file used to generate the benchmarks is:\n"
- f" {benchmark_file}!"
- )
- # Load the splits data.
- splits = pd.read_csv(benchmark_file)
+ if not isinstance(benchmark_file, pd.DataFrame):
+ log.info(
+                f"Beginning the process of generating benchmark datasets. The file used to generate the benchmarks is:\n"
+ f" {benchmark_file}!"
+ )
+ # Load the splits data.
+ splits = pd.read_csv(benchmark_file)
+ else:
+ log.info(
+                "Beginning the process of generating benchmark datasets using the provided pandas dataframe!"
+ )
+ benchmark_file.reset_index(inplace=True)
+ splits = benchmark_file
+
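+        # Validate that the splits match the expected benchmark schema.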
+ expected_cols = ["Id", "HeliostatId", "Split"]
+ try:
+ pd.testing.assert_index_equal(splits.columns, pd.Index(expected_cols))
+ except AssertionError as e:
+            raise ValueError(
+                f"The dataset split file provided has an incorrect schema. Please verify and try again.\n"
+                f"Expected: {expected_cols}\n"
+                f"Details: {e}"
+            ) from e
# Check whether to download the data or not.
if download: # pragma: no cover
diff --git a/paint/data/dataset_splits.py b/paint/data/dataset_splits.py
index 208b30a..2436ef8 100644
--- a/paint/data/dataset_splits.py
+++ b/paint/data/dataset_splits.py
@@ -457,6 +457,11 @@ def get_dataset_splits(
Size of the training split.
validation_size : int
Size of the validation split.
+
+ Returns
+ -------
+ pd.DataFrame
+ Dataframe containing information on the dataset splits.
"""
allowed_split_types = [
mappings.AZIMUTH_SPLIT,
diff --git a/paint/data/stac_client.py b/paint/data/stac_client.py
index ccce994..3686179 100644
--- a/paint/data/stac_client.py
+++ b/paint/data/stac_client.py
@@ -69,6 +69,7 @@ def __init__(
self.output_dir = pathlib.Path(output_dir)
self.output_dir.mkdir(parents=True, exist_ok=True)
self.chunk_size = chunk_size
+ log.info(f"Initializing STAC client to download data to: {output_dir}.")
@staticmethod
def load_checkpoint(path: pathlib.Path) -> dict[str, Any]:
@@ -715,7 +716,7 @@ def get_heliostat_data(
# Download the data for each heliostat.
for heliostat_catalog in heliostat_catalogs_list:
log.info(f"Processing heliostat catalog {heliostat_catalog.id}")
- success = False
+ success = True
# Download calibration data.
if get_calibration:
diff --git a/plots/04_create_distribution_plots.py b/plots/04_create_distribution_plots.py
index 744d54e..32dbf91 100644
--- a/plots/04_create_distribution_plots.py
+++ b/plots/04_create_distribution_plots.py
@@ -79,9 +79,8 @@ def __init__(
self.output_path.mkdir(parents=True, exist_ok=True)
self.figure_size = (4, 4)
- self.data = self._load_data()
- # Power plant position as tensor
+ # Power plant position as tensor.
power_plant_lat, power_plant_lon = convert_gk_to_lat_lon(
mappings.GK_RIGHT_BASE, mappings.GK_HEIGHT_BASE
)
@@ -92,7 +91,11 @@ def __init__(
mappings.POWER_PLANT_ALT,
]
)
- # Precompute receiver corners once
+
+ # Load data.
+ self.data = self._load_data()
+
+ # Precompute receiver corners once.
self.receiver_coordinates = [
convert_wgs84_coordinates_to_local_enu(
torch.tensor(coords), self.power_plant_position
diff --git a/pyproject.toml b/pyproject.toml
index 82b4071..0ea95ef 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -7,7 +7,7 @@ packages = ["paint"]
[project]
name = "paint-csp"
-version = "2.0.0"
+version = "2.0.1"
authors = [
{ name="ARTIST Consortium", email="artist@lists.kit.edu" },
]
@@ -17,7 +17,7 @@ requires-python = ">=3.10"
classifiers = [
"Programming Language :: Python :: 3",
"License :: OSI Approved :: MIT License",
- "Development Status :: 1 - Planning",
+ "Development Status :: 5 - Production/Stable",
]
dependencies = [
"numpy",
@@ -51,6 +51,7 @@ dev = [
"sphinxcontrib-napoleon",
"sphinxemoji"
]
+tutorial = ["jupyter"]
[project.urls]
Homepage = "https://github.com/ARTIST-Association/PAINT"
diff --git a/tests/data/test_dataset.py b/tests/data/test_dataset.py
index c5b9c30..4892b20 100644
--- a/tests/data/test_dataset.py
+++ b/tests/data/test_dataset.py
@@ -4,6 +4,7 @@
import cv2
import deepdiff
+import pandas as pd
import pytest
import torch
from torchvision import transforms
@@ -191,6 +192,25 @@ def test_from_benchmark(
assert len(test) == 4
assert len(val) == 3
+ # Test with Pandas dataframe as input instead of file.
+ benchmark_df = pd.read_csv(
+ pathlib.Path(PAINT_ROOT)
+ / "tests"
+ / "data"
+ / "test_data"
+ / "test_benchmark.csv",
+ index_col=0,
+ )
+ train, test, val = PaintCalibrationDataset.from_benchmark(
+ benchmark_file=benchmark_df,
+ root_dir=pathlib.Path(PAINT_ROOT) / "tests" / "data" / "test_data" / "dataset",
+ item_type=item_type,
+ download=download,
+ )
+ assert len(train) == 3
+ assert len(test) == 4
+ assert len(val) == 3
+
@pytest.mark.parametrize(
"item_type, heliostats",
@@ -284,3 +304,24 @@ def test_str_method() -> None:
"-The dataset contains 4 items\n"
)
assert str(dataset) == expected
+
+
+def test_from_benchmark_fails_with_incorrect_dataframe(
+ tmp_path: pathlib.Path,
+) -> None:
+ """
+ Verify that ``from_benchmark`` raises ``ValueError`` when the input dataframe has incorrect columns.
+
+ Parameters
+ ----------
+ tmp_path : pathlib.Path
+ Fixture to the temporary folder.
+ """
+ # Create invalid data frame.
+ invalid_df = pd.DataFrame(columns=["Id", "HeliostatId", "WrongCol"])
+
+ # Expect a ValueError.
+ with pytest.raises(ValueError, match="incorrect schema"):
+ PaintCalibrationDataset.from_benchmark(
+ benchmark_file=invalid_df, root_dir=tmp_path, item_type="raw_image"
+ )
diff --git a/tests/test_package.py b/tests/test_package.py
new file mode 100644
index 0000000..fa309df
--- /dev/null
+++ b/tests/test_package.py
@@ -0,0 +1,33 @@
+import importlib
+import importlib.metadata
+from importlib.metadata import PackageNotFoundError
+from unittest.mock import MagicMock
+
+import pytest
+
+import paint
+
+
+def test_version_fallback_when_package_missing(monkeypatch: pytest.MonkeyPatch) -> None:
+ """
+ Verify that ``__version__`` falls back to '0.0.0' if the package is not installed.
+
+ This test mocks ``importlib.metadata.version`` to raise ``PackageNotFoundError``,
+ then reloads the module to trigger the except block.
+
+ Parameters
+ ----------
+ monkeypatch : pytest.MonkeyPatch
+ MonkeyPatch fixture.
+ """
+ # Create a mock that raises the specific error.
+ mock_raiser = MagicMock(side_effect=PackageNotFoundError)
+
+ # Apply the mock to the standard library function.
+ monkeypatch.setattr(importlib.metadata, "version", mock_raiser)
+
+ # Reload the module to force the top-level try/except block to run again.
+ importlib.reload(paint)
+
+ # Assert the fallback behavior.
+ assert paint.__version__ == "0.0.0"
diff --git a/tutorials/paint_data_tutorial.ipynb b/tutorials/paint_data_tutorial.ipynb
new file mode 100644
index 0000000..20af578
--- /dev/null
+++ b/tutorials/paint_data_tutorial.ipynb
@@ -0,0 +1,1909 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "23d8c4e94a4b55f4",
+ "metadata": {},
+ "source": [
+ "# ``PAINT`` Data Tutorial\n",
+ "\n",
+ "This interactive notebook provides a brief overview of the ``PAINT`` database, demonstrating how to:\n",
+ "- Initialize the STAC client.\n",
+ "- Download and inspect metadata.\n",
+ "- Generate calibration data splits.\n",
+ "- Load calibration data using a dataloader.\n",
+ "- Download and inspect other types of ``PAINT`` data.\n",
+ "\n",
+    "> **Note:** Executable Python scripts for each step are available in the ``scripts`` folder of the [PAINT GitHub](https://github.com/ARTIST-Association/PAINT/tree/main/scripts). We recommend using those scripts if you plan to download and process large amounts of ``PAINT`` data."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3e82d5e92da63968",
+ "metadata": {},
+ "source": [
+ "## Getting Started\n",
+ "\n",
+ "To run this tutorial, ensure you have the ``PAINT`` tutorial dependencies installed:\n",
+ "```\n",
+ "pip install \"paint-csp[tutorial]\"\n",
+ "```\n",
+ "To verify the installation, let's import ``PAINT`` and check the version attribute:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "id": "initial_id",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2026-01-30T11:12:26.352122Z",
+ "start_time": "2026-01-30T11:12:26.344571Z"
+ }
+ },
+ "source": [
+ "import paint\n",
+ "\n",
+ "print(f\"``PAINT`` is running with version: {paint.__version__}\")"
+ ],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "``PAINT`` is running with version: 2.0.0\n"
+ ]
+ }
+ ],
+ "execution_count": 1
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b2a2d5be158a05b9",
+ "metadata": {},
+ "source": [
+ "We also need to specify a directory where all downloaded data will be saved. **Update the file path below to a location that works for your system:**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "id": "ee4635a10ae20007",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2026-01-30T11:12:26.361699Z",
+ "start_time": "2026-01-30T11:12:26.359796Z"
+ }
+ },
+ "source": [
+ "from pathlib import Path\n",
+ "\n",
+ "download_path = Path(\"./PAINT_tutorial_data\")"
+ ],
+ "outputs": [],
+ "execution_count": 2
+ },
+ {
+ "cell_type": "markdown",
+ "id": "57e39a37b0d41383",
+ "metadata": {},
+ "source": [
+ "## Downloading Metadata\n",
+ "\n",
+ "Before working with the actual ``PAINT`` data, we will inspect the metadata to understand what is available. For this tutorial, we will focus on a small subset of heliostats: those with IDs starting with \"AA\". This includes the range from **AA23 to AA51**.\n",
+ "\n",
+ "In the next step, we will:\n",
+ "- Generate a list of heliostats to access.\n",
+ "- Create a STAC client.\n",
+ "- Download the metadata and save it to the specified location."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "id": "95276b7d0af455cc",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2026-01-30T11:13:16.335654Z",
+ "start_time": "2026-01-30T11:12:26.445975Z"
+ }
+ },
+ "source": [
+ "# Import the STAC client.\n",
+ "from paint.data import StacClient\n",
+ "\n",
+ "# Generate heliostat list.\n",
+ "heliostat_list = [f\"AA{i}\" for i in range(23, 52)]\n",
+ "\n",
+ "# Initialize STAC client.\n",
+ "client = StacClient(output_dir=download_path)\n",
+ "\n",
+ "# Download metadata.\n",
+ "client.get_heliostat_metadata(heliostats=heliostat_list)"
+ ],
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "No collections selected - downloading data for all collections!\n",
+ "Processing Heliostat Catalogs: 0%| | 0/29 [00:00, ? catalog/s]The child with ID AA40-deflectometry-collection is not available, data for this child cannot be accessed.\n",
+ "Processing Heliostat Catalogs: 3%|▎ | 1/29 [00:01<00:29, 1.06s/ catalog]The child with ID AA37-deflectometry-collection is not available, data for this child cannot be accessed.\n",
+ "Processing Heliostat Catalogs: 48%|████▊ | 14/29 [00:01<00:00, 15.12 catalog/s]The child with ID AA42-deflectometry-collection is not available, data for this child cannot be accessed.\n",
+ "The child with ID AA41-deflectometry-collection is not available, data for this child cannot be accessed.\n",
+ "Processing Heliostat Catalogs: 62%|██████▏ | 18/29 [00:01<00:00, 12.40 catalog/s]The child with ID AA45-deflectometry-collection is not available, data for this child cannot be accessed.\n",
+ "The child with ID AA48-deflectometry-collection is not available, data for this child cannot be accessed.\n",
+ "The child with ID AA43-deflectometry-collection is not available, data for this child cannot be accessed.\n",
+ "The child with ID AA47-deflectometry-collection is not available, data for this child cannot be accessed.\n",
+ "Processing Heliostat Catalogs: 90%|████████▉ | 26/29 [00:01<00:00, 19.56 catalog/s]The child with ID AA46-deflectometry-collection is not available, data for this child cannot be accessed.\n",
+ "The child with ID AA51-deflectometry-collection is not available, data for this child cannot be accessed.\n",
+ "Processing Heliostat Catalogs: 100%|██████████| 29/29 [00:02<00:00, 11.34 catalog/s]\n",
+ "Processing Heliostat Catalogs: 0%| | 0/29 [00:00, ? catalog/s]The child with ID AA40-calibration-collection is not available, data for this child cannot be accessed.\n",
+ "Processing Heliostat Catalogs: 24%|██▍ | 7/29 [00:19<00:46, 2.12s/ catalog]The child with ID AA48-calibration-collection is not available, data for this child cannot be accessed.\n",
+ "Processing Heliostat Catalogs: 100%|██████████| 29/29 [00:43<00:00, 1.49s/ catalog]\n",
+ "Processing Heliostat Catalogs: 100%|██████████| 29/29 [00:00<00:00, 68.73 catalog/s]\n"
+ ]
+ }
+ ],
+ "execution_count": 3
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5a51fdec9b7be22f",
+ "metadata": {},
+ "source": [
+ "Notice the following details in the output above:\n",
+ "1. **\"No collections selected\" Log:** This appears because we did not use the ``collections`` argument. Consequently, metadata for *all* available collections, i.e., properties, calibration, and deflectometry, was downloaded for the selected heliostats.\n",
+ "2. **Data Availability Warnings:** These are expected. While properties data exists for all heliostats, deflectometry and calibration data are only available for a subset of them."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a6d0d54359f5377c",
+ "metadata": {},
+ "source": [
+ "## Inspecting the Metadata\n",
+ "\n",
+ "Let's load the metadata to inspect it. In your download folder, you should now see a ``metadata`` directory containing three CSV files:\n",
+ "- Calibration metadata\n",
+ "- Deflectometry metadata\n",
+ "- Properties metadata\n",
+ "\n",
+ "These filenames are automatically appended with ``selected_heliostats`` and a timestamp to uniquely identify them.\n",
+ "\n",
+ "In the next step, we will load these files. Note that this tutorial assumes your download folder contains exactly one of each metadata file type. If you have run previous downloads, ensure only the relevant files are present, or the code below may fail."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "id": "4521b907024514f5",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2026-01-30T11:13:16.358132Z",
+ "start_time": "2026-01-30T11:13:16.345354Z"
+ }
+ },
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "# Load file names.\n",
+ "calibration_metadata_file = list(\n",
+ " Path(download_path / \"metadata\").glob(\"calibration*.csv\")\n",
+ ")\n",
+ "deflectometry_metadata_file = list(\n",
+ " Path(download_path / \"metadata\").glob(\"deflectometry*.csv\")\n",
+ ")\n",
+ "properties_metadata_file = list(\n",
+ " Path(download_path / \"metadata\").glob(\"properties*.csv\")\n",
+ ")\n",
+ "\n",
+ "# Test to make sure the folder structure is as expected.\n",
+ "if (\n",
+ " len(calibration_metadata_file)\n",
+ " == len(deflectometry_metadata_file)\n",
+ " == len(properties_metadata_file)\n",
+ " == 1\n",
+ "):\n",
+ " calibration_metadata_file = calibration_metadata_file[0]\n",
+ " deflectometry_metadata_file = deflectometry_metadata_file[0]\n",
+ " properties_metadata_file = properties_metadata_file[0]\n",
+ "else:\n",
+ " print(\n",
+    "        \"Incorrect metadata structure: this tutorial is designed specifically for a metadata folder containing \"\n",
+    "        \"a single calibration metadata, deflectometry metadata, and properties metadata file. If multiple files \"\n",
+    "        \"are present or one of these is missing, the tutorial will not run as desired. Please check you ran the \"\n",
+ " \"steps above correctly!\"\n",
+ " )\n",
+ "\n",
+ "# Open the metadata files.\n",
+ "calibration_metadata = pd.read_csv(calibration_metadata_file)\n",
+ "deflectometry_metadata = pd.read_csv(deflectometry_metadata_file)\n",
+ "properties_metadata = pd.read_csv(properties_metadata_file)\n",
+ "\n",
+ "# Inspect the properties metadata.\n",
+ "print(\n",
+ " f\"The properties metadata file contains {len(properties_metadata)} rows and {len(properties_metadata.columns)} columns.\\n\"\n",
+ " f\"The columns are: {', '.join(properties_metadata.columns)}\"\n",
+ ")"
+ ],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The properties metadata file contains 29 rows and 6 columns.\n",
+ "The columns are: Id, HeliostatId, latitude, longitude, Elevation, DateTime\n"
+ ]
+ }
+ ],
+ "execution_count": 4
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5087163feca2f064",
+ "metadata": {},
+ "source": [
+ "The properties metadata contains information on the 29 heliostats we are considering:\n",
+ "- **Id:** The ID of the STAC file containing the properties information\n",
+ "- **HeliostatId:** The specific ID of the heliostat\n",
+ "- **latitude:** The latitude of the heliostat\n",
+ "- **longitude:** The longitude of the heliostat\n",
+ "- **Elevation:** The elevation of the heliostat\n",
+ "- **DateTime:** The timestamp of the measurement"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "id": "f4fa6986450453f6",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2026-01-30T11:13:16.382424Z",
+ "start_time": "2026-01-30T11:13:16.376010Z"
+ }
+ },
+ "source": [
+ "properties_metadata.head()"
+ ],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " Id HeliostatId latitude longitude Elevation \\\n",
+ "0 AA23-heliostat-properties AA23 50.913647 6.387012 88.590057 \n",
+ "1 AA24-heliostat-properties AA24 50.913646 6.387075 88.599808 \n",
+ "2 AA25-heliostat-properties AA25 50.913646 6.387138 88.620598 \n",
+ "3 AA26-heliostat-properties AA26 50.913646 6.387200 88.603058 \n",
+ "4 AA27-heliostat-properties AA27 50.913646 6.387263 88.615654 \n",
+ "\n",
+ " DateTime \n",
+ "0 2021-07-20 05:09:00+00:00 \n",
+ "1 2021-07-20 05:09:00+00:00 \n",
+ "2 2021-07-20 05:09:00+00:00 \n",
+ "3 2021-07-20 05:09:00+00:00 \n",
+ "4 2021-07-20 05:09:00+00:00 "
+      ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "execution_count": 5
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3b7c1e4028dc1cf9",
+ "metadata": {},
+ "source": [
+    "Above, we can see the first five rows of this metadata table. Now let's look at the calibration metadata:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "id": "e6ef7037a3e832e0",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2026-01-30T11:13:16.426869Z",
+ "start_time": "2026-01-30T11:13:16.425059Z"
+ }
+ },
+ "source": [
+ "# Inspect the calibration metadata.\n",
+ "print(\n",
+ " f\"The calibration metadata file contains {len(calibration_metadata)} rows and {len(calibration_metadata.columns)} columns.\\n\"\n",
+ " f\"The columns are: {', '.join(calibration_metadata.columns)}\"\n",
+ ")"
+ ],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The calibration metadata file contains 4691 rows and 17 columns.\n",
+ "The columns are: Id, HeliostatId, Azimuth, Elevation, lower_left_latitude, lower_left_longitude, lower_left_Elevation, upper_left_latitude, upper_left_longitude, upper_left_Elevation, upper_right_latitude, upper_right_longitude, upper_right_Elevation, lower_right_latitude, lower_right_longitude, lower_right_Elevation, DateTime\n"
+ ]
+ }
+ ],
+ "execution_count": 6
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e21629a42491489b",
+ "metadata": {},
+ "source": [
+ "This dataframe contains significantly more rows because there are often multiple calibration measurements for each heliostat. The columns include:\n",
+ "- **Id:** The measurement ID of the calibration measurement\n",
+ "- **HeliostatId:** The ID of the heliostat used for this measurement\n",
+ "- **Azimuth:** The sun's azimuth at the time of measurement\n",
+ "- **Elevation:** The sun's elevation at the time of measurement\n",
+ "- **Target Coordinates:** The latitude, longitude, and elevation for the *lower_left*, *upper_left*, *upper_right*, and *lower_right* corners of the calibration target\n",
+ "- **DateTime:** The timestamp of the measurement\n",
+ "\n",
+ "The first five rows are displayed below:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "id": "cdf7447636c43830",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2026-01-30T11:13:16.561059Z",
+ "start_time": "2026-01-30T11:13:16.555687Z"
+ }
+ },
+ "source": [
+ "calibration_metadata.head()"
+ ],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " Id HeliostatId Azimuth Elevation lower_left_latitude \\\n",
+ "0 225295 AA23 81.839158 37.047879 50.913396 \n",
+ "1 199617 AA23 -24.275629 48.834090 50.913396 \n",
+ "2 62302 AA23 -42.017068 8.527271 50.913396 \n",
+ "3 222963 AA23 -6.400352 62.327916 50.913392 \n",
+ "4 212358 AA23 66.411607 45.213617 50.913392 \n",
+ "\n",
+ " lower_left_longitude lower_left_Elevation upper_left_latitude \\\n",
+ "0 6.387613 135.789 50.913396 \n",
+ "1 6.387613 135.789 50.913396 \n",
+ "2 6.387613 135.789 50.913396 \n",
+ "3 6.387886 119.268 50.913392 \n",
+ "4 6.387886 119.268 50.913392 \n",
+ "\n",
+ " upper_left_longitude upper_left_Elevation upper_right_latitude \\\n",
+ "0 6.387613 142.175 50.913397 \n",
+ "1 6.387613 142.175 50.913397 \n",
+ "2 6.387613 142.175 50.913397 \n",
+ "3 6.387886 126.470 50.913392 \n",
+ "4 6.387886 126.470 50.913392 \n",
+ "\n",
+ " upper_right_longitude upper_right_Elevation lower_right_latitude \\\n",
+ "0 6.387536 142.172 50.913397 \n",
+ "1 6.387536 142.172 50.913397 \n",
+ "2 6.387536 142.172 50.913397 \n",
+ "3 6.387763 126.506 50.913392 \n",
+ "4 6.387763 126.506 50.913392 \n",
+ "\n",
+ " lower_right_longitude lower_right_Elevation DateTime \n",
+ "0 6.387536 135.783 2023-06-27 05:39:56+00:00 \n",
+ "1 6.387536 135.783 2023-04-21 10:37:26+00:00 \n",
+ "2 6.387536 135.783 2022-01-18 13:44:45+00:00 \n",
+ "3 6.387763 119.279 2023-06-16 09:48:04+00:00 \n",
+ "4 6.387763 119.279 2023-05-31 06:35:41+00:00 "
+      ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "execution_count": 7
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e00b86252619e38e",
+ "metadata": {},
+ "source": [
+    "Finally, it is time to inspect the deflectometry metadata:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "id": "55674ae61e53e85",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2026-01-30T11:13:16.621784Z",
+ "start_time": "2026-01-30T11:13:16.620017Z"
+ }
+ },
+ "source": [
+ "# Inspect the deflectometry metadata.\n",
+ "print(\n",
+ " f\"The deflectometry metadata file contains {len(deflectometry_metadata)} rows and {len(deflectometry_metadata.columns)} columns.\\n\"\n",
+ " f\"The columns are: {', '.join(deflectometry_metadata.columns)}\"\n",
+ ")"
+ ],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The deflectometry metadata file contains 39 rows and 6 columns.\n",
+ "The columns are: Id, HeliostatId, latitude, longitude, Elevation, DateTime\n"
+ ]
+ }
+ ],
+ "execution_count": 8
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9d8b2f0fd819e366",
+ "metadata": {},
+ "source": [
+    "Again, we see more rows than the number of heliostats because some heliostats have multiple deflectometry measurements. The columns are nearly identical to the properties metadata, with one key difference: the **Id** column refers to the *deflectometry STAC ID*, not the properties ID.\n",
+ "\n",
+ "The first five rows are displayed below:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "id": "f4a270d14e53662a",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2026-01-30T11:13:16.679975Z",
+ "start_time": "2026-01-30T11:13:16.676206Z"
+ }
+ },
+ "source": [
+ "deflectometry_metadata.head()"
+ ],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " Id HeliostatId latitude longitude \\\n",
+ "0 AA23-2021-10-13Z09-27-07Z-deflectometry AA23 50.913647 6.387012 \n",
+ "1 AA24-2021-10-13Z09-29-29Z-deflectometry AA24 50.913646 6.387075 \n",
+ "2 AA25-2021-10-13Z09-32-36Z-deflectometry AA25 50.913646 6.387138 \n",
+ "3 AA26-2021-10-13Z09-34-21Z-deflectometry AA26 50.913646 6.387200 \n",
+ "4 AA27-2021-10-12Z13-27-32Z-deflectometry AA27 50.913646 6.387263 \n",
+ "\n",
+ " Elevation DateTime \n",
+ "0 88.590057 2021-10-13 09:27:07+00:00 \n",
+ "1 88.599808 2021-10-13 09:29:29+00:00 \n",
+ "2 88.620598 2021-10-13 09:32:36+00:00 \n",
+ "3 88.603058 2021-10-13 09:34:21+00:00 \n",
+ "4 88.615654 2021-10-12 13:27:32+00:00 "
+      ]
+ },
+ "execution_count": 9,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "execution_count": 9
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cb071d62c5c816ea",
+ "metadata": {},
+ "source": [
+ "Since we will be using the calibration dataset later, let's inspect it in more detail. Specifically, we look at how many of our heliostats have calibration measurements and how the number of calibration measurements varies across the heliostats:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "id": "ae5fafd83c0957e0",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2026-01-30T11:13:16.727793Z",
+ "start_time": "2026-01-30T11:13:16.722279Z"
+ }
+ },
+ "source": [
+ "from IPython.display import HTML, display\n",
+ "\n",
+ "# Calculate counts once.\n",
+ "counts = calibration_metadata[\"HeliostatId\"].value_counts()\n",
+ "unique_heliostats = calibration_metadata[\"HeliostatId\"].nunique()\n",
+ "\n",
+ "# Create DataFrames for better rendering.\n",
+ "top_5 = counts.head(5).to_frame(name=\"Measurement Count\")\n",
+ "bottom_5 = counts.tail(5).to_frame(name=\"Measurement Count\")\n",
+ "\n",
+ "display(\n",
+ " HTML(f\"\"\"\n",
+ "