Machine learning (ML) techniques are becoming increasingly vital for land cover classification, particularly in mapping snow-covered areas. However, land cover maps produced by these methods must be validated against ground truth and compared with similar datasets to understand the uncertainties across models, remote sensing products, and specific land surfaces. To this end, this project examines how well we can map snow-covered areas using the remote sensing Harmonized Landsat Sentinel-2 (HLS; Claverie et al., 2018) product at 10-m spatial resolution. We hypothesize that ML methods are likely to perform better than the standard index-based snow mapping method, so this repository focuses on evaluating the effectiveness of ML methods compared to traditional index-based methods for snow mapping with HLS.
- Access and visualize airborne and satellite imagery (HLS).
- Derive preliminary snow maps using index-based methods.
- Create a random forest model tailored to HLS datasets and evaluate its performance.
- Explore feature importance and model transferability, including tests on independent datasets.
- Derive snow-covered areas using existing models, including Google Dynamic World and the NASA-IBM geospatial model.
Environment
- CryoCloud: Built-in environment for accessing and manipulating data.
Tutorials
- 2021 Cloud Hackathon: Tutorial on data access that helped us build the baseline model code. We primarily used tutorials 2 (collecting relevant links), 4 (setting up NASA Earthdata access), and 5 (extracting .tif files from the links collected in tutorial 2).
- Random Forest Walkthrough: A walkthrough on data formatting and general guidelines for creating a random forest model. We also used the function library it provides.
HLS Data Access
- Sentinel Hub: Used to find image dates/times/locations with snow cover.
- Most recent ASO datasets: Packages for most recent ASO data.
- Landsat and Sentinel band reference table: The HLS spectral band table we used to identify the bands needed to calculate the NDSI. We calculated NDSI following the USGS guidance (https://www.usgs.gov/landsat-missions/normalized-difference-snow-index).
- Tiling System: The naming convention used in some parts is based on ESA's tiling system. This links to a high-resolution image of the system.
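The NDSI calculation referenced above can be sketched as follows. This is a minimal illustration, not the repository's code: the function name is ours, and the inputs are assumed to be green and shortwave-infrared (SWIR1) surface-reflectance values, following the USGS definition NDSI = (green − SWIR1) / (green + SWIR1).

```python
import numpy as np

def ndsi(green, swir1, eps=1e-9):
    """Normalized Difference Snow Index: (green - SWIR1) / (green + SWIR1).

    `eps` guards against division by zero over dark pixels.
    """
    green = np.asarray(green, dtype=float)
    swir1 = np.asarray(swir1, dtype=float)
    return (green - swir1) / (green + swir1 + eps)

# Toy reflectance values: snow is bright in green and dark in SWIR1,
# so the first pixel scores high and the second scores negative.
green = np.array([0.8, 0.2])
swir1 = np.array([0.1, 0.3])
print(ndsi(green, swir1))
```

Snow typically yields NDSI well above 0; a common threshold (used later for model 0) is 0.4.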
Lidar Data
- Earthdata Search: We used 50-m resolution lidar data found via Earthdata Search as our truth values because it is precise and accurate. See here for a tutorial on how to use the tool.
- ASO: Alternatively, we can look here for lidar data covering specific regions.
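Comparing a 10-m HLS snow map against 50-m lidar truth requires bringing the two onto a common grid. One simple approach (an assumption on our part, not necessarily the workflow used in the notebooks) is to aggregate the fine binary map into 5×5 blocks by majority vote:

```python
import numpy as np

def aggregate_majority(snow_10m, factor=5):
    """Aggregate a binary 10-m snow map to a coarser grid (e.g., 50-m lidar cells)
    by majority vote within each factor x factor block.

    Assumes both array dimensions are divisible by `factor`.
    """
    h, w = snow_10m.shape
    blocks = snow_10m.reshape(h // factor, factor, w // factor, factor)
    frac = blocks.mean(axis=(1, 3))  # snow fraction within each coarse cell
    return (frac >= 0.5).astype(np.uint8)

# Toy 10x10 map at 10 m -> 2x2 map at 50 m.
fine = np.zeros((10, 10), dtype=np.uint8)
fine[:5, :5] = 1        # top-left cell fully snow-covered
fine[:5, 5:8] = 1       # top-right cell 60% snow-covered -> majority snow
coarse = aggregate_majority(fine)
print(coarse)
```

Keeping the intermediate snow fraction instead of the binary vote would also allow fractional-snow-cover comparisons against the lidar product.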
To match the development environment, access CryoCloud.
If you are not able to access CryoCloud, use the provided [[environment.yml]] file exported from CryoCloud.
From the same directory, run `conda env create --name envname --file=environment.yml` (replacing `envname`) to generate a conda environment. This will install all libraries at the versions we used in this project.
We generated model 0 from "ground truth" data derived from the HLS NDSI product using a threshold of 0.4. To reproduce model 0, follow the steps in these notebooks:
- DataDiscovery.ipynb
- EarthdataLoginSetup.ipynb
- SnowIndexing.ipynb
The validation framework compares ML-derived datasets with ground truth measurements. To run the random forest model, execute the notebooks in order:
- RF_data_preparation.ipynb
- RF_Model.ipynb
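The random forest workflow in those notebooks can be sketched as below. This is a self-contained toy version, not the notebooks' code: the features are synthetic stand-ins for HLS bands, and the labels mimic the model-0 ground truth (NDSI ≥ 0.4).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic per-pixel samples; columns stand in for HLS bands (green, SWIR1, NIR).
n = 1000
green = rng.uniform(0, 1, n)
swir1 = rng.uniform(0, 1, n)
nir = rng.uniform(0, 1, n)
X = np.column_stack([green, swir1, nir])

# Label "snow" with the same NDSI >= 0.4 rule used for the model-0 ground truth.
ndsi = (green - swir1) / (green + swir1 + 1e-9)
y = (ndsi >= 0.4).astype(int)

# Train/evaluate a random forest, then inspect per-band feature importances.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
print(f"held-out accuracy: {acc:.2f}")
print("feature importances (green, swir1, nir):", clf.feature_importances_)
```

Because the labels here are a deterministic function of two of the three features, the forest should score near-perfectly and assign most importance to green and SWIR1; with real HLS pixels and lidar truth the accuracy and importances would of course differ.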