CrocoLake-Python is a collection of Jupyter notebooks that show how to access CrocoLake and Argo's parquet databases from Python.
Create your local environment (this creates a folder named crocolake in your current directory):
virtualenv crocolake
Activate the new Python environment:
source crocolake/bin/activate
Install the required packages (run this from the root of the repository):
pip install .
Launch jupyter lab to access the notebooks:
jupyter lab
If you don't have virtualenv on your machine, you can install it with:
pip install virtualenv
Alternatively, if you use conda, create the local environment from environment.yml:
conda env create -f environment.yml
Activate the environment:
conda activate crocolake
Launch jupyter lab to access the notebooks:
jupyter lab
You can then open any notebook from the notebooks folder and execute it. Each example needs a specific dataset and contains the code to download it to your local machine.
Note that there are two ways to load parquet datasets into a dataframe in Python: with pyarrow and with dask. Example 1 and Example 2 show both, while the other examples use dask, which in my experience is the more efficient of the two.
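As a rough illustration of the two loading routes, here is a minimal sketch. The column names (LATITUDE, LONGITUDE) and the helper names are assumptions made for this example and may not match the notebooks, so check the dataset schema before reusing them. The imports live inside each loader so that you only need the library you actually use.

```python
def bounding_box_filters(lat_min, lat_max, lon_min, lon_max):
    """Build row filters as (column, operator, value) tuples.

    Both pyarrow and dask accept this form. LATITUDE and LONGITUDE
    are assumed column names, not confirmed by this README.
    """
    return [
        ("LATITUDE", ">=", lat_min),
        ("LATITUDE", "<=", lat_max),
        ("LONGITUDE", ">=", lon_min),
        ("LONGITUDE", "<=", lon_max),
    ]


def load_with_pyarrow(dataset_dir, columns, filters):
    """Read the selected rows and columns eagerly into a pandas DataFrame."""
    import pyarrow.parquet as pq

    table = pq.read_table(str(dataset_dir), columns=columns, filters=filters)
    return table.to_pandas()


def load_with_dask(dataset_dir, columns, filters):
    """Build a lazy dask DataFrame; data is read only when .compute() is called."""
    import dask.dataframe as dd

    return dd.read_parquet(str(dataset_dir), columns=columns, filters=filters)
```

With pyarrow the whole selection is loaded into memory at once, while with dask you would call .compute() on the result (or on a reduced version of it) when you actually need the values.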
The notebooks folder contains five examples/tutorials:
- Example 1 shows how to make a map of dissolved oxygen content in the North West Atlantic;
- Example 2 shows how to make a map of temperature measurements in the North West Atlantic, including information about the source (Argo, GLODAP, or Spray Gliders);
- Example 3 shows how to make temperature-salinity plots from Argo QC-ed measurements;
- Example 4 shows how to make an animation of Argo's fleet growth over time on a world map;
- Example 5 shows how to make a map of dissolved oxygen measurements in the Pacific off the coast of California.
The following databases are currently available:
- CrocoLake: contains the best available data from Argo, GLODAP, and Spray Gliders. More details here. This example uses CrocoLake.
- Argo 'QC': contains the best available data, that is, real-time values are reported only when delayed values are not available. This is the same version used in CrocoLake, and here you can find more details on how it is generated. This example uses Argo 'QC'.
- Argo 'ALL': contains all real-time and adjusted variables as reported in the core ('<PLATFORM_NUMBER>_prof.nc') and synthetic ('<PLATFORM_NUMBER>_Sprof.nc') profile files, for the physical and biogeochemical versions respectively. Both this example and this example use Argo 'ALL'.
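The "best available" rule described for Argo 'QC' can be sketched in a few lines. This only illustrates the fallback as stated above (use the delayed value when it exists, otherwise the real-time one); it is not the actual code used to generate the database, which may apply further checks. The function name and the sample values are made up for the example.

```python
def best_available(real_time, delayed):
    """Return the delayed value when present, otherwise the real-time one.

    Missing delayed values are represented by None in this sketch.
    """
    return [d if d is not None else r for r, d in zip(real_time, delayed)]


# Made-up temperature values for four measurements
temp_real_time = [12.1, 11.8, 11.2, 10.9]
temp_delayed = [12.0, None, 11.3, None]

temp_best = best_available(temp_real_time, temp_delayed)
print(temp_best)  # [12.0, 11.8, 11.3, 10.9]
```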
Each database comes in 'PHY' and 'BGC' versions.
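If it helps to keep the two versions apart, the mapping between version and source profile file described above can be written as a small helper. The function name is hypothetical and the platform number in the usage line is made up; only the '_prof.nc' and '_Sprof.nc' suffixes come from the description above.

```python
def profile_file_name(platform_number, version):
    """Return the profile file name for a platform, given 'PHY' or 'BGC'.

    'PHY' maps to the core file and 'BGC' to the synthetic file.
    """
    suffixes = {"PHY": "_prof.nc", "BGC": "_Sprof.nc"}
    if version not in suffixes:
        raise ValueError(f"version must be 'PHY' or 'BGC', got {version!r}")
    return f"{platform_number}{suffixes[version]}"


# Made-up platform number, for illustration only
print(profile_file_name(1234567, "BGC"))  # 1234567_Sprof.nc
```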
For any questions, bugs, missing information, etc., open an issue or get in touch!