GitHub - boom-lab/crocolake-python

CrocoLake-Python

CrocoLake-Python is a collection of Jupyter notebooks that shows how to interface with CrocoLake and Argo's parquet databases with Python.

Usage

python

Create your local environment (this creates a folder in your current directory): virtualenv crocolake

Activate the new Python environment: source crocolake/bin/activate

Install the required packages (run from the repository root): pip install .

Launch jupyter lab to access the notebooks: jupyter lab

If you don't have virtualenv on your machine, you can install it with: pip install virtualenv

conda

Create the local environment: conda env create -f environment.yml

Activate the environment: conda activate crocolake

Launch jupyter lab to access the notebooks: jupyter lab
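
Whichever route you took, a quick sanity check before opening the notebooks is to confirm that the key packages can be found in the active environment. The sketch below is only an illustration: the helper name `missing_packages` is hypothetical, and the package list is an assumption based on the tools named in this README, not a copy of the repository's actual requirements.

```python
from importlib.util import find_spec

# Assumed list, based on the tools this README mentions; the authoritative
# requirements live in setup.py and environment.yml.
ASSUMED_PACKAGES = ["pyarrow", "dask", "jupyterlab"]


def missing_packages(names):
    """Return the top-level package names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(ASSUMED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All assumed packages are importable.")
```

If anything is reported missing, re-running the install step inside the activated environment is the first thing to try.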

Notebooks

You can then launch any notebook from the notebooks folder and execute it. Each example needs a specific dataset and includes the code to download it to your local machine.

Note that there are at least two ways to load parquet datasets into a dataframe in Python: with pyarrow and with dask. Example 1 and Example 2 show both, while the other examples use dask, which in my experience is the more efficient of the two.

Examples

The notebooks folder contains five examples/tutorials:

Example 1 shows how to make a map of dissolved oxygen content in the North West Atlantic;
Example 2 shows how to make a map of temperature measurements in the North West Atlantic, including information about the source (Argo, GLODAP, or Spray Gliders);
Example 3 shows how to make temperature-salinity plots from Argo QC-ed measurements;
Example 4 shows how to make an animation of Argo's fleet growth over time on a world map;
Example 5 shows how to make a map of dissolved oxygen measurements in the Pacific off the coast of California.
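
As a rough illustration of the kind of aggregation behind a fleet-growth animation like the one in Example 4, the sketch below counts how many distinct floats have been seen up to each year. It is not the notebook's code: the function name and the `(platform_number, date)` input format are assumptions made for this example, and only the standard library is used.

```python
from collections import Counter
from datetime import date


def cumulative_fleet_size(observations):
    """Map each year to the number of distinct platforms seen up to that year.

    `observations` is an iterable of (platform_number, date) pairs.
    """
    # Record the first year in which each platform appears.
    first_seen = {}
    for platform, when in observations:
        year = when.year
        if platform not in first_seen or year < first_seen[platform]:
            first_seen[platform] = year

    if not first_seen:
        return {}

    # Count new platforms per year, then accumulate over a gap-free year range.
    new_per_year = Counter(first_seen.values())
    growth = {}
    total = 0
    for year in range(min(new_per_year), max(new_per_year) + 1):
        total += new_per_year.get(year, 0)
        growth[year] = total
    return growth


if __name__ == "__main__":
    sample = [
        (1900001, date(2000, 5, 1)),
        (1900001, date(2001, 1, 1)),
        (1900002, date(2002, 3, 3)),
        (1900003, date(2002, 7, 1)),
    ]
    print(cumulative_fleet_size(sample))
```

Each entry of the returned dictionary could then drive one frame of an animation, with the platform positions for that year drawn on the map.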

Databases

The following databases are currently available:

CrocoLake: contains the best available data from Argo, GLODAP, and Spray Gliders. More details here. This example uses CrocoLake.
Argo 'QC': contains the best available data, that is, real-time values are reported only when delayed values are not available. This is the same version used in CrocoLake, and here you can find more details on how it is generated. This example uses Argo 'QC'.
Argo 'ALL': contains all real-time and adjusted variables as reported in the core ('<PLATFORM_NUMBER>_prof.nc') and synthetic ('<PLATFORM_NUMBER>_Sprof.nc') profile files, for the physical and biogeochemical versions respectively. Both this and this example use Argo 'ALL'.

Each database comes in 'PHY' and 'BGC' versions.
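
Two of the rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the databases are actually generated: both function names are hypothetical, missing values are represented here as `None` or NaN, and the real 'QC' processing involves more than this simple fallback. The first function mimics the "best available" rule (use the delayed value, fall back to the real-time one); the second builds the profile file name for the 'PHY' and 'BGC' versions.

```python
import math


def _is_missing(value):
    """Treat None and NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def best_available(real_time, delayed):
    """Pick the delayed value when present, otherwise the real-time value."""
    return [rt if _is_missing(dl) else dl for rt, dl in zip(real_time, delayed)]


def profile_filename(platform_number, version):
    """Return the profile file name for the 'PHY' (core) or 'BGC' (synthetic) version."""
    suffixes = {"PHY": "_prof.nc", "BGC": "_Sprof.nc"}
    key = version.upper()
    if key not in suffixes:
        raise ValueError(f"version must be 'PHY' or 'BGC', got {version!r}")
    return f"{platform_number}{suffixes[key]}"
```

For instance, `best_available([1.0, 2.0], [None, 2.5])` keeps the real-time `1.0` and replaces `2.0` with the delayed `2.5`.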

Contact

For any questions, bugs, missing information, etc., open an issue or get in touch!

