The Ocean Data Platform (ODP) is a hosted catalog of curated marine and environmental datasets. This package provides lightweight R bindings so you can authenticate with your HubOcean account, navigate to a dataset, pick a table, and stream rows straight into data frames or Arrow tables without leaving your analysis workflow. The SDK supports streaming queries, server-side aggregations, and raw file management (upload, download, ingest).
When you work with the SDK you will usually touch the following pieces:

- `odp_client()` — holds your API key and issues authenticated requests
- dataset object — retrieved via `client$dataset("<dataset-id>")`
- table object — accessed via `dataset$table`
- cursor — returned from `table$select()` and responsible for paging data
- files — accessed via `dataset$files` for raw file upload/download/ingest
The sections below walk through that flow so anyone landing here (including via `?odp`) quickly sees how to get from credentials to a usable tibble.
Status: This SDK is still considered pre-release. We are looking for feedback, so please reach out if you have any issues, concerns, or other ideas that could improve your experience using it.
- R 4.1 or newer
- Packages declared in `DESCRIPTION` (install with `pak`, `renv`, or `install.packages()`)
- Authentication: either an API key or an interactive browser session (see Authentication below)
Get your API key from My Account in the web app and pass it to the client:
```r
client <- odp_client(api_key = "your-api-key")
```

Or set the `ODP_API_KEY` environment variable and skip the argument:

```r
Sys.setenv(ODP_API_KEY = "your-api-key")
client <- odp_client()
```

When no API key is available in an interactive R session, the SDK will automatically open your browser for authentication via HubOcean's login page. The interactive flow uses OAuth2 with Azure B2C and caches tokens locally, so you won't need to re-authenticate on every session.
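A common pattern in R projects (not specific to this SDK) is to keep the key in `~/.Renviron` so it never appears in scripts or history. This is a sketch of that setup; the `ODP_API_KEY` variable name comes from the section above:

```r
# Put this line (without the leading "# ") in ~/.Renviron, which R reads at
# startup. usethis::edit_r_environ() opens the file for you if usethis is
# installed:
#   ODP_API_KEY=your-api-key

# Every session can then build a client with no key in the code:
client <- odp_client()  # picks up ODP_API_KEY from the environment
```

Remember to restart R (or call `readRenviron("~/.Renviron")`) after editing the file so the variable is actually loaded.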
The snippet below shows the full flow: install, authenticate, navigate to a dataset, pick a table, and stream the columns you care about. Swap the dataset ID for the resources you have access to in the ODP catalog.
```r
# install straight from GitHub (requires remotes, pak, or devtools)
install.packages("remotes") # skip if already installed
remotes::install_github("C4IROcean/odp-sdkr")

# local checkout? make sure vignettes are built
# remotes::install_local("~/dev/odp_sdkr", build = TRUE, build_vignettes = TRUE)

library(odp)

# 1. Client (API key can come from ODP_API_KEY)
client <- odp_client(api_key = "Sk_....")

# 2. Dataset (see https://app.hubocean.earth/)
dataset <- client$dataset("aea06582-fc49-4995-a9a8-2f31fcc65424")

# 3. Table (defaults to the first table in the dataset)
table <- dataset$table

# 4. Query – returns a cursor that streams rows lazily
cursor <- table$select(
  filter = "depth > $min_depth",
  vars = list(min_depth = 300),
  columns = c("latitude", "longitude", "depth"),
  timeout = 15
)

# 5. Fetch the table into a data frame you can use for analysis
df <- cursor$dataframe()
```

The hosted documentation at https://docs.hubocean.earth/r_sdk/ is the canonical place to learn more about authentication, cursors, batching, and advanced patterns. Install the package locally and lean on the official docs when you need deeper explanations or diagrams.
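Filters can bind more than one variable. The sketch below extends the quick-start query with a second placeholder; note that the `AND` combinator and exact filter grammar are assumptions here — check the hosted docs for the authoritative syntax:

```r
# Sketch: a filter with two bound variables ($min_depth, $max_lat).
# The AND syntax is an assumption; consult docs.hubocean.earth for the
# supported filter grammar.
cursor <- table$select(
  filter = "depth > $min_depth AND latitude < $max_lat",
  vars = list(min_depth = 300, max_lat = 70),
  columns = c("latitude", "longitude", "depth")
)
df <- cursor$dataframe()
```

Binding values through `vars` rather than pasting them into the filter string keeps queries reusable and avoids quoting mistakes.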
`help(package = "odp")` gives a quick index of the exported helpers.
When working with a large table it can be helpful to fetch it in batches. The `next_batch()` helper iterates over the batches one by one; the cursor fetches pages in chunks in the background as you need them.
```r
cursor <- table$select()

while (!is.null(chunk <- cursor$next_batch())) {
  print(chunk$num_rows)
}

# Convert on demand
df <- cursor$dataframe()
arrow_tbl <- cursor$arrow()

# tibble support is optional
# tib_tbl <- cursor$tibble()
```
`collect()`/`dataframe()`/`tibble()`/`arrow()` only materialise batches that have not been streamed yet. To obtain the full dataset after calling `next_batch()`, create a fresh cursor and collect before iterating.
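To make that concrete, here is a sketch of the two-cursor pattern — one cursor for batch-wise processing, a fresh one for the complete result:

```r
# Sketch: a cursor that has already streamed batches will not replay them,
# so use separate cursors for iteration and for full collection.
batch_cursor <- table$select()
while (!is.null(chunk <- batch_cursor$next_batch())) {
  message("processed ", chunk$num_rows, " rows")  # per-batch work goes here
}

full_cursor <- table$select()  # fresh cursor over the same query
df <- full_cursor$dataframe()  # materialises every batch from the start
```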
The SDK supports server-side aggregations. This can be useful if you want to compute simple statistics without transferring all of the table data.
```r
agg <- table$aggregate(
  group_by = "'TOTAL'",
  filter = "depth > 200",
  aggr = list(depth = "mean")
)
print(agg)
```

Pass an `aggr` named list where each entry specifies how the column should be aggregated (`"sum"`, `"min"`, `"max"`, `"count"`, `"mean"`).
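Since `aggr` is a named list, several columns can be aggregated in one call. This sketch reuses the column names from the quick-start example (`depth`, `latitude`), which may differ in your dataset:

```r
# Sketch: multiple server-side aggregations in a single request.
# Column names are assumptions borrowed from the quick-start example.
agg <- table$aggregate(
  group_by = "'TOTAL'",  # single overall group, as in the example above
  filter = "depth > 200",
  aggr = list(
    depth = "mean",      # average depth of matching rows
    latitude = "max"     # northernmost matching observation
  )
)
print(agg)
```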
Datasets can also hold raw files. The `$files` helper (an alias for `$table$raw`) lets you upload, list, download, ingest, and delete files.
```r
# Upload a file (accepts a character string or raw vector)
file_id <- dataset$files$upload("measurements.csv", "lat,lon,depth\n59.5,5.3,120\n")

# List files attached to the dataset
dataset$files$list()

# Download returns a raw vector
content <- dataset$files$download(file_id)
cat(rawToChar(content))

# Ingest the file into the table (modes: "append", "truncate", "drop")
dataset$files$ingest(file_id)

# Delete the file
dataset$files$delete(file_id)
```

Inspect a table's schema and statistics:

```r
schema <- table$schema()
str(schema)
stats <- table$stats()
str(stats)
```

Optional dependency:

- `tibble` (only if you want `cursor$tibble()`)
Install optional packages as needed, for example: `install.packages("tibble")`.
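In scripts that may run on machines without the optional package, the standard R idiom is to guard the call with `requireNamespace()`. A sketch, assuming `cursor` is a cursor from `table$select()`:

```r
# Sketch: fall back to a base data frame when tibble is not installed.
# requireNamespace() checks availability without attaching the package.
if (requireNamespace("tibble", quietly = TRUE)) {
  result <- cursor$tibble()
} else {
  result <- cursor$dataframe()  # base data frame, no extra dependency
}
```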
- Install the package dependencies declared in `DESCRIPTION` and keep a recent version of `devtools`/`pkgload` around for running checks.
- Run the unit tests with `R -q -e "devtools::test()"` and the full `devtools::check()` suite locally before opening a pull request. Tests use small synthetic Arrow streams, so they never call the live API.
- The repo ships a `.pre-commit-config.yaml` that runs `lintr` and `styler` through the helper scripts in `scripts/`. Install pre-commit once per machine and enable the hooks with `pre-commit install` to get the same linting enforced in CI.
- To lint/format everything manually (matching CI), run:

  ```sh
  Rscript --vanilla scripts/precommit_lintr.R $(git ls-files -- '*.R' '*.r' '*.Rmd' '*.rmd')
  Rscript --vanilla scripts/precommit_styler.R $(git ls-files -- '*.R' '*.r' '*.Rmd' '*.rmd')
  ```

- GitHub Actions keeps parity with the local tooling: `.github/workflows/lint-format-test.yml` runs the linters, formatting check, and the package's `testthat` suite (`devtools::test()`) on every push/PR.
- Buildable vignettes (`vignette("odp")`, `vignette("odp-tabular")`) ship with the repo; install with `build_vignettes = TRUE` if you want those walkthroughs available offline.