Description
Running To-do List
- Add artis-duckdb-metadata/blob/develop/process_knb_to_duckdb.R to the exploreARTIS/develop-build-knb-duckdb branch
  - Function `.R` scripts need to live in the `./R/` directory when developing an R package
- Move all `roxygen2` documentation decorators to the very top of the function script. It may pass checks and build when they are scattered across the script, but let's stick to the expected formatting conventions.
- Explicitly add `#' @export process_knb_to_duckdb` to the roxygen2 header to signify the main function to export to the package NAMESPACE.
- Reorganize the script so helper functions are listed first. Think about maintaining an "executable timeline" from top to bottom.
- Declare package dependencies in the `roxygen2` header instead of in the script
- Add a directory argument to `process_knb_to_duckdb()` to replace `download_dir <- "~/Downloads/artis_downloads"`
- We need the ability to generate a duckdb of "custom artis timeseries" HS_version/year pairings. Very few end users will need all HS_versions and years. Perhaps have two functions? Or an argument along the lines of "artis_custom_timeseries = TRUE"
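The layout items above can be sketched as follows. This is a minimal illustration, not the actual script: the helper `keep_timeseries()`, the `timeseries` argument, and the assumption that every table has `hs_version` and `year` columns are all hypothetical, and the duckdb insert step is left out so the sketch runs with base R only.

```r
# Helper first, so the file reads top to bottom in execution order.
# Hypothetical helper: keep only rows matching the requested
# HS version / year pairings. Assumes `hs_version` and `year` columns exist.
keep_timeseries <- function(df, timeseries = NULL) {
  if (is.null(timeseries)) {
    return(df)  # NULL means "keep every HS version and year"
  }
  # Inner join on the pairing columns drops all other rows
  merge(df, timeseries, by = c("hs_version", "year"))
}

#' Read downloaded KNB .csv files, optionally subset to a custom timeseries
#'
#' @param download_dir Directory containing the downloaded .csv files
#'   (replaces the hard-coded `download_dir` in the script).
#' @param timeseries Optional data frame with columns `hs_version` and `year`.
#' @return A named list of data frames, one per .csv file.
#' @importFrom utils read.csv
#' @export process_knb_to_duckdb
process_knb_to_duckdb <- function(download_dir, timeseries = NULL) {
  csv_files <- list.files(download_dir, pattern = "\\.csv$", full.names = TRUE)
  tables <- lapply(csv_files, function(f) {
    keep_timeseries(read.csv(f, stringsAsFactors = FALSE), timeseries)
  })
  names(tables) <- sub("\\.csv$", "", basename(csv_files))
  tables  # inserting these tables into duckdb is omitted from this sketch
}
```

A single optional `timeseries` argument is one way to avoid maintaining two functions: leaving it `NULL` builds the full database, while passing pairings builds the custom one.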
Problem
Storing the ARTIS database on KNB requires many .csv files to ensure the data is preserved in an accessible and usable format. For our end users, a duckdb will provide many benefits. However, we want to make setting up a duckdb as easy as possible without introducing even more technologies. Users will already face a learning curve for querying and running analyses with duckdb (which we are trying to minimize!). We want to make the uptake of our new duckdb distribution as smooth and reproducible as possible.
Solution
Call a single function and BAM, you have the power of the new ARTIS duckdb on your local computer!
Add an exploreARTIS function that standardizes and streamlines building the ARTIS duckdb by pulling data directly from the ARTIS KNB record. Use @Anurag19101996's script as the foundation for inserting data into duckdb in a standardized way. Pulling data directly from KNB would also register every ARTIS data download through the built-in KNB metrics service!
Function arguments:
- `version`: "latest" or "DOI" or "1.0.0"
- `model_run`?: "FAO" or "SAU" (not sure how we will separate these in KNB. Might not be needed if a new DOI is assigned to different data versions)
- `user_orcid`: "https://orcid.org/0000-0002-9370-9128" (for signing into KNB)
  - other KNB credentials needed?
- `path`: file path for duckdb file
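A rough sketch of how these arguments could be declared and checked. The name `build_artis_duckdb()` is one of the candidates listed in this issue; the defaults and validation rules are assumptions for illustration only, and the function just returns the validated arguments rather than downloading anything.

```r
# Hypothetical signature and validation only; no KNB download or duckdb build.
build_artis_duckdb <- function(version = "latest",
                               model_run = c("FAO", "SAU"),
                               user_orcid = NULL,
                               path = "artis.duckdb") {
  model_run <- match.arg(model_run)  # first choice ("FAO") is the default

  stopifnot(is.character(version), length(version) == 1)
  # Assumed accepted forms: "latest", a DOI such as "doi:10.5063/F1CZ35N7",
  # or a semantic version such as "1.0.0"
  is_doi <- grepl("^doi:10\\.", version)
  is_semver <- grepl("^[0-9]+\\.[0-9]+\\.[0-9]+$", version)
  if (!(identical(version, "latest") || is_doi || is_semver)) {
    stop("`version` must be \"latest\", a DOI, or a version like \"1.0.0\"")
  }

  # ORCID iDs are four blocks of four characters; the last may be "X"
  orcid_pattern <- "^https://orcid\\.org/[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$"
  if (!is.null(user_orcid) && !grepl(orcid_pattern, user_orcid)) {
    stop("`user_orcid` must look like https://orcid.org/0000-0000-0000-0000")
  }

  if (!dir.exists(dirname(path))) {
    stop("The directory for `path` does not exist: ", dirname(path))
  }

  list(version = version, model_run = model_run,
       user_orcid = user_orcid, path = path)
}
```

Failing early on malformed arguments keeps errors close to the user's call instead of surfacing later during a long download.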
Questions
- Use the DataONE API to request data OR the `rdataone` package?
  - Is `rdataone` just an R client for the DataONE API?
- How do we specify the KNB repository through the DataONE API and/or the `rdataone` package?
- Is it possible to download/pull KNB data directly into duckdb without saving it locally first? Insert R function into SQL query passed to duckdb?
- Find the persistent DOI for the ARTIS data and the versioned DOIs
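On the question of loading data into duckdb without an intermediate R data frame, one possible approach is to let duckdb parse the .csv itself through its `read_csv_auto()` SQL function. The sketch below assumes the `DBI` and `duckdb` packages are installed; `load_csv_to_duckdb()` is a hypothetical helper, and a temporary local file stands in for a KNB .csv. DuckDB's `httpfs` extension can read .csv files over HTTPS, but whether that works against KNB object URLs (for example when authentication is required) has not been checked here.

```r
library(DBI)

# Hypothetical helper: create a table from a .csv source. duckdb parses the
# file itself, so the rows never pass through an R data frame.
load_csv_to_duckdb <- function(con, table_name, csv_source) {
  sql <- paste0(
    "CREATE OR REPLACE TABLE ", dbQuoteIdentifier(con, table_name),
    " AS SELECT * FROM read_csv_auto(", dbQuoteString(con, csv_source), ")"
  )
  dbExecute(con, sql)
}

# Usage with a temporary local file standing in for a KNB .csv
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(hs_version = "HS12", year = 2015), csv_path,
          row.names = FALSE)

con <- dbConnect(duckdb::duckdb(), dbdir = ":memory:")
load_csv_to_duckdb(con, "trade", csv_path)
dbGetQuery(con, "SELECT COUNT(*) AS n FROM trade")
dbDisconnect(con, shutdown = TRUE)
```

Quoting the table name and file path through `DBI` avoids hand-built SQL strings breaking on unusual characters.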
Relevant info and resources
The Knowledge Network for Biocomplexity (KNB) data repository is a member of the DataONE network of data repositories.
- `rdataone` package https://github.com/DataONEorg/rdataone
- DataONE API documentation https://releases.dataone.org/online/api-documentation-v2.0.1/apis/CN_APIs.html#CNRead.resolve
- Searching DataONE Data Holdings
- ARTIS data on KNB - https://knb.ecoinformatics.org/view/doi%3A10.5063%2FF1CZ35N7
- DOI - doi:10.5063/F1CZ35N7
- Guide to R package development
Ideas for function name
- `exploreARTIS::build_artis_with_ducks()` 🦆
- `build_artis_duckdb()`
- `make_artis_duckdb()`
- `setup_artis_duckdb()`