Add to process_knb_to_duckdb(artis_custom_timeseries = TRUE) #29

@theamarks

Background:

We need a quick, easily reproducible custom filter that assigns a single hs_version to each year. The goal is to reduce the volume of data transferred from KNB to local storage and into duckdb. In the next release of ARTIS data (v1.1.0), the consumption tables are substantially larger. For our KNB-to-duckdb solution to work for the internal team and end users, we need to be efficient with file volumes.

For analysis purposes, we often filter ARTIS data to specific hs_version and year pairs to create a time series dataset. Building this filter into the download step gives users a more direct product and saves download and duckdb build time.

Task:

  • Filter the KNB filenames before download, so that only the time series hs_version / year pairings are downloaded and ingested into the local duckdb database. The pairings are:
    # Filter to single hs_version / year pairings
    filter(
      # Use HS96 from 1996-2003 (inclusive)
      ((hs_version == "HS96") & (year <= 2003)) |
        # Use HS02 from 2004-2009 (inclusive)
        ((hs_version == "HS02") & (year >= 2004 & year <= 2009)) |
        # Use HS07 from 2010-2012 (inclusive)
        ((hs_version == "HS07") & (year >= 2010 & year <= 2012)) |
        # Use HS12 from 2013-2020 (inclusive)
        ((hs_version == "HS12") & (year >= 2013 & year <= 2020))
    ) 
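
A minimal sketch of how the same pairings could be applied to filenames before anything is downloaded. It is not the package's implementation: the filename pattern (`<table>_<hs_version>_<year>.parquet`), the lookup table `timeseries_ranges`, and the helper `filter_timeseries_files()` are all hypothetical, introduced only for illustration. The lookup table also makes the 1996 lower bound for HS96 explicit, following the comment in the filter above.

```r
# Hypothetical lookup table: one hs_version per inclusive year range,
# mirroring the hs_version / year pairings listed in this issue.
timeseries_ranges <- data.frame(
  hs_version = c("HS96", "HS02", "HS07", "HS12"),
  first_year = c(1996L, 2004L, 2010L, 2013L),
  last_year  = c(2003L, 2009L, 2012L, 2020L),
  stringsAsFactors = FALSE
)

# Keep only filenames whose hs_version / year pair is in the time series.
# ASSUMPTION: filenames contain "<hs_version>_<4-digit year>", e.g.
# "consumption_HS96_2003.parquet". The real KNB naming may differ.
filter_timeseries_files <- function(filenames, ranges = timeseries_ranges) {
  pattern <- "(HS[0-9]{2})_([0-9]{4})"
  matched <- regmatches(filenames, regexec(pattern, filenames))
  keep <- vapply(matched, function(m) {
    # Non-matching filenames yield character(0) and are dropped
    if (length(m) != 3) return(FALSE)
    hs <- m[2]
    yr <- as.integer(m[3])
    any(ranges$hs_version == hs &
          yr >= ranges$first_year &
          yr <= ranges$last_year)
  }, logical(1))
  filenames[keep]
}

# Made-up filenames for illustration only
example_files <- c(
  "consumption_HS96_2003.parquet",
  "consumption_HS96_2004.parquet",
  "consumption_HS02_2004.parquet",
  "consumption_HS12_2020.parquet",
  "consumption_HS17_2020.parquet"
)
filter_timeseries_files(example_files)
```

With these example names, only the HS96/2003, HS02/2004, and HS12/2020 files are kept. Holding the ranges in a table rather than in a hard-coded condition would make it easier to extend the time series when a new hs_version or year is added.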

    Status

    🏗 In Progress
