Open
Labels: ✋ help wanted, 🪄 enhancement
Description
Background:
We need quick, easily reproducible custom filtering that assigns a single hs_version to each year. The goal is to reduce the volume of data transferred from KNB to local storage and then into DuckDB. In the next release of ARTIS data (v1.1.0), the consumption tables are substantially larger, so for our KNB-to-DuckDB solution to work for both the internal team and end users, we need to be efficient with file volumes.
For analysis purposes, we often filter ARTIS data to specific hs_version and year pairs to create a time series dataset. Providing this filtered time series directly gives users a more ready-to-use product and saves download and DuckDB build time.
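One way to keep the pairings quick and reproducible is to store them in a small lookup table rather than a chain of conditions. The base R sketch below is an illustration only: `hs_year_pairs` and `keep_pair()` are hypothetical names, the year spans are the ones listed in the task below (HS96 1996-2003, HS02 2004-2009, HS07 2010-2012, HS12 2013-2020, all inclusive), and the example rows are made up rather than real ARTIS data.

```r
# Hypothetical lookup table: one HS version per span of years (inclusive),
# using the pairings listed in this issue.
hs_year_pairs <- data.frame(
  hs_version = c("HS96", "HS02", "HS07", "HS12"),
  start_year = c(1996L, 2004L, 2010L, 2013L),
  end_year   = c(2003L, 2009L, 2012L, 2020L)
)

# TRUE where `year` falls inside the span assigned to `hs_version`.
# HS versions missing from the table always return FALSE.
keep_pair <- function(hs_version, year, pairs = hs_year_pairs) {
  idx <- match(hs_version, pairs$hs_version)
  !is.na(idx) & year >= pairs$start_year[idx] & year <= pairs$end_year[idx]
}

# Made-up example rows (not real ARTIS data).
example <- data.frame(
  hs_version = c("HS96", "HS96", "HS02", "HS12", "HS17"),
  year       = c(2003L, 2004L, 2004L, 2021L, 2018L)
)

# Keep only the rows that belong to the time series pairings.
time_series <- example[keep_pair(example$hs_version, example$year), ]
```

With dplyr, the same vectorized predicate should also work as `filter(keep_pair(hs_version, year))`. Keeping the spans in one table means adding a new HS version would be a one-row change rather than another `|` clause.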
Task:
- Filter KNB filenames before download, so that only files for the time series pairings are downloaded and ingested into the local DuckDB database.
```r
library(dplyr)

# Filter to single hs_version / year pairings
# (`artis_df` is a placeholder for an ARTIS table with `hs_version` and `year` columns)
artis_ts <- artis_df %>%
  filter(
    # Use HS96 from 1996-2003 (inclusive)
    ((hs_version == "HS96") & (year <= 2003)) |
      # Use HS02 from 2004-2009 (inclusive)
      ((hs_version == "HS02") & (year >= 2004 & year <= 2009)) |
      # Use HS07 from 2010-2012 (inclusive)
      ((hs_version == "HS07") & (year >= 2010 & year <= 2012)) |
      # Use HS12 from 2013-2020 (inclusive)
      ((hs_version == "HS12") & (year >= 2013 & year <= 2020))
  )
```
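Because the goal is to skip unneeded files before anything is downloaded, the same pairings can be applied to file names rather than to rows. The base R sketch below assumes, purely for illustration, that each KNB file name contains an `HSxx_yyyy` token (for example `consumption_HS96_2001.csv`); the real ARTIS naming scheme on KNB may differ, and `allowed_keys`, `key_pattern`, and `select_knb_files()` are hypothetical names.

```r
# Allowed "<hs_version>_<year>" keys built from the pairings above
# (all ranges inclusive), e.g. "HS96_1996" ... "HS12_2020".
allowed_keys <- c(
  paste0("HS96_", 1996:2003),
  paste0("HS02_", 2004:2009),
  paste0("HS07_", 2010:2012),
  paste0("HS12_", 2013:2020)
)

# ASSUMPTION: each file name embeds a token like "HS96_2001".
# Adjust this pattern to match the real KNB file names.
key_pattern <- "HS[0-9]{2}_[0-9]{4}"

select_knb_files <- function(filenames, keys = allowed_keys, pattern = key_pattern) {
  # Extract the first "HSxx_yyyy" token from each name (NA if there is none).
  m <- regexpr(pattern, filenames)
  key <- rep(NA_character_, length(filenames))
  key[m > 0] <- regmatches(filenames, m)
  # Keep only the files whose token is one of the time series pairings.
  filenames[key %in% keys]
}

files <- c(
  "consumption_HS96_2003.csv",
  "consumption_HS96_2004.csv",
  "consumption_HS12_2015.csv",
  "readme.txt"
)
to_download <- select_knb_files(files)
```

Only the names returned by `select_knb_files()` would then be passed on to the download and DuckDB ingest steps, so the out-of-series HS version / year files never leave KNB.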
Status: 🏗 In Progress