Skip to content

V3 pct time canada#165

Merged
rafdoodle merged 8 commits intov3from
v3-pct-time-canada
Feb 13, 2026
Merged

V3 pct time canada#165
rafdoodle merged 8 commits intov3from
v3-pct-time-canada

Conversation

@rafdoodle
Copy link
Collaborator

@rafdoodle rafdoodle commented Feb 7, 2026

Hello Doug and Caitlin,

Here is a new v3 PR for percent time in Canada, following up form PR #149.

In this PR, I added the following new variables:

  • SDCGCB for 2001-2018 master (_m) data and 2009-2012 single-year shared (_s) data (replacing SDCGCBG_A)
  • SDCDRES for 2001-2018 master (_m) data and 2009-2012 single-year shared (_s) data (replacing SDCGRES_A)

I also created a new function in R/percent-time-canada.R called pct_time_fun_A, which takes in the two above variables and DHH_AGE to calculate % time in Canada in the Master data.

  • The key difference is that the new _A function simply does residency/age * 100 with the Master's continuous SDCDRES, unlike residency midpoint/age * 100 when using the PUMF's categorical SDCGRES.
  • Tests were created for this new function in tests/testthat/test-immigarion.R.
  • DHH_AGE's databaseStart and variableStart had to be updated to only _m and _s data in the worksheets for rec_with_table() to work which it did.

Please let me know what you think.

Thanks,
Rafidul

@DougManuel
Copy link
Contributor

Code review

Reviewed 7 variables (DHH_AGE, SDCGCB, SDCGCBG, SDCDRES, SDCGRES, pct_time_der, pct_time_der_A) for PUMF and Master across 2001-2018.

L6 integration test: cross-cycle prevalence (PUMF)

rec_with_table() ran successfully for all 9 PUMF cycles. No step changes at era boundaries.

Cycle pct_time_der valid %
cchs2001_p 100%
cchs2003_p 96.5%
cchs2005_p 93.5%
cchs2007_2008_p 100%
cchs2009_2010_p 98%
cchs2011_2012_p 95%
cchs2013_2014_p 96.5%
cchs2015_2016_p 95.5%
cchs2017_2018_p 98%

Master (_m) mappings validated by worksheet checks only — no PUMF data available for L6 testing of pct_time_der_A.

Issues found

6 issues (4 PR-introduced, 2 pre-existing/informational):

  1. DHH_AGE databaseStart typo (P0): cchs2007_2008m missing underscore — should be cchs2007_2008_m. Affects variables.csv and all variable_details.csv rows.

https://github.com/Big-Life-Lab/cchsflow/blob/5b8954748947ddfd33269a4b6c48c6821c894d1d/inst/extdata/variables.csv

  1. pct_time_der_A databaseStart double comma (P1): Contains , , (empty entry) between cchs2017_2018_m and cchs2009_s.

https://github.com/Big-Life-Lab/cchsflow/blob/5b8954748947ddfd33269a4b6c48c6821c894d1d/inst/extdata/variables.csv

  1. DHH_AGE missing explicit 2015+ master mappings (P1): [DHH_AGE] default applies to cchs2015_2016_m and cchs2017_2018_m without explicit db::VAR mappings. Other SDC variables in this PR correctly have explicit 2015+ mappings (SDCDVCB, SDCDVRES, etc.). Needs DDI verification — DHH_AGE may not have been renamed, in which case the default is correct.

  2. SDCGCB dummyVariable colon notation (P2): Uses _NA::a/_NA::b instead of _NAa/_NAb. Affects 6 rows in variable_details.csv. Pre-existing pattern in SDCGCBG and SDCGRES (on v3 branch) — fixing SDCGCB keeps new variables consistent with the updated naming convention.

  3. Missing unit tests for born-in-Canada case (P1): Neither pct_time_fun() nor pct_time_fun_A() has a test for SDCGCBG=1 / SDCGCB=1 (returns 100). Also missing: pct_time_fun_cat(100) boundary test and pct_time_fun_cat(tagged_na("a")).

https://github.com/Big-Life-Lab/cchsflow/blob/5b8954748947ddfd33269a4b6c48c6821c894d1d/tests/testthat/test-immigration.R

  1. Missing man/pct_time_fun_A.Rd (P2): Roxygen docs exist in source but .Rd file not generated. Run devtools::document().

#' @title Percent time in Canada (Master files)
#'
#' @description This function creates a derived variable (pct_time_der_A) that
#' provides an estimated percentage of the time a person's life was spent in
#' Canada. This variant is for Master file data where SDCDRES is continuous
#' (actual years) rather than categorical.
#'
#' @param DHH_AGE continuous age variable.
#'
#' @param SDCGCB whether or not someone was born in Canada (1 - born in Canada,
#' 2 - born outside Canada)
#'
#' @param SDCDRES continuous variable for how long someone has lived in Canada
#' (in years).
#'
#' @return Numeric value between 0 and 100 that represents
#' percentage of a respondent's time in Canada
#'
#' @examples
#' # Using pct_time_fun_A() to generate a value for percent time spent in Canada
#' # with user inputted values. Let's say you are 27 years old who was born
#' # outside of Canada and have been living in Canada for 10 years.
#' # Your estimated percent time spent in Canada can be calculated as follows:
#'
#' library(cchsflow)
#' pct_time <- pct_time_fun_A(DHH_AGE = 27, SDCGCB = 2, SDCDRES = 10)
#'
#' print(pct_time)
#'
#' @export
pct_time_fun_A <-
function(DHH_AGE, SDCGCB, SDCDRES) {
if (is_equal(SDCGCB, 1)) {
return(100)
}
DHH_AGE <- if_else2(DHH_AGE > 0, DHH_AGE,
return(tagged_na("b")))
SDCDRES <- if_else2(SDCDRES >= 0, SDCDRES, return(tagged_na("b")))
if_else2(SDCGCB == 2, (SDCDRES / DHH_AGE * 100), tagged_na("b"))
}

Checked: era boundary defaults, databaseStart consistency, PUMF/Master naming, pre-2007 cycle letters, known error patterns, DV specifications, unit tests, and PUMF integration.

CEP: ceps/cep-008-immigration/

Generated with Claude Code

@rafdoodle
Copy link
Collaborator Author

rafdoodle commented Feb 12, 2026

DHH_AGE has maintained the same name since 2007-2008. Isn't it supposed to be _NA::a/_NA::b though? It's like that everywhere.

rafdoodle and others added 3 commits February 12, 2026 11:29
…put bounds validation

Replaced pct_time_fun() and pct_time_fun_A() with calculate_pct_time(age,
born_in_canada, years_in_canada) using case_when for vectorised logic.
Renamed pct_time_fun_cat() to categorize_pct_time(). Moved PUMF midpoints
from R code to worksheet via new SDCGRES_cont intermediate variable.
Merged pct_time_der and pct_time_der_A into single pct_time_der with
separate PUMF and master row sets. Added [0,100] output bounds validation
with out-of-range values returning tagged_na("b"). 28 tests passing.

CEP-008 review artifacts included.
@DougManuel
Copy link
Contributor

Code review — update after refactoring (2c4c9d1)

Reviewed 5 in-scope variables (SDCGRES_cont, pct_time_der, pct_time_der_cat10, SDCDRES, SDCGCB) for PUMF and Master across 2001-2018. Full review details in ceps/cep-008-immigration/PR-165-review-summary.md.

Summary of changes since last review

All 6 issues from the original review are resolved:

# Issue Resolution
1 DHH_AGE cchs2007_2008m typo (P0) Fixed in 6260d65
2 pct_time_der_A double comma (P1) Eliminated — pct_time_der_A merged into pct_time_der
3 DHH_AGE missing 2015+ mappings (P1→informational) DDI verification confirms DHH_AGE unchanged through 2021; default is safe
4 SDCGCB dummyVariable colon notation (P2) Pre-existing on v3 branch for SDCGCBG/SDCGRES; out of scope
5 Missing unit tests (P1) 28 tests now covering born-in-Canada, boundaries, vector inputs, output bounds
6 Missing pct_time_fun_A.Rd (P2) Function removed — unified into calculate_pct_time()

Key refactoring in latest commit

  • Unified DV function: pct_time_fun() and pct_time_fun_A() replaced by calculate_pct_time(age, born_in_canada, years_in_canada). pct_time_fun_cat() renamed to categorize_pct_time(). Follows v3 verb-first naming.
  • Merged pct_time_der_A into pct_time_der: Separate PUMF and master row sets under one variable name. _s references purged.
  • New SDCGRES_cont: PUMF categorical → continuous midpoint conversion moved from R code to worksheet. calculate_pct_time() receives continuous years for both PUMF and master.
  • Output bounds validation: Values outside [0, 100] → tagged_na("b"). Documented in variable_details notes.
  • case_when() replaces if_else2(): Vectorized logic, no early return().

L6 integration and tests

L6 PUMF integration (rec_with_table() across 9 cycles) ran successfully — no step changes at era boundaries, stable valid % (93.5-100%). See CEP docs for cross-cycle tables.

28/28 unit tests pass. Master mappings validated by worksheet checks only (no PUMF runtime data for _m).

Issues found in this review pass

No new issues. All L3-L5 checks pass (era boundaries, databaseStart consistency, naming conventions, error patterns, DV spec). Pre-existing issues (18 check_worksheet errors in variable_details.csv column ordering and row sorting) confirmed on v3 target branch — not introduced by this PR.

Checked: era boundary defaults, databaseStart consistency, PUMF/Master naming, pre-2007 cycle letters, known error patterns, DV specification, unit tests, PUMF integration, output bounds validation.

CEP: ceps/cep-008-immigration/

@rafdoodle rafdoodle merged commit 6ee9336 into v3 Feb 13, 2026
1 check passed
@rafdoodle rafdoodle deleted the v3-pct-time-canada branch February 13, 2026 02:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants