Skip to content

General Health#169

Open
caitlink12 wants to merge 4 commits intov3.0.0-validation-infrastructurefrom
gen-health
Open

General Health#169
caitlink12 wants to merge 4 commits intov3.0.0-validation-infrastructurefrom
gen-health

Conversation

@caitlink12
Copy link

Included ICES survey cycles to existing general health variables GEN_07 and GEN_10 to variable and variable_details.

included ICES survey cycles into existing GEN_07 variables and updated suffixes from _i to _m
included ICES survey cycle into existing GEN_10 variable and updated suffixes from _i to _m
@DougManuel
Copy link
Contributor

Code review

Reviewed 2 variables (GEN_07, GEN_10) for PUMF and Master across 2001–2017-2018.

L6 integration test: cross-cycle prevalence

rec_with_table() ran successfully for all 9 PUMF cycles. Since this PR only changes master (_m) mappings, PUMF results confirm the full-file rewrite didn't break existing functionality. Both variables show 100% valid across all cycles with stable category distributions — no era boundary issues.

Cycle GEN_07 valid % GEN_10 valid %
cchs2001_p 100% 100%
cchs2003_p 100% 100%
cchs2005_p 100% 100%
cchs2007_2008_p 100% 100%
cchs2009_2010_p 100% 100%
cchs2011_2012_p 100% 100%
cchs2013_2014_p 100% 100%
cchs2015_2016_p 100% 100%
cchs2017_2018_p 100% 100%

Master (_m) mappings validated by worksheet checks only — no runtime data available for L6 testing.

Issues found

1 issue (P0)

  1. GEN_10 wrong master source variable names (P0, L1 concordance / Check 3, confidence: 100)

    All 7 GEN_10 rows map pre-2007 master databases to the wrong CCHS source variable:

    • Current: cchs2001_m::GENA_01, cchs2003_m::GENC_01, cchs2005_m::GENE_01
    • Correct: cchs2001_m::GENA_10, cchs2003_m::GENC_10, cchs2005_m::GENE_10

    GENA_01 = "Self-perceived health" (this is GEN_07's source variable)
    GENA_10 = "Sense of belonging to local community" (the correct source for GEN_10)

    Confirmed against cchsflow-docs data dictionary. This bug was pre-existing on the target branch (with _i suffix) — the PR converted _i_m without correcting the source name. At runtime, GEN_10 would return self-perceived health data instead of sense-of-belonging data for 2001, 2003, and 2005 master cycles.

    Fix: Replace GENA_01GENA_10, GENC_01GENC_10, GENE_01GENE_10 in all 7 GEN_10 rows' variableStart field.

Pre-existing issues (not introduced by this PR)

  • Space after :: in GEN_07: cchs2017_2018_p:: GEN_020 (all 8 rows)
  • _s/_m mismatch: variable_details.csv uses cchs2009_s/cchs2010_s/cchs2012_s, variables.csv uses _m
  • dummyVariable colon pattern: GEN_07_cat5_NA::a instead of _NAa

CEP: ceps/cep-010-gen-health/

Generated with Claude Code

@DougManuel
Copy link
Contributor

Additional review finding: _s_m conversion needed

Both GEN_07 and GEN_10 still have deprecated _s (share file) databases in variable_details.csv that should be converted to _m:

  • cchs2009_scchs2009_m
  • cchs2010_scchs2010_m
  • cchs2012_scchs2012_m

This affects all 15 rows (8 for GEN_07, 7 for GEN_10). variables.csv already uses _m for these databases, so the conversion also resolves the databaseStart consistency mismatch between the two files.

Since the PR is already converting _i_m in these rows, this is a natural cleanup to include.

Summary of recommended fixes for this PR

  1. P0: Fix GEN_10 source variable names: GENA_01GENA_10, GENC_01GENC_10, GENE_01GENE_10 (all 7 rows)
  2. P2: Convert _s_m for cchs2009_s, cchs2010_s, cchs2012_s in both GEN_07 and GEN_10 (all 15 rows)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants