Description
When running Step 2 on a subset of samples (e.g., males only, using --phenoFile), we observe a bimodal distribution of standard errors (SEs).
- Dense variants (cluster 1) show correct SE scaling (subset-to-full-sample SE ratio ~1.4).
- Sparse variants (cluster 2) show deflated SEs (ratio ~1.0, i.e. matching the full-sample OLS SEs).
The issue appears to be in src/Step2_Models.cpp inside compute_score_qt.
When dt_thr->is_sparse is true, the code uses a fast expansion for the variance term:
denum_arr(ph) = Gm.squaredNorm() - 2 * XtGm.dot(XtG) + XtG_ss;
The term XtG_ss represents the variance explained by covariates on the full sample, not the subset. For subset analyses (where N_subset < N_total), this overestimates the residual variance term, artificially inflating the denominator and deflating the SE.
The correct calculation (projecting covariates onto the mask) is currently commented out in the source code immediately below the approximation. Enabling this path fixes the bimodality in subset analyses.
To reproduce: run Step 2 on a sparse variant (MAC < 100) using a subset of samples (e.g., 50% of the cohort) and compare the reported SE to the OLS SE computed on the same subset.
- Regenie Version: v4.1