Incorrect SE deflation for sparse variants in subset analyses (Step 2 approximation error) #678

When running Step 2 on a subset of samples (e.g., males only, using `--phenoFile`), we observe a bimodal distribution of standard errors (SEs) across variants.
- **Dense variants** (cluster 1) show correct SE scaling (SE ratio ~1.4 relative to the full-sample analysis).
- **Sparse variants** (cluster 2) show deflated SEs (SE ratio ~1.0, i.e., matching the full-sample OLS SEs).


The issue appears to be in `src/Step2_Models.cpp` inside `compute_score_qt`.
When `dt_thr->is_sparse` is true, the code uses a fast expansion for the variance term:
`denum_arr(ph) = Gm.squaredNorm() - 2 * XtGm.dot(XtG) + XtG_ss;`

The term `XtG_ss` represents the variance explained by covariates on the **full sample**, not the subset. For subset analyses (where N_subset < N_total), the full-sample value is too large for the samples actually analyzed, which artificially inflates the denominator `denum_arr(ph)` and therefore deflates the SE.
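
To illustrate the direction of the error, here is a minimal toy sketch in plain Python. It is not regenie code and does not try to reproduce the magnitudes above. It assumes a single intercept covariate scaled to unit norm over all samples, and the function names `direct_denominator` and `shortcut_denominator` are hypothetical. It compares the expansion against an explicit projection on the analyzed samples only:

```python
import math


def masked(g, mask):
    """Zero out genotypes of samples excluded from the analysis."""
    return [gi if keep else 0.0 for gi, keep in zip(g, mask)]


def direct_denominator(g, mask):
    """Residual sum of squares of G after regressing out an intercept
    fitted on the analyzed (masked-in) samples only."""
    vals = [gi for gi, keep in zip(g, mask) if keep]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals)


def shortcut_denominator(g, mask):
    """Expansion ||Gm||^2 - 2 * XtGm.XtG + XtG_ss.

    Assumption of this sketch: the intercept column has unit norm over
    ALL samples, so the last term is only exact when no sample is masked.
    """
    n_total = len(g)
    gm = masked(g, mask)
    x = 1.0 / math.sqrt(n_total)   # unit-norm intercept on the full sample
    xtg = x * sum(gm)              # X^T Gm (a scalar, since there is one covariate)
    xtg_ss = xtg * xtg             # sum of squares of the projected genotype
    return sum(v * v for v in gm) - 2.0 * xtg * xtg + xtg_ss


if __name__ == "__main__":
    g = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]   # sparse toy genotype
    subset = [True] * 4 + [False] * 4               # analyze half of the samples
    d = direct_denominator(g, subset)
    s = shortcut_denominator(g, subset)
    print("direct:", d, "shortcut:", s)
    # The SE is proportional to 1 / sqrt(denominator) under this toy setup.
    print("SE ratio (shortcut / direct):", math.sqrt(d / s))
```

In this toy case the direct denominator on the subset is 1.0, while the shortcut gives 1.5, which is the value both methods return when all eight samples are analyzed. The two agree only when the covariate basis is orthonormal on the samples actually analyzed.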


The correct calculation (projecting covariates onto the mask) is currently commented out in the source code immediately below the approximation. Enabling this path fixes the bimodality in subset analyses.


Run Step 2 on a sparse variant (MAC < 100) using a subset of samples (e.g., 50% of the cohort) and compare the reported SE to the OLS SE.



- Regenie Version: v4.1
