When generating the feature matrix for RIVER, how do the developers handle cases where no variant near a given gene has a CADD annotation for features such as TFBS or EncOCCombPVal? glmnet cannot handle NAs, but in my dataset 95% of genes have at least one missing feature annotation, so removing those cases would waste most of the data.
Ex:
| | cHmmTx | cHmmTssBiv | cHmmHet | cHmmBivFlnk | cHmmTxFlnk | TFBS | EncOCCombPVal |
|---|---|---|---|---|---|---|---|
| GTEX-111YS:ENSG00000007923 | 0.016 | 0 | 0 | 0 | 0.000 | NA | NA |
| GTEX-117YW:ENSG00000007923 | 0.000 | 0 | 0 | 0 | 0.000 | NA | NA |
| GTEX-1192X:ENSG00000007923 | 0.000 | 0 | 0 | 0 | 0.000 | NA | NA |
| GTEX-11EM3:ENSG00000007923 | 0.000 | 0 | 0 | 0 | 0.008 | NA | NA |
| GTEX-11EQ8:ENSG00000007923 | 0.000 | 0 | 0 | 0 | 0.000 | NA | NA |
| GTEX-11EQ9:ENSG00000007923 | 0.016 | 0 | 0 | 0 | 0.000 | NA | NA |
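For context, one generic stopgap is sketched below. It is an assumption for illustration only, not the RIVER developers' method: fill each missing value with the per-feature median and add a 0/1 missingness indicator so the model can still see that the annotation was absent. The function name `impute_with_indicators`, the `_missing` suffix, the fallback fill of 0.0, and the toy TFBS values are all hypothetical.

```python
from statistics import median

def impute_with_indicators(rows, features):
    """Fill missing values (None) per feature and add a 0/1 missingness flag.

    rows: list of dicts mapping feature name -> float or None (None = NA).
    A feature with no observed values at all is filled with 0.0 (an assumption).
    """
    # Compute one fill value per feature from the observed entries only.
    fills = {}
    for f in features:
        observed = [r[f] for r in rows if r[f] is not None]
        fills[f] = median(observed) if observed else 0.0

    out = []
    for r in rows:
        new = {}
        for f in features:
            missing = r[f] is None
            new[f] = fills[f] if missing else r[f]
            # Indicator column: 1 where the original value was NA.
            new[f + "_missing"] = 1 if missing else 0
        out.append(new)
    return out

# Toy data: the TFBS values here are made up for illustration.
rows = [
    {"cHmmTx": 0.016, "TFBS": None},
    {"cHmmTx": 0.000, "TFBS": 2.0},
    {"cHmmTx": 0.000, "TFBS": 4.0},
]
imputed = impute_with_indicators(rows, ["cHmmTx", "TFBS"])
```

The resulting matrix has no NAs and could be passed to glmnet after export, but whether median imputation is appropriate for these annotations is exactly the open question here.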