PLOTMETRICSQC #336

@FerriolCalvet

This issue addresses the need to define quality-control (QC) checks for many of the metrics computed in deepCSA.
Below we list the metrics and proposals for assessing the quality and stability of each measurement.

  • Cohort-level

    • Gene-cohort background mutation density
      • Synonymous
        • Check the distribution of mutation densities per gene
        • Define a Z-score for each of the genes
        • Identify outliers
      • Non-protein-affecting
        • Check the distribution of mutation densities per gene
        • Define a Z-score for each of the genes
        • Identify outliers
      • Compare agreement between synonymous (Syn) and non-protein-affecting (NPA) mutation densities
        • One option: compute the per-gene difference between the Syn and NPA densities, then compute a Z-score for each gene from the distribution of those differences.
        • Use this distribution to flag outliers, i.e. genes where the two background estimates clearly disagree.
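
The Z-score checks above could be sketched roughly as follows. This is only an illustration: the function names, the dict-of-densities input and the threshold default are assumptions made here, not part of deepCSA.

```python
from statistics import mean, stdev


def zscores(values):
    """Z-score of each value against the sample mean and standard deviation."""
    mu = mean(values)
    sd = stdev(values)  # sample SD; needs at least two values
    if sd == 0:
        return [0.0] * len(values)  # no spread, so nothing can be an outlier
    return [(v - mu) / sd for v in values]


def flag_outliers(densities, threshold=3.0):
    """densities: dict gene -> mutation density (hypothetical input format).

    Returns a dict gene -> Z-score for genes with |Z| above the threshold.
    """
    genes = list(densities)
    zs = zscores([densities[g] for g in genes])
    return {g: z for g, z in zip(genes, zs) if abs(z) > threshold}


def flag_disagreement(syn, npa, threshold=3.0):
    """Flag genes whose Syn - NPA density difference is an outlier."""
    shared = sorted(set(syn) & set(npa))  # only genes present in both
    diffs = {g: syn[g] - npa[g] for g in shared}
    return flag_outliers(diffs, threshold)
```

One caveat on the threshold: with the sample standard deviation, the largest attainable |Z| among n values is (n - 1) / sqrt(n), so a cutoff of 3 cannot flag anything in a panel of fewer than 11 genes. For small panels a lower cutoff, or a different outlier rule, would be needed.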
    • Stability of all gene-cohort mutation densities (both protein-affecting and background consequence types)
      Make a small alteration to the numerator (+1 mutation, or whichever amount we decide) and assess its impact. Take into account that for protein-affecting consequence types there could be meaningful outliers.
      • Use the distribution of observed mutation densities per gene to check whether the altered value still falls within the previously defined distribution
      • Compute a metric such as the relative or absolute difference between the initial observation and the altered one
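
A minimal sketch of the numerator-alteration check, assuming the density is simply a mutation count divided by some denominator (for example a depth-adjusted size). The real deepCSA definition may differ, and all names and the `max_rel_diff` default are hypothetical.

```python
def perturbation_impact(n_mutations, denominator, extra=1):
    """Compare a density before and after adding `extra` mutations.

    Assumes density = n_mutations / denominator (an assumption of this sketch).
    """
    observed = n_mutations / denominator
    perturbed = (n_mutations + extra) / denominator
    abs_diff = perturbed - observed
    # With zero observed mutations the relative change is unbounded.
    rel_diff = abs_diff / observed if observed > 0 else float("inf")
    return {
        "observed": observed,
        "perturbed": perturbed,
        "abs_diff": abs_diff,
        "rel_diff": rel_diff,
    }


def unstable_genes(counts, denominators, extra=1, max_rel_diff=0.1):
    """Return genes whose density moves by more than max_rel_diff.

    counts, denominators: dicts gene -> value (hypothetical input format).
    """
    flagged = {}
    for gene, n in counts.items():
        impact = perturbation_impact(n, denominators[gene], extra)
        if impact["rel_diff"] > max_rel_diff:
            flagged[gene] = impact
    return flagged
```

Under this simple definition the relative difference reduces to `extra / n_mutations`, so it depends only on the count; the absolute difference is what carries the denominator. That may be a reason to report both.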
  • Sample-level
    Follow the same principles as for the gene-level mutation densities, but take into account that samples can differ in nature, so the disagreement here may be larger and may have underlying biological reasons.

    • Compute background mutation densities per sample and compare them with each other to identify outliers, whether biological (hypermutators, germline deficiencies) or caused by variant-calling artifacts or numerical artifacts of undersampling/...
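
For the per-sample comparison, one possible sketch swaps the mean/SD Z-score for a robust Z-score (median and MAD) on log10 densities, so that a single hypermutator does not inflate the spread it is measured against. This choice, the function names and the 3.5 default are assumptions of the sketch, not something the issue prescribes.

```python
import math
from statistics import median


def robust_zscores(values):
    """Z-like scores based on the median and the median absolute deviation."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; more than half of the values are identical")
    scale = 1.4826 * mad  # makes the MAD comparable to an SD under normality
    return [(v - med) / scale for v in values]


def flag_sample_outliers(densities, threshold=3.5):
    """densities: dict sample -> background mutation density (hypothetical).

    Returns (outliers, zero_density_samples). Samples with a density of zero
    cannot be log-transformed, so they are reported separately.
    """
    zero = [s for s, d in densities.items() if d <= 0]
    samples = [s for s, d in densities.items() if d > 0]
    logs = [math.log10(densities[s]) for s in samples]
    zs = robust_zscores(logs)
    outliers = {s: z for s, z in zip(samples, zs) if abs(z) > threshold}
    return outliers, zero
```

A flagged sample is only a candidate: the score alone does not say whether the cause is biological or technical, so each one still needs manual review.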
  • Sample-gene-level
    This is the trickiest mutation density and the one most likely to be unstable.
    The focus here is not on the background mutation density: we assume its values will be too sparse to give a stable metric, and that the globalloc approach can solve this when the cohort-level mutation density is stable enough for that gene. Despite this assumption, it would be good to check:

    • Omega globalloc QC #368
      Is the number of mutations inferred by the omega globalloc preprocessing in line with the observed number of mutations? The two will likely differ because of the sparsity, but they should be similar enough.
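
One way to make "similar enough" concrete is to ask how surprising the observed count would be if it were Poisson-distributed around the inferred value. The Poisson model, the function names and the `alpha` default are all assumptions of this sketch; it does not reflect how omega globalloc works internally.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of exactly k events with mean lam (lam > 0)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def poisson_two_sided_p(observed, expected):
    """Approximate two-sided p-value: twice the smaller tail, capped at 1."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    lower = sum(poisson_pmf(k, expected) for k in range(observed + 1))
    below = sum(poisson_pmf(k, expected) for k in range(observed))
    upper = max(0.0, 1.0 - below)  # P(X >= observed), guarded against rounding
    return min(1.0, 2.0 * min(lower, upper))


def flag_count_mismatch(pairs, alpha=0.01):
    """pairs: dict key -> (observed_count, inferred_count), hypothetical format.

    Returns a dict key -> p-value for entries with p below alpha.
    """
    flagged = {}
    for key, (obs, inferred) in pairs.items():
        p = poisson_two_sided_p(obs, inferred)
        if p < alpha:
            flagged[key] = p
    return flagged
```

Because many sample-gene pairs would be tested at once, some correction for multiple testing would probably be needed before acting on the flags.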

    Our goal is to see if the mutation density per consequence type (mainly missense and truncating) is stable enough for us to trust the resulting omega values we compute.

    • Apply the same reasoning used to measure the stability of all gene-cohort mutation densities, but at the sample-specific level.
    • Take into account that for protein-affecting consequence types there could be meaningful outliers.
  • Mutational profile stability #364

    • Measure sparsity of the mutational profile to inform its use for downstream analysis
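
As a starting point for the sparsity measurement, a few simple summaries of a profile could be computed as below. The sketch assumes the profile is a list of per-channel mutation counts (for instance 96 trinucleotide contexts); the metric names and the `min_count` cutoff are hypothetical.

```python
def profile_sparsity(counts, min_count=1):
    """Summarise how sparse a mutational profile is.

    counts: list of per-channel mutation counts (assumed input format).
    A channel is called sparse when it has fewer than min_count mutations.
    """
    n_channels = len(counts)
    if n_channels == 0:
        raise ValueError("profile has no channels")
    total = sum(counts)
    n_sparse = sum(1 for c in counts if c < min_count)
    return {
        "total_mutations": total,
        "n_channels": n_channels,
        "fraction_sparse": n_sparse / n_channels,
        "mean_per_channel": total / n_channels,
    }
```

For example, `profile_sparsity([0] * 48 + [2] * 48)` reports 96 mutations with half of the channels empty. Which of these summaries, and which cutoffs, should gate downstream use is still to be decided.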

Labels: new-feature (New functionality being added.)