Description
The current algorithm for SNP ratio cutoffs sometimes gives incorrect results. In the attached image, compare Chr1 (which is analyzed correctly) with Chr2, 3, 4 and 7: all of them appear to have about the same allele ratio histogram, yet their ratio assignments are wrong, and inconsistently so. This was observed even for datasets with high peaks (as the peaks for the heterozygous allele group are low).
Presumably, the current algorithm tries to adjust to small shifts in allele ratios (the peaks can move a little from their theoretical values). However, this is not very helpful, because it depends strongly on the user-provided ploidy, which is often incorrect.
We currently propose dividing the allele ratio histogram into ploidy+1 equally spaced bins and assigning ratios to SNPs based on the bin they fall into. That would introduce some noise, but it seems likely to be small and should not interfere much with visual assessment of the segment.
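To make the proposal concrete, here is a minimal sketch of the binning in Python. It is not the existing implementation. It assumes the allele ratio is a fraction in [0, 1], that "equally spaced" means equal-width bins with edges at i/(ploidy+1), and that bin k is labeled k:(ploidy-k). The function name `assign_allele_ratio` and the label orientation are hypothetical.

```python
def assign_allele_ratio(ratio, ploidy):
    """Map an allele ratio in [0, 1] to a (k, ploidy - k) label.

    Assumption: the range [0, 1] is split into ploidy + 1 equal-width
    bins, and bin k (counting from 0) is labeled k:(ploidy - k).
    """
    if ploidy < 1:
        raise ValueError("ploidy must be at least 1")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    n_bins = ploidy + 1
    # int() floors a non-negative value; min() keeps ratio == 1.0
    # in the last bin instead of spilling into a non-existent one.
    k = min(int(ratio * n_bins), ploidy)
    return k, ploidy - k


if __name__ == "__main__":
    for r in (0.0, 0.3, 0.5, 0.9, 1.0):
        print(r, assign_allele_ratio(r, 4))
```

Because the bin index in this sketch never decreases as the ratio increases, the same label cannot appear on both sides of a different label.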
This should be coupled with auto-detection of the baseline ploidy (#52) for maximum effect.
@darrenabbey, what do you think? We're currently digging into the algorithm that assigns ratios based on the ratio histogram, and it just seems too sensitive, even when the ploidy is set correctly. We even saw behavior where a 1:3 peak would be flanked on the right by 0:4 (as expected), but also flanked on the left by 0:4 (which is clearly a bug). So the question is: what scenarios does the current algorithm cover that the suggestion above won't be able to handle?
