Description
We are seeing many unaffected individuals with allele sizes far larger than expected. The cause appears to be that the current locus definition includes flanking sequence (corresponding to the hg38 masked region) that contains GCC trinucleotides, and these are then counted toward the pathogenic repeat. Unless there are other concerns, restricting the locus limits to the primary repeat should resolve the issue. For reference, see e.g. Silva et al. 2021, specifically Figure S1, where some of the spurious surrounding GCCs are evident. Note that the common allele (15 copies) corresponds to the reference genome allele in both hg19 and hg38, so the reference size of the pathologic repeat should be 45 nt (15 × 3 nt), whereas T2T CHM13 ts1 has 16 copies.
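To illustrate why a wider locus inflates sizes, here is a toy sketch, assuming a made-up sequence (not real chrX sequence) with a few stray GCCs in the flanks around a pure 15-copy run. It simply contrasts counting every GCC in the window with counting only the uninterrupted tandem run; it is not how any particular genotyper estimates repeat length, and the names `total_motif_copies` and `longest_pure_run` are hypothetical.

```python
import re

MOTIF = "GCC"

# Hypothetical toy sequence, NOT real chrX sequence: a short flank with
# interrupted GCCs, a pure run of 15 GCC copies, then another short flank.
LEFT_FLANK = "GCCGCAGCCGCT"   # two stray GCCs, interrupted by GCA/GCT
PURE_REPEAT = MOTIF * 15      # 45 nt, the common/reference allele size
RIGHT_FLANK = "GCAGCCGCG"     # one more stray GCC
SEQ = LEFT_FLANK + PURE_REPEAT + RIGHT_FLANK


def total_motif_copies(seq: str, motif: str = MOTIF) -> int:
    """Count every non-overlapping occurrence of `motif`, flanks included."""
    return len(re.findall(re.escape(motif), seq))


def longest_pure_run(seq: str, motif: str = MOTIF) -> int:
    """Copies in the longest uninterrupted tandem run of `motif`."""
    runs = re.findall(f"(?:{re.escape(motif)})+", seq)
    return max((len(run) // len(motif) for run in runs), default=0)


if __name__ == "__main__":
    print("pure run copies:", longest_pure_run(SEQ))    # 15
    print("all GCC copies :", total_motif_copies(SEQ))  # 18
```

In this toy case the three flanking GCCs push the count from 15 to 18 copies, which is the same direction of bias described above when the locus boundaries extend past the primary repeat.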
Sorry this took a while to get addressed! Where we're at right now: we are evaluating these new coordinates to make sure we have the best candidate possible. Overall, we agree that the evidence points to a narrower range being more accurate (we are currently evaluating chrX:148500638-148500684). Once we have decided on that, we will be better able to evaluate the pathogenic range. I am also looking for published research to back these range changes; @dnil, if you have something I haven't found, please feel free to post it here. So far, this is what I have found or you have shared: Silva et al. 2021, Murray et al. 1996, and Clark et al. 2020.
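A candidate interval can be sanity-checked against the expected 45 nt (15 GCC copies) with simple arithmetic. The thread does not state which coordinate convention the string chrX:148500638-148500684 uses, so this minimal sketch computes the span under both 0-based half-open (BED-style) and 1-based inclusive readings; the helper names are hypothetical.

```python
START = 148500638
END = 148500684
MOTIF_LEN = 3                  # GCC
EXPECTED_NT = 15 * MOTIF_LEN   # 45 nt for the common 15-copy allele


def span_length(start: int, end: int, half_open: bool) -> int:
    """0-based half-open spans are end - start; 1-based inclusive spans add one."""
    return end - start if half_open else end - start + 1


def whole_copies(length: int, motif_len: int = MOTIF_LEN) -> tuple[int, int]:
    """Return (complete motif copies, leftover nucleotides)."""
    return divmod(length, motif_len)


if __name__ == "__main__":
    for label, half_open in (("0-based half-open", True), ("1-based inclusive", False)):
        n = span_length(START, END, half_open)
        copies, extra = whole_copies(n)
        print(f"{label}: {n} nt = {copies} x GCC + {extra} nt (expected {EXPECTED_NT} nt)")
```

Under these assumptions the interval is 46 nt (half-open) or 47 nt (inclusive), i.e. 15 full GCC copies plus one or two extra bases, so the exact boundary convention may be worth confirming when the range is finalized.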
Name: Daniel Nilsson
Username: @dnil
Email: daniel.nilsson@ki.se