handle invalid chromosomes in samtools chunking #378
I came across a situation when running maf2maf with a MAF that had the mitochondrial chromosome as "M" while the GRCh38 reference named it "MT". The samtools chunking in `maf2vcf.pl` processes randomized chunks until it hits an entry with this mismatch, at which point it skips the rest of the chunk. This produces non-deterministic output in the *skipped file, which made it difficult to see at a glance why certain variants were being skipped.
When there are no reference/MAF contig mismatches (most cases), the script works as expected; otherwise it skips variants inconsistently.
Summary
This PR fixes a bug in `maf2vcf.pl` when run with mismatching contigs:
1: Non-deterministic variant skipping
Root Cause:
Perl's `keys %hash` returns keys in an unspecified order that can change between runs. This caused:
Impact: Users running the same MAF multiple times would get different results, making debugging difficult.
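A common way to make this kind of iteration reproducible is to sort the keys before looping. The following is a minimal sketch only: `%regions_by_chrom` and its contents are hypothetical and are not taken from `maf2vcf.pl`, and the actual change in this PR may differ.

```perl
use strict;
use warnings;

# Hypothetical data: region strings grouped by chromosome name.
# This structure is an assumption for illustration, not the script's real one.
my %regions_by_chrom = (
    '1'  => ['1:100-200'],
    'M'  => ['M:50-60'],
    '17' => ['17:7579000-7579100'],
    'X'  => ['X:1000-1100'],
);

# Plain `keys` may yield a different order on each run.
# Sorting (string comparison by default) fixes the order across runs.
my @chroms = sort keys %regions_by_chrom;

for my $chrom (@chroms) {
    print "$chrom\t@{ $regions_by_chrom{$chrom} }\n";
}
```

With a fixed processing order, any entries that end up skipped are the same on every run, so repeated runs can be compared directly.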
2: Invalid chromosomes cause cascade failures
Root Cause:
When `samtools faidx` encounters an invalid chromosome (e.g., "M" when the reference uses "MT"), it:
Example:
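One way to guard against this kind of mismatch, sketched here under assumptions rather than taken from this PR's diff, is to compare each region's contig name against the names listed in the reference's `.fai` index before querying. The index text, offsets, region list, and variable names below are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical .fai contents; columns are name, length, offset, linebases, linewidth.
# The offsets are illustrative values, not read from a real index.
my $fai_text = join "\n",
    "1\t248956422\t3\t60\t61",
    "MT\t16569\t253105752\t60\t61";

# Collect the contig names that the reference actually defines.
my %known_contig;
for my $line ( split /\n/, $fai_text ) {
    my ($name) = split /\t/, $line;
    $known_contig{$name} = 1;
}

# Hypothetical regions derived from MAF rows.
my @regions = ( '1:100-200', 'M:50-60', 'MT:70-80' );

# Separate regions whose contig exists in the reference from those that do not.
my ( @valid, @skipped );
for my $region (@regions) {
    my ($chrom) = $region =~ /^(.+):\d+-\d+$/;
    if ( defined $chrom && $known_contig{$chrom} ) {
        push @valid, $region;
    }
    else {
        push @skipped, $region;
    }
}

print "query:   @valid\n";
print "skipped: @skipped\n";
```

In this sketch only the regions in `@valid` would be passed on to `samtools faidx`, so a single unknown contig such as "M" cannot affect the other regions in the same chunk, and `@skipped` records exactly which entries were left out.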
This PR ensures:
`.skipped.tsv` for user awareness
Backward Compatibility
✅ Fully backward compatible:
`.skipped.tsv` file format
Related Issues
I believe this fix addresses the root cause of issue #234.
Checklist