Duplicates were removed but log says 0 dups removed #55

@sehawk

Hi @GregoryFaust

We are running samblaster 0.1.26 with the following command:

samblaster_0.1.26.simg samblaster --ignoreUnmated -a -r -i input.sam -o output.sam

samblaster log

samblaster: Version 0.1.26
samblaster: Opening input.sam for read.
samblaster: Opening output.sam for write.
samblaster: Loaded 25 header sequence entries.
samblaster: Found        62867 of    2402373 (2.617%) total read ids are marked paired yet are unmated.
samblaster: Please double check that input file is read-id (QNAME) grouped.
samblaster: Found          838 of    2402373 (0.035%) total read ids with no primary alignment.
samblaster: Please double check that input file is read-id (QNAME) grouped.
samblaster: Removed          0 of    2402373 (0.000%) total read ids as duplicates using 18128k memory in 1.911S CPU seconds and 3S wall time.

samtools flagstat of the input SAM

4744272 + 0 in total (QC-passed reads + QC-failed reads)
4423 + 0 secondary
0 + 0 supplementary
2178640 + 0 duplicates
4743484 + 0 mapped (99.98% : N/A)
4739849 + 0 paired in sequencing
2372230 + 0 read1
2367619 + 0 read2
4724503 + 0 properly paired (99.68% : N/A)
4737919 + 0 with itself and mate mapped
1142 + 0 singletons (0.02% : N/A)
3131 + 0 with mate mapped to a different chr
3064 + 0 with mate mapped to a different chr (mapQ>=5)

samtools flagstat of the output SAM after running samblaster

2565632 + 0 in total (QC-passed reads + QC-failed reads)
2740 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
2564846 + 0 mapped (99.97% : N/A)
2562892 + 0 paired in sequencing
1283161 + 0 read1
1279731 + 0 read2
2551398 + 0 properly paired (99.55% : N/A)
2560969 + 0 with itself and mate mapped
1137 + 0 singletons (0.04% : N/A)
2585 + 0 with mate mapped to a different chr
2525 + 0 with mate mapped to a different chr (mapQ>=5)
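
The two flagstat reports can be cross-checked with a little arithmetic. Below is a minimal sketch, assuming the relevant flagstat lines are pasted in as strings; `flagstat_counts` is a hypothetical helper written for this illustration, not part of samtools or samblaster. It compares the number of records missing from the output with the number of records flagged as duplicates in the input.

```python
import re

# Excerpts copied from the two flagstat reports above (only the lines needed here).
INPUT_FLAGSTAT = """\
4744272 + 0 in total (QC-passed reads + QC-failed reads)
2178640 + 0 duplicates
"""
OUTPUT_FLAGSTAT = """\
2565632 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
"""

def flagstat_counts(text):
    """Return QC-passed counts for the 'in total' and 'duplicates' lines (hypothetical helper)."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+) \+ \d+ (in total|duplicates)\b", line)
        if m:
            counts[m.group(2)] = int(m.group(1))
    return counts

before = flagstat_counts(INPUT_FLAGSTAT)
after = flagstat_counts(OUTPUT_FLAGSTAT)

# Records that disappeared between input and output.
removed = before["in total"] - after["in total"]
print(removed, before["duplicates"], removed == before["duplicates"])
# prints: 2178640 2178640 True
```

With these numbers the drop in record count equals the input's flagged-duplicate count exactly, which is consistent with the removed records being the ones that already carried the duplicate flag in the input.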

Questions

  1. Why does samblaster print the warning "samblaster: Please double check that input file is read-id (QNAME) grouped."? Our input SAM is already QNAME sorted; we verified this by checking the SO (sort order) tag in the SAM header.
  2. Why does samblaster report that 0 duplicates were removed when samtools flagstat shows that duplicates were in fact removed?
samblaster: Removed          0 of    2402373 (0.000%) total read ids as duplicates using 18128k memory in 1.911S CPU seconds and 3S wall time.
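
On question 1, the SO tag in the header only records what the file claims about its ordering, so it can be worth checking the records themselves. Below is a minimal sketch of such a check, assuming an uncompressed SAM file; `check_qname_grouped` is a hypothetical helper written for this illustration, not a samblaster or samtools function, and the toy records are made up. It reports any read ID that reappears after a different read ID has been seen.

```python
import io

def check_qname_grouped(lines):
    """Return read IDs that reappear after a different ID was seen (hypothetical helper)."""
    seen = set()      # every read ID encountered so far
    split_ids = []    # read IDs whose records are not contiguous
    prev = None
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header lines (QNAME cannot start with '@') and blanks
        qname = line.split("\t", 1)[0]
        if qname != prev:
            if qname in seen:
                split_ids.append(qname)
            seen.add(qname)
            prev = qname
    return split_ids

# Toy input (made up): r1 shows up again after r2, so it is not grouped.
toy = (
    "@HD\tVN:1.6\tSO:queryname\n"
    "r1\t99\tchr1\t100\n"
    "r1\t147\tchr1\t300\n"
    "r2\t99\tchr1\t500\n"
    "r1\t355\tchr2\t100\n"
)
print(check_qname_grouped(io.StringIO(toy)))
# prints: ['r1']

# On a real file one might run, for example:
#   with open("input.sam") as fh:
#       print(check_qname_grouped(fh)[:10])
```

An empty result means every read ID occupies one contiguous block. It does not by itself show that each paired read has its mate present in the file.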
