Regtools junction extract: strandness detection malfunction #197

@santataRU

Dear all,

I am working with paired-end RNA-seq data and trying to use Regtools (version 1.0.0) to extract splicing junctions. I aligned my reads with HISAT2, using the --rna-strandness RF option, so that the output BAM files include strand information in the "XS" attribute. My HISAT2 command is as follows:

hisat2 -p 10 --rna-strandness RF -x /Index/hg38_index -1 ./Input_R1_001.fastq.gz -2 ./Input_R2_001.fastq.gz \
  | samtools sort -o output.sorted.bam && samtools index output.sorted.bam

To extract junctions, I used this command:

regtools junctions extract -s RF output.sorted.bam

In the 6th column of the output .bed file, I noticed that some junctions are labeled with "+", some with "-", and some with "?". Specifically, the output .bed file contains 606,976 junctions, with 223,254 marked as "-", 231,447 as "+", and 152,275 marked as "?".
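For anyone who wants to reproduce these counts, here is a minimal sketch. It assumes the regtools output was saved to a file called junctions.bed (a hypothetical name) and that the file is tab-separated with the strand in column 6; `count_strands` is just an illustrative helper, not part of regtools.

```shell
# Tally the strand labels found in column 6 of a BED file.
# Usage: count_strands <bed_file>
count_strands() {
  # cut extracts the 6th tab-separated field; sort + uniq -c counts each label.
  cut -f6 "$1" | sort | uniq -c
}

# Example call (file name is an assumption):
# count_strands junctions.bed
```

Each output line is a count followed by the label it belongs to ("+", "-", or "?").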

This seems unusual since every aligned read includes strand information as either "+" or "-", so I would not expect any "?" marks in the Regtools output.
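One way to check this assumption on the alignments themselves is sketched below. It reads SAM text on standard input, looks only at spliced alignments (CIGAR containing "N"), and tallies the value of the XS:A tag, or "none" when the tag is absent. `count_xs` is a hypothetical helper written for this check; it only inspects the tags and does not reproduce how Regtools assigns strand.

```shell
# Tally XS:A tag values among spliced alignments read from stdin (SAM text).
count_xs() {
  awk -F'\t' '
    $1 ~ /^@/ { next }                 # skip SAM header lines
    $6 ~ /N/ {                         # CIGAR with N = spliced alignment
      tag = "none"
      for (i = 12; i <= NF; i++)       # optional tags start at field 12
        if ($i ~ /^XS:A:/) tag = substr($i, 6)
      n[tag]++
    }
    END { for (t in n) printf "%s\t%d\n", t, n[t] }
  '
}

# Example call (requires samtools; BAM name taken from the command above):
# samtools view output.sorted.bam | count_xs
```

If every spliced read really carries a "+" or "-" tag, the "none" row should not appear in the output.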

I would appreciate any insights or suggestions on this issue.

I have attached small sample FASTQ and BAM files (the first 1,000 reads) to reproduce the issue at the following link: https://drive.google.com/drive/folders/1vNdGXk74L8E9SHMaobbnFq2nwF0xe7vq?usp=drive_link

Best regards,
Xiao
