Dear all,
I am working with paired-end RNA-seq data and trying to use Regtools (version 1.0.0) to extract splice junctions. I aligned my reads with HISAT2 using the --rna-strandness RF option, so that the alignments in the output BAM file carry strand information in the "XS" attribute. My HISAT2 command is:
hisat2 -p 10 --rna-strandness RF -x /Index/hg38_index -1 ./Input_R1_001.fastq.gz -2 ./Input_R2_001.fastq.gz \
| samtools sort -o output.sorted.bam && samtools index output.sorted.bam
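One way to check that the alignments really do carry an XS tag is to tally the tag values directly. The sketch below defines a hypothetical helper, `xs_summary` (not part of HISAT2, samtools, or Regtools). It reads SAM records on stdin and counts alignments by XS value, reporting alignments without the tag as "none":

```shell
# xs_summary (hypothetical helper): count alignments by XS tag value.
# Reads SAM records on stdin, e.g. from `samtools view`.
# Alignments that have no XS tag are counted under "none".
xs_summary() {
  awk -F'\t' '
    /^@/ { next }                          # skip SAM header lines, if present
    {
      xs = "none"
      for (i = 12; i <= NF; i++) {         # optional tags start at column 12
        if (substr($i, 1, 5) == "XS:A:") { xs = substr($i, 6); break }
      }
      n[xs]++
    }
    END { for (k in n) print k "\t" n[k] }
  ' | LC_ALL=C sort                        # fixed, locale-independent order
}
```

For example, `samtools view -F 4 output.sorted.bam | xs_summary`. The `-F 4` filter drops unmapped reads, which have no XS tag and would otherwise inflate the "none" count.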
To extract junctions, I used this command:
regtools junctions extract -s RF output.sorted.bam
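For comparison with the XS tags, the strand of each alignment can also be inferred from its SAM FLAG alone. The hypothetical helper below, `rf_strand_summary`, assumes the common first-strand (dUTP) convention usually abbreviated RF, in which read 1 is antisense to the transcript. It only illustrates that convention and is not Regtools' own logic; its "?" label (for records flagged as neither read 1 nor read 2) is unrelated to the "?" that Regtools writes:

```shell
# rf_strand_summary (hypothetical helper): infer the transcript strand of each
# alignment from its SAM FLAG, ASSUMING a first-strand (dUTP, "RF") library:
#   read 1 reverse -> "+",  read 1 forward -> "-"
#   read 2 forward -> "+",  read 2 reverse -> "-"
# Records flagged as neither read 1 nor read 2 are counted as "?".
# Reads SAM records on stdin, e.g. from `samtools view -F 4`.
rf_strand_summary() {
  awk -F'\t' '
    /^@/ { next }                          # skip SAM header lines, if present
    {
      flag = $2
      rev = int(flag / 16) % 2             # 0x10: read mapped to reverse strand
      r1  = int(flag / 64) % 2             # 0x40: first read in pair
      r2  = int(flag / 128) % 2            # 0x80: second read in pair
      if (r1 && !r2)      s = rev ? "+" : "-"
      else if (r2 && !r1) s = rev ? "-" : "+"
      else                s = "?"
      n[s]++
    }
    END { for (k in n) print k "\t" n[k] }
  ' | LC_ALL=C sort                        # fixed, locale-independent order
}
```

For example, `samtools view -F 4 output.sorted.bam | rf_strand_summary`. Bit tests use integer division rather than gawk's `and()` so the script also runs under POSIX awk.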
The 6th (strand) column of the output BED file contains a mix of "+", "-", and "?" labels. Of the 606,976 junctions, 231,447 are marked "+", 223,254 are marked "-", and 152,275 are marked "?".
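Per-strand counts like these can be reproduced with a short tally over column 6. `tally_strands` is a hypothetical helper, and `junctions.bed` in the usage line stands for whatever file the Regtools output was redirected to:

```shell
# tally_strands (hypothetical helper): count junctions by the strand label
# in column 6 of a tab-separated BED file ("+", "-", "?", ...).
# Reads BED on stdin; lines starting with "track" or "#" are skipped.
tally_strands() {
  awk -F'\t' '
    /^(track|#)/ { next }
    { n[$6]++ }
    END { for (k in n) print k "\t" n[k] }
  ' | LC_ALL=C sort                        # fixed, locale-independent order
}
```

For example, `tally_strands < junctions.bed`. The three counts reported above sum to the total of 606,976, so every junction falls into one of the three categories.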
This is unexpected: since every aligned read carries strand information in its XS tag, as either "+" or "-", I would not expect any junction in the Regtools output to be labeled "?".
I would appreciate any insights or suggestions on this issue.
I have shared a small sample of the FASTQ and BAM files (the first 1,000 reads of the FASTQ file) that reproduces the issue: https://drive.google.com/drive/folders/1vNdGXk74L8E9SHMaobbnFq2nwF0xe7vq?usp=drive_link
Best regards,
Xiao