How to trim after, instead of before, extraction #40

@clintval

I am trying to extract UMIs from the beginning of paired reads and store those UMI sequences in the RX SAM tag, as is the convention.

This works, but the UMI sequence remains in the reads afterwards:

$ splitcode \
    --nFastqs 2 \
    --select '0,1' \
    --extract '0:0<umi1[2]>,1:0<umi2[2]>' \
    --x-names \
    --out-bam \
    --pipe \
    r1.fastq.gz \
    r2.fastq.gz \
    | samtools view | head -n2
A01000:460:XXXXXX:1:2101:10004:10081	77	*	0	0	*	*     	0	0	ACGTAA	,:FF	RX:Z:AC-GG
A01000:460:XXXXXX:1:2101:10004:10081	141	*	0	0	*	*     	0	0	GGTCAA	,:ff	RX:Z:AC-GG

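If I try to trim the UMI sequence away with --trim-5, trimming occurs before extraction, so the RX tag ends up holding the two bases that follow each UMI (GT-TC) instead of the UMIs themselves (AC-GG):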

$ splitcode \
    --nFastqs 2 \
    --select '0,1' \
    --extract '0:0<umi1[2]>,1:0<umi2[2]>' \
    --trim-5 '2,2' \
    --x-names \
    --out-bam \
    --pipe \
    r1.fastq.gz \
    r2.fastq.gz \
    | samtools view | head -n2
A01000:460:XXXXXX:1:2101:10004:10081	77	*	0	0	*	*     	0	0	GTAA	,:FF	RX:Z:GT-TC
A01000:460:XXXXXX:1:2101:10004:10081	141	*	0	0	*	*     	0	0	TCAA	,:ff	RX:Z:GT-TC

Could there be a way to trim after extraction?
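
To make the order I am after concrete, here is a minimal sketch in plain Python. It is not splitcode code and says nothing about how splitcode works internally. The function names are hypothetical, and the six-character quality strings are made up for illustration (the ones in the output above are abbreviated). It only shows the intended behavior: read the UMI from the untrimmed read first, then remove those bases.

```python
def extract_then_trim(seq, qual, umi_len):
    """Read the UMI from the start of the read, then drop those bases.

    Returns (umi, trimmed_seq, trimmed_qual).
    """
    umi = seq[:umi_len]  # extraction happens on the untrimmed read
    return umi, seq[umi_len:], qual[umi_len:]  # trimming happens afterwards


def process_pair(r1, r2, umi_len=2):
    """Build the RX value for a read pair and return the trimmed reads.

    r1 and r2 are (sequence, quality) tuples.
    """
    umi1, seq1, qual1 = extract_then_trim(*r1, umi_len)
    umi2, seq2, qual2 = extract_then_trim(*r2, umi_len)
    rx = f"{umi1}-{umi2}"  # the two UMIs joined with a hyphen, as in the RX tags above
    return rx, (seq1, qual1), (seq2, qual2)


# Same sequences as in the example above; quality strings are placeholders.
rx, read1, read2 = process_pair(("ACGTAA", ",:FFFF"), ("GGTCAA", ",:ffff"))
print(rx, read1[0], read2[0])  # AC-GG GTAA TCAA
```

In other words, the output I would like is RX:Z:AC-GG together with the trimmed reads GTAA and TCAA.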

I notice that the quality trimming operations can be optionally triggered after extraction.

Or am I thinking about how to use splitcode incorrectly?
