racon_wrapper explanation? #84

@erika-r-moore

Hello!

Thank you so much for providing racon as a tool!

I am simply looking for an explanation of what --subsample and --split in racon_wrapper do, and for help interpreting my results.

To start off, I tried to polish a large genome (~10.8GB) using racon (v. 1.5.0), but ran into memory issues: the job was killed after a few hours or days, even when I gave it 1.5TB of memory. My data is HiFi, and I concatenated the reads from 8 HiFi flow cells to make flowcells.fasta.

minimap2 -t26 -x map-hifi hifiasm.fa flowcells.fasta > minimap.paf
racon -m 3 -x -5 -g -4 -w 500 -t 1 -q 20 -u flowcells.fasta minimap.paf hifiasm.fa > racon.fasta

The issues pages suggest trying racon_wrapper to subsample or split the data. I did this to the best of my knowledge:
racon_wrapper -t 8 --subsample 11532328 50 flowcells.fasta minimap.paf hifiasm.fa > racon_wrapper50.fasta

I got the "11532328" from the longest scaffold in hifiasm.fa, but I don't have a good reason for the 50 (I'm not quite sure what it is; sequence coverage?).
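To make the numbers in that command concrete, here is a minimal sketch of the arithmetic. It assumes, as the question above suspects but does not confirm, that the two --subsample values are a reference length in bases followed by a target coverage, so that their product is the number of read bases kept. The function name `subsample_budget` is hypothetical and only for illustration; the assembly size is the hifiasm total reported in the stats table further down in this post.

```python
# Assumption (not confirmed in this post): --subsample takes
# <reference length> <coverage>, and keeps roughly
# reference_length * coverage bases of reads.

ASSEMBLY_SIZE = 10_930_457_101   # total size of hifiasm.fa, from the stats table
LONGEST_SCAFFOLD = 11_532_328    # value passed as the first --subsample argument


def subsample_budget(reference_length: int, coverage: int) -> int:
    """Read bases kept under the assumed meaning of --subsample."""
    return reference_length * coverage


for coverage in (50, 100):
    budget = subsample_budget(LONGEST_SCAFFOLD, coverage)
    # Coverage of the whole assembly that this many read bases would give
    effective = budget / ASSEMBLY_SIZE
    print(f"coverage={coverage}: {budget:,} bases kept, "
          f"about {effective:.3f}x of the full assembly")
```

If that assumption holds, passing the longest scaffold rather than the whole assembly size as the first value would leave far less than 1x coverage of the full assembly in both runs.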

So, I tried with 100 as well:
racon_wrapper -t 8 --subsample 11532328 100 flowcells.fasta minimap.paf hifiasm.fa > racon_wrapper100.fasta

I then ran sequence stats to compare the original assembly with the polished assemblies (hifiasm.fa, racon_wrapper50.fasta, and racon_wrapper100.fasta):

| GenomeQC | hifiasm | racon_wrapper50 | racon_wrapper100 |
|---|---|---|---|
| Number of scaffolds | 34623 | 4539 | 8850 |
| Total size of scaffolds (bp) | 10,930,457,101 | 5,127,013,922 | 8,355,708,311 |
| Longest scaffold (bp) | 11,532,328 | 11,532,722 | 11,532,679 |
| Shortest scaffold (bp) | 9368 | 13657 | 11790 |

racon_wrapper50 has the fewest scaffolds, but its total scaffold size is about half that of the original assembly and of what we would expect from a 10.8GB genome.
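The size of the loss can be quantified directly from the stats above. The sketch below only divides the reported numbers by the hifiasm values; the dictionary layout and the name `retained_fraction` are illustrative choices, not output from any tool.

```python
# Values copied from the sequence stats above.
stats = {
    "hifiasm":          {"scaffolds": 34623, "total_bp": 10_930_457_101},
    "racon_wrapper50":  {"scaffolds": 4539,  "total_bp": 5_127_013_922},
    "racon_wrapper100": {"scaffolds": 8850,  "total_bp": 8_355_708_311},
}


def retained_fraction(run: str, key: str) -> float:
    """Fraction of the original hifiasm value that remains in a polished run."""
    return stats[run][key] / stats["hifiasm"][key]


for run in ("racon_wrapper50", "racon_wrapper100"):
    print(f"{run}: "
          f"{retained_fraction(run, 'scaffolds'):.1%} of scaffolds, "
          f"{retained_fraction(run, 'total_bp'):.1%} of total size")
```

In both runs the share of scaffolds retained is much smaller than the share of total size retained, so the missing sequence is spread over many scaffolds that are absent from the output rather than coming from shortened scaffolds alone.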

With this, I was wondering if you could provide some input on 1) whether I used racon_wrapper correctly, 2) why the racon_wrapper50 total size is half of what is expected, and 3) which of these runs should be considered the best result.

I tried to read the issues and paper, but I fear they went over my head. Given this, any help is appreciated!!

Best,
Erika
