Hello!
Thank you so much for providing racon as a tool!
I am looking for an explanation of what --subsample and --split in racon_wrapper do, and for help interpreting my results.
To start off, I tried to polish a large genome (~10.8GB) using racon (v1.5.0), but ran into memory issues: the job was killed after a few hours or days, even when I gave it 1.5 TB of memory. My data are HiFi reads from 8 flow cells, which I concatenated into flowcells.fasta.
minimap2 -t26 -x map-hifi hifiasm.fa flowcells.fasta > minimap.paf
racon -m 3 -x -5 -g -4 -w 500 -t 1 -q 20 -u flowcells.fasta minimap.paf hifiasm.fa > racon.fasta
The issues pages suggest trying racon_wrapper to subsample or split the data. I did this to the best of my knowledge:
racon_wrapper -t 8 --subsample 11532328 50 flowcells.fasta minimap.paf hifiasm.fa > racon_wrapper50.fasta
I took 11532328 from the length of the longest scaffold in hifiasm.fa, but I don't have a good reason for the 50 (I'm not sure what it represents; sequence coverage?).
So, I tried with 100 as well:
racon_wrapper -t 8 --subsample 11532328 100 flowcells.fasta minimap.paf hifiasm.fa > racon_wrapper100.fasta
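For context, here is a rough sketch of the arithmetic these two arguments seem to imply. It assumes my reading of the racon_wrapper help text is right: the first --subsample argument is a reference length, the second is a target coverage, and their product is the number of read bases kept. The function name is hypothetical, and the genome size assumes that "10.8GB" means roughly 10.8 billion bases.

```python
def subsample_target_bases(reference_length: int, coverage: int) -> int:
    """Read bases kept, ASSUMING --subsample means reference_length * coverage."""
    return reference_length * coverage


# Values from the commands above; genome_size is an assumed ~10.8 billion bases.
longest_scaffold = 11_532_328
genome_size = 10_800_000_000

for coverage in (50, 100):
    target = subsample_target_bases(longest_scaffold, coverage)
    # Coverage those bases would give if spread over the whole genome.
    effective = target / genome_size
    print(f"coverage={coverage}: {target:,} bases kept, "
          f"about {effective:.3f}x of the full genome")
```

If that assumption holds, passing the longest scaffold length instead of the full genome size would keep only a small fraction of the reads in both runs.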
I then ran sequence stats to compare the original assembly with the polished assemblies (hifiasm.fa, racon_wrapper50.fasta, and racon_wrapper100.fasta):
| GenomeQC | hifiasm | racon_wrapper50 | racon_wrapper100 |
| --- | --- | --- | --- |
| Number of scaffolds | 34,623 | 4,539 | 8,850 |
| Total size of scaffolds | 10,930,457,101 | 5,127,013,922 | 8,355,708,311 |
| Longest scaffold | 11,532,328 | 11,532,722 | 11,532,679 |
| Shortest scaffold | 9,368 | 13,657 | 11,790 |
racon_wrapper50 has the fewest scaffolds, but its total scaffold size is about half that of the original assembly and of what we would expect for a 10.8GB genome.
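To put numbers on "half", a small sketch that computes each assembly's total scaffold size as a fraction of the original hifiasm assembly. The sizes are copied from the stats above; the function and variable names are only illustrative.

```python
def retained_fraction(polished_size: int, original_size: int) -> float:
    """Fraction of the original total scaffold size present in a polished assembly."""
    return polished_size / original_size


# Total scaffold sizes (bp) from the GenomeQC stats above.
total_sizes = {
    "hifiasm": 10_930_457_101,
    "racon_wrapper50": 5_127_013_922,
    "racon_wrapper100": 8_355_708_311,
}

original = total_sizes["hifiasm"]
for name, size in total_sizes.items():
    fraction = retained_fraction(size, original)
    print(f"{name}: {fraction:.1%} of the original total size")
```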
With this in mind, could you tell me 1) whether I used racon_wrapper correctly, 2) why the racon_wrapper50 assembly is about half the expected size, and 3) which of these runs gives the best result?
I tried to read the issues and paper, but I fear they went over my head. Given this, any help is appreciated!!
Best,
Erika