nielsend/GenomeAssembly

APEC 211 & NMEC-O75 Workflow


Assemble Illumina data de novo

  1. Trim reads using trimmomatic

    • Notes: do NOT use a SLURM script on SciNet
    • module load trimmomatic
    • java -jar /software/apps/trimmomatic/64/0.36/trimmomatic-0.36.jar PE AP211_R1_001.fastq.gz AP211_R2_001.fastq.gz AP211_R1_paired.fq.gz AP211_R1_unpaired.fq.gz AP211_R2_paired.fq.gz AP211_R2_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:6 TRAILING:6 SLIDINGWINDOW:4:15 MINLEN:50
    • Learn more: Trimmomatic, Next-Gen Sequence Analysis Tutorial
  2. Examine trimming (FastQC)

    • module load fastqc
    • fastqc AP211_R*_paired.fq.gz
    • Download to view interactive HTML file
    • Check each category in the report; if any is flagged red, consider stricter trimming by adjusting the parameters in step 1.
    • Learn more: FastQC
  3. Assemble data (SPAdes)

    • module load spades
    • spades.py -1 AP211_R1_paired.fq.gz -2 AP211_R2_paired.fq.gz -o MiSeq_assembled_AP211 --careful --cov-cutoff auto
      • Here we are assembling the paired reads generated by trimmomatic
    • Learn more: SPAdes
  4. Examine assembly (QUAST)

    • module load quast
    • quast.py MiSeq_assembled_AP211/contigs.fasta
    • Download report.pdf from quast_results/results* to view the report
    • Learn more: QUAST
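
Before assembling, it can help to confirm that the two paired files written by trimmomatic in step 1 still contain the same number of reads. The following is a minimal sketch, not part of the original workflow; the function names `count_reads` and `check_pairs` are hypothetical helpers introduced here for illustration.

```shell
# Count the reads in a gzipped FASTQ file (each read occupies 4 lines).
count_reads() {
    gzip -dc "$1" | awk 'END { printf "%d\n", NR / 4 }'
}

# Compare the read counts of two paired files.
# Prints both counts and returns non-zero if they differ.
check_pairs() (
    r1=$(count_reads "$1")
    r2=$(count_reads "$2")
    echo "R1=$r1 R2=$r2"
    [ "$r1" = "$r2" ]
)
```

For the files above, this would be called as `check_pairs AP211_R1_paired.fq.gz AP211_R2_paired.fq.gz`. A mismatch suggests the trimming step was interrupted or the wrong files were paired.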

Identify Plasmid Replicons

  1. One useful program for identifying antibiotic resistance genes, virulence genes, and plasmid replicons is ABRicate.

    module load abricate

    • Plasmid replicons: abricate --db plasmidfinder MiSeq_assembled_AP211/contigs.fasta >> ../abricate_output/plasFinder.csv
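  2. Download the .csv files and open them with Excel or R. Note that ABRicate writes tab-separated output by default, regardless of the file extension; pass --csv to abricate if comma-separated output is needed.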

  3. Learn more: ABRicate
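
For a quick look on the cluster before downloading, the replicon hits can be listed per contig. This is a minimal sketch that assumes ABRicate's default tab-separated output with a header row containing `SEQUENCE` and `GENE` columns; the function name `list_replicons` is hypothetical. Columns are located by header name rather than by position, and repeated header lines (which occur when several runs are appended with `>>`) are skipped.

```shell
# Print "contig<TAB>replicon" for every hit in an ABRicate table.
list_replicons() {
    awk -F'\t' '
        NR == 1 {
            # Locate the columns of interest from the header row.
            for (i = 1; i <= NF; i++) {
                if ($i == "SEQUENCE") s = i
                if ($i == "GENE") g = i
            }
            next
        }
        $1 ~ /^#/ { next }            # skip headers repeated by appended runs
        s && g { print $s "\t" $g }   # emit contig name and replicon name
    ' "$1"
}
```

Usage would be `list_replicons ../abricate_output/plasFinder.csv`.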


Long-read data

  1. Assemble fastq files (Canu):
    • module load canu (or, to pin the version, module load canu/gcc/64/1.5)
    • canu -p AP211 -d AP211_output genomeSize=5.05m -pacbio-raw *.fastq
      • -p: prefix for files
      • -d: directory for output
      • genomeSize: approximate genome size (estimated here from the Illumina data); a good estimate yields a better assembly
      • -pacbio-raw: reads are raw (uncorrected) PacBio reads
      • AP211_output/AP211.contigs.fasta: Assembly in a FASTA format
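
One way to obtain the approximate genome size from the Illumina data is to sum the contig lengths of the short-read assembly. The sketch below assumes that approach; the function name `assembly_size` is hypothetical, and the result is only an estimate because short-read assemblies tend to collapse repeats.

```shell
# Sum the sequence lengths in a FASTA file, ignoring header lines (">").
assembly_size() {
    awk '!/^>/ { total += length($0) } END { printf "%d\n", total }' "$1"
}
```

Running `assembly_size MiSeq_assembled_AP211/contigs.fasta` prints the total number of bases, which can be rounded to megabases for the `genomeSize` value.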

Circularization

  • Long-read assemblies and corrected long reads were imported into Geneious
  • Reads were mapped to the assemblies
  • Long-read assemblies were circularized before being polished with Pilon

Polish long-read data with short reads

  • Load modules:
module load bwa
module load samtools
module load pilon
module load quast
  • Index, Map, and Align:
bwa index AP211.contigs.fasta

bwa mem -t 32 AP211.contigs.fasta ../AP211_Illumina/AP211_R1_paired.fq.gz ../AP211_Illumina/AP211_R2_paired.fq.gz | samtools sort > aln.bam

samtools index aln.bam
samtools faidx AP211.contigs.fasta
  • Alternative to the piped command: write an intermediate SAM file first, then sort and index it:
bwa mem AP211.contigs.fasta ../AP211_Illumina/AP211_R1_paired.fq.gz ../AP211_Illumina/AP211_R2_paired.fq.gz > AP211.aln.sam

samtools sort -O bam -T ./tmp -o aln.bam AP211.aln.sam
samtools index aln.bam
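  • Run Pilon, then evaluate the output with QUAST:
java -jar /software/apps/pilon/64/1.18/pilon-1.18.jar --genome AP211.contigs.fasta --frags aln.bam --output pilonAP211.fasta --fix bases --changes --verbose

quast.py pilonAP211.fasta.fasta
  • Note: Pilon treats --output as a file prefix and appends .fasta, hence the doubled extension in pilonAP211.fasta.fasta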

Learn more: BWA, Samtools, Pilon, QUAST
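
Because Pilon is run with `--changes`, it also writes a list of the edits it made. Counting them gives a rough sense of how much the short reads corrected the long-read assembly. This is a minimal sketch, assuming the changes file is named after the `--output` prefix (here `pilonAP211.fasta.changes`) and holds one change per line; the function name `count_changes` is hypothetical.

```shell
# Count the non-empty lines in a Pilon .changes file.
# Returns non-zero if the file does not exist.
count_changes() {
    if [ -f "$1" ]; then
        awk 'NF { n++ } END { printf "%d\n", n }' "$1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}
```

Usage would be `count_changes pilonAP211.fasta.changes`.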

About

Genome Assembly for APEC 211 and NMEC O75
