RNAseq_Analysis

Referenced based RNA-Sequencing: Aim:

The profiling of gene expression.
To analyze the change in cellular transcriptome.
To reveal the presence and quantity of RNA in sample.
Quantifying the differential gene expression.

At least you require two samples:

a. Control: normal, healthy, not diseased b. Treated: abnormal, unhealthy, diseased

Biological analysis plan: Control vs treated : check the expression of control with respect to treated Treated vs control: check the expression of treated with respect to control Control1 vs control2 Treated1 vs treated2

Three common steps in ngs: • Library preparation • Sequencing • Data analysis

Tools installation By repositories method

#Fastqc: check the quality of data

sudo apt-get install fastqc 
fastqc -help

#Cutadapt tool- trimming the adapters

sudo apt-get install cutadapt
cutadapt -help

#Hisat2 – mapping tool

sudo apt-get install hisat2
hisat2

#Samtools- convert sam to bam file or bam to sam

sudo apt-get install samtools
samtools

By source code method : Cufflinks Download cufflinks: http://cole-trapnell-lab.github.io/cufflinks/assets/downloads/cufflinks-2.2.1.Linux_x86_64.tar.gz

Referenced based rna sequencing: reference genome is available in database for our sample with gtf file. Replicates: to check the variability in the sample. Two of replicates: • Technical replicate: it means same experiment we are performing on sample again and again • Biological replicate: we are taking different biological sample within same population

Depth: How many times the base is read by the sequencing machine.
Coverage: How many reads are able to cover the reference genome.
Phred quality score: It is assigned to each and every base which is sequenced by the machine. Every base which is sequenced by the machine has some quality score which tell us about the accuracy that how much accurately base was called by the machine (Cutoff >=30).
Bam and sam file: Containing complete mapping information
Bam : Binary alignment map- computer readable file
Sam: sequence alignment file: Human readable file ATGACGTGCGA
Convert bam to sam or sam to bam – samtools

##Data selection

https://www.ncbi.nlm.nih.gov/sra/ERX11327741[accn]

##1. Data download

#control1
wget -c ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR119/058/ERR11943458/ERR11943458.fastq.gz

#control2
wget -c ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR119/059/ERR11943459/ERR11943459.fastq.gz

#Treated1
wget -c ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR119/052/ERR11943452/ERR11943452.fastq.gz

#Treated2
wget -c ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR119/053/ERR11943453/ERR11943453.fastq.gz

Trimming for Quality

##cut the first 50 lakh reads

head -5000000 ERR11943458.fastq > control1_.fastq 
head -5000000 ERR11943459.fastq > control2_.fastq 
head -5000000 ERR11943452.fastq > Treated1_.fastq  
head -5000000 ERR11943453.fastq > Treated2_.fastq

##2. Quality control
fastqc *.fastq

##3. Trimming and cleaning of data by using cutadapt

#for single end
cutadapt -b(seq) -q 30.30 -m 20 -o trim_control.fastq control1_.fastq

#for paired end 
cutadapt -b(seq) -B(seq) -q 30,30 -m 20 -o trim_control1_1.fastq -p trim_control1_2.fastq control1_1.fastq control1_2.fastq

#Download the chr16 from UCSC and annotation files
wget -c https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr16.fa.gz

gunzip chr16.fa.gz

##4. Mapping

#1. Creation of indexed genome
hisat2-build chr16.fa chr16

#2. Mapping of reads to indexed genome

#single end data

hisat2 --dta-cufflinks -p 2 -x chr16 control1.fastq > control1.bam

hisat2 --dta-cufflinks -p 2 -x chr16 control2.fastq > control2.bam

hisat2 --dta-cufflinks -p 2 -x chr16 Treated1.fastq > Treated1.bam

hisat2 --dta-cufflinks -p 2 -x chr16 Treated2.fastq > Treated2.bam

#Paired end data

hisat2 --dta-cufflinks -p 2 -x chr16 -1 sample_1.fastq -2 sample_2.fastq > mapped.bam

#3. Conversion of BAM into sorted BAM using samtools

Installation: sudo apt-get install samtools

samtools sort control1.bam > sorted_control1.bam
samtools sort control2.bam > sorted_control2.bam
samtools sort Treated1.bam > sorted_Treated1.bam
samtools sort Treated2.bam > sorted_Treated2.bam

##4: to normalise the data and count no. of reads

cufflinks -o cufflinks_control1 -p 2 -G chr16.gtf sorted_control1.bam

cufflinks -o cufflinks_control2 -p 2 -G chr16.gtf sorted_control2.bam

cufflinks -o cufflinks_Treated1 -p 2 -G chr16.gtf sorted_Treated1.bam

cufflinks -o cufflinks_Treated2 -p 2 -G chr16.gtf sorted_Treated2.bam

##6: Merging
cuffmerge -o cuffmerge_output -p 2 -g chr16.gtf -s chr16.fa assembly.txt

##7: Differential expression using cuffdiff #plan

#Treated vs control
cuffdiff -p 2 -o cuffdiff_result -L control,Treated -u /mnt/f/Dr_OmicsLab/NGS/RNAseq_Human/cuffmerge_output/merged.gtf sorted_control1.bam,sorted_control2.bam sorted_Treated1.bam,sorted_Treated2.bam

#Steps for post analysis Load gene expression file from cuffdiff output to excel and filter out the following.

Sorting of significant genes: filter pvalue <0.05, log fold - up = >=1, down= <=-1 Do conditional formatting. And also separate gene name with comma, and add them on original sheet.
Go to uniprot - select ID mapping and paste all the short listed genes there. customize the column - then add the columns - in gene ontology select all and from external resources select genome annotation - select gene id, Kegg etc.. and download it in excel format.
Gene annotation: Load the downloaded file and name it gene annotation, Remove the duplicates and copy and paste it in the gene expression file with significant genes and compare the IDs with gene and from NM IDs.
Pathway enrichment: functional enrichment: Go to ShinyGo website and paste the gene names there, and get min 10 to max 5000 pathways. Check with all parameters kegg, gene ontology etc.. and you will get some plots etc check and study each accordingly.
Network analysis: to go the STRING database - now add the separate excel sheet which have only log(fold Change) and gene name. And now copy the genes and paste them in string database. And download the nodes in tsv file. And Now download Cytoscape and upload this file and visualize the data.
R plot : volcano plot, heatmap plot

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
README.md		README.md
rnaseq_pipeline.sh		rnaseq_pipeline.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

RNAseq_Analysis

About

Uh oh!

Releases

Packages

Languages

iamrutzk/RNAseq_Analysis

Folders and files

Latest commit

History

Repository files navigation

RNAseq_Analysis

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages