Skip to content

LauYuXuan/FASTQ_processing

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 

Repository files navigation

SRA to FASTQ Conversion Script

This Bash script automates the process of converting SRA (Sequence Read Archive) files to FASTQ format using the fasterq-dump tool from the SRA Toolkit. It processes multiple SRA files based on a provided list of SRR IDs.

Features

  • Processes multiple SRA files in batch based on a list of SRR IDs
  • Checks for existing FASTQ files to avoid redundant processing
  • Creates separate output directories for each SRA accession
  • Uses fasterq-dump for efficient SRA to FASTQ conversion
  • Handles split-files output for paired-end sequencing data
  • Provides detailed error messages for troubleshooting

Prerequisites

  • Bash shell
  • SRA Toolkit installed with fasterq-dump accessible in the system PATH

Usage

./script_name.sh <srr_list_file> <sra_file_path> <fastq_output_path>
<srr_list_file>: A text file containing SRR IDs, one per line
<sra_file_path>: Directory containing SRA files organized in subdirectories
<fastq_output_path>: Directory where FASTQ files will be saved
Input File Format
The srr_list_file should be a plain text file with one SRR ID per line, for example:

Copy
SRR1234567
SRR2345678
SRR3456789
Directory Structure
Input SRA directory (sra_file_path) should have this structure:

sra_file_path/
├── SRR1234567/
│   └── SRR1234567.sra
├── SRR2345678/
│   └── SRR2345678.sra
└── ...
Output directory (fastq_output_path) will be structured as:


fastq_output_path/
├── SRR1234567/
│   ├── SRR1234567_1.fastq
│   └── SRR1234567_2.fastq (if paired-end)
├── SRR2345678/
│   ├── SRR2345678_1.fastq
│   └── SRR2345678_2.fastq (if paired-end)
└── ...

How it works
The script checks for the correct number of input arguments.
It reads the SRR list file line by line.
For each SRR ID:
It checks if the output directory is empty to avoid redundant processing.
It finds the corresponding SRA directory and file.
If the SRA file exists, it runs fasterq-dump to convert it to FASTQ format.
The resulting FASTQ files are saved in the corresponding output subdirectory.
The script continues until all SRR IDs in the list are processed.
Error Handling
The script checks for the correct number of arguments.
It verifies if the SRA directory and file exist for each SRR ID.
It provides informative messages for missing directories or files.
It skips processing if FASTQ files already exist for an SRR ID.
Note
Ensure that your SRA files are organized in subdirectories named after their SRR IDs within the sra_file_path directory.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages