This Bash script automates the process of converting SRA (Sequence Read Archive) files to FASTQ format using the fasterq-dump tool from the SRA Toolkit. It processes multiple SRA files based on a provided list of SRR IDs.
Features
- Processes multiple SRA files in batch based on a list of SRR IDs
- Checks for existing FASTQ files to avoid redundant processing
- Creates separate output directories for each SRA accession
- Uses fasterq-dump for efficient SRA to FASTQ conversion
- Handles split-files output for paired-end sequencing data
- Provides detailed error messages for troubleshooting
Requirements
- Bash shell
- SRA Toolkit installed, with fasterq-dump accessible in the system PATH
Usage
./script_name.sh <srr_list_file> <sra_file_path> <fastq_output_path>
<srr_list_file>: A text file containing SRR IDs, one per line
<sra_file_path>: Directory containing SRA files organized in subdirectories
<fastq_output_path>: Directory where FASTQ files will be saved
Input File Format
The srr_list_file should be a plain text file with one SRR ID per line, for example:
SRR1234567
SRR2345678
SRR3456789
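A list in this format can be read robustly with a short loop. The sketch below is illustrative rather than the script's actual code: the function name read_srr_ids is hypothetical, and skipping blank lines and stripping stray whitespace (including Windows carriage returns) is an assumption about what is convenient, not documented behavior.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print each SRR ID from a list file, one per line.
# Blank lines are skipped and stray whitespace (including a trailing
# carriage return from Windows line endings) is removed.
read_srr_ids() {
    local list_file="$1" line
    # "|| [ -n "$line" ]" also handles a final line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        line="${line//[[:space:]]/}"
        if [ -n "$line" ]; then
            printf '%s\n' "$line"
        fi
    done < "$list_file"
}

# Example use (not executed here):
#   read_srr_ids srr_list.txt | while read -r srr; do echo "Processing $srr"; done
```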
Directory Structure
Input SRA directory (sra_file_path) should have this structure:
sra_file_path/
├── SRR1234567/
│ └── SRR1234567.sra
├── SRR2345678/
│ └── SRR2345678.sra
└── ...
Output directory (fastq_output_path) will be structured as:
fastq_output_path/
├── SRR1234567/
│ ├── SRR1234567_1.fastq
│ └── SRR1234567_2.fastq (if paired-end)
├── SRR2345678/
│ ├── SRR2345678_1.fastq
│ └── SRR2345678_2.fastq (if paired-end)
└── ...
How it works
- The script checks for the correct number of input arguments.
- It reads the SRR list file line by line.
- For each SRR ID:
  - It checks whether the output directory for that ID already contains files; if so, the ID is skipped to avoid redundant processing.
  - It finds the corresponding SRA directory and file.
  - If the SRA file exists, it runs fasterq-dump to convert it to FASTQ format.
  - The resulting FASTQ files are saved in the corresponding output subdirectory.
- The script continues until all SRR IDs in the list are processed.
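The per-ID conversion step described above might look roughly like the following. This is a minimal sketch, not the script's actual source: the function name convert_one and the FASTERQ_DUMP override variable are hypothetical, and the exact fasterq-dump invocation is an assumption based on the split-files output and directory layout described in this README (--split-files and --outdir are standard fasterq-dump options).

```bash
#!/usr/bin/env bash

# Hypothetical helper: convert one accession from <sra_root>/<SRR>/<SRR>.sra
# into FASTQ files under <out_root>/<SRR>/.
convert_one() {
    local srr="$1" sra_root="$2" out_root="$3"
    local sra_dir="$sra_root/$srr"
    local sra_file="$sra_dir/$srr.sra"
    local out_dir="$out_root/$srr"

    if [ ! -d "$sra_dir" ]; then
        echo "SRA directory not found for $srr: $sra_dir" >&2
        return 1
    fi
    if [ ! -f "$sra_file" ]; then
        echo "SRA file not found for $srr: $sra_file" >&2
        return 1
    fi

    mkdir -p "$out_dir"
    # FASTERQ_DUMP is an assumed override (handy for dry runs);
    # by default the real fasterq-dump from the PATH is used.
    "${FASTERQ_DUMP:-fasterq-dump}" --split-files --outdir "$out_dir" "$sra_file"
}

# Example use (not executed here):
#   convert_one SRR1234567 /data/sra /data/fastq
```

Returning a non-zero status instead of exiting lets a calling loop report the problem and move on to the next SRR ID.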
Error Handling
- The script checks for the correct number of arguments.
- It verifies if the SRA directory and file exist for each SRR ID.
- It provides informative messages for missing directories or files.
- It skips processing if FASTQ files already exist for an SRR ID.
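Two of these checks, the argument count and the skip for already-converted accessions, can be sketched as small functions. The names check_args and already_converted are hypothetical, and treating any non-empty output directory as "already converted" is an assumption drawn from the description above, not a confirmed detail of the script.

```bash
#!/usr/bin/env bash

# Hypothetical helper: succeed only when exactly three arguments are given.
check_args() {
    if [ "$#" -ne 3 ]; then
        echo "Usage: ./script_name.sh <srr_list_file> <sra_file_path> <fastq_output_path>" >&2
        return 1
    fi
}

# Hypothetical helper: succeed when the output directory exists and is
# not empty, which is taken here to mean the accession was already converted.
already_converted() {
    local out_dir="$1"
    [ -d "$out_dir" ] && [ -n "$(ls -A "$out_dir" 2>/dev/null)" ]
}

# Example use (not executed here):
#   check_args "$@" || exit 1
#   if already_converted "$fastq_output_path/$srr"; then
#       echo "FASTQ files already exist for $srr, skipping."
#   fi
```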
Note
Ensure that your SRA files are organized in subdirectories named after their SRR IDs within the sra_file_path directory.