- Author: Anne-Sophie Denommé-Pichon
- Version: 0.0.1
- Licence: AGPLv3
- Description: this pipeline genotypes short tandem repeats (STRs) at the specified loci from short-read genome sequencing data. It runs ExpansionHunter, Tredparse and GangSTR to call genotypes, then identifies STR expansions using three outlier detection methods to highlight abnormal repeat counts.
- Fill in the configuration file `config.sh`. There is an example in the repository.
- Create `samples.list` (the BAM file names without the `.bam` extension). There is an example in the repository.
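`samples.list` can also be generated from a directory of BAM files instead of being written by hand. The following is a minimal sketch that assumes the list holds one sample name per line; the `make_samples_list` helper is hypothetical and not part of the repository.

```bash
# Hypothetical helper (not part of the repository).
# Writes one sample name per line (BAM file name minus ".bam") to OUTPUT.
# Usage: make_samples_list BAM_DIR [OUTPUT]
make_samples_list() {
    local bam_dir="$1"
    local out="${2:-samples.list}"
    local bam
    : > "$out"                          # start from an empty file
    for bam in "$bam_dir"/*.bam; do
        [ -e "$bam" ] || continue       # glob matched nothing: skip the literal pattern
        basename "$bam" .bam >> "$out"  # strip directory and .bam extension
    done
}
```

For example, `make_samples_list /path/to/bams` writes `samples.list` in the current directory. Index files such as `S1.bam.bai` are not matched by the `*.bam` glob, so they do not end up in the list.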
For now, the scripts must be launched from the clone directory.

Launch `launch_pipeline.sh`:

```bash
nohup ./launch_pipeline.sh samples.list &
```

Dependencies:
- `config.sh`
- `samples.list`
- `pipeline.sh`
- `wrapper_delete.sh`
- `wrapper_ehdn.sh`
- `wrapper_expansionhunter.sh`
- `wrapper_gangstr.sh`
- `wrapper_transfer.sh`
- `wrapper_tredparse.sh`
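Because the scripts are launched from the clone directory and rely on the dependency files listed above, a quick check before launching can catch a missing file early rather than partway through a long `nohup` run. This is a minimal sketch; the `check_deps` helper is hypothetical and not part of the repository.

```bash
# Hypothetical helper (not part of the repository).
# Reports every listed file that is missing from the current directory
# (the clone directory) and returns non-zero if any is missing.
check_deps() {
    local missing=0 f
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, run `check_deps config.sh samples.list pipeline.sh wrapper_delete.sh wrapper_ehdn.sh wrapper_expansionhunter.sh wrapper_gangstr.sh wrapper_transfer.sh wrapper_tredparse.sh` and only start `nohup ./launch_pipeline.sh samples.list &` if it succeeds.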
To highlight abnormal repeat counts, the pipeline identifies outliers using three methods. A repeat count at a given locus is flagged if:
- it is above the normal range (i.e. in the gray zone or the pathological zone), or
- it is above the 99th percentile, or
- it is at least 4 standard deviations above the mean (Z-score ≥ 4).
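The three criteria above can be sketched for a single locus as follows. This is an illustration, not the implementation in `str_outliers.py`. Several points are assumptions: the input format (one `sample repeat_count` pair per line), the `NORMAL_MAX` threshold (the upper bound of the normal range, supplied by hand here), the percentile convention (linear interpolation between closest ranks, which is NumPy's default), and the use of the population standard deviation with the tested sample included in the mean and SD. The `flag_outliers` helper is hypothetical.

```bash
# Hypothetical helper (not part of the repository): flag outlier repeat
# counts at ONE locus using the three criteria described above.
# Usage: flag_outliers COUNTS_FILE NORMAL_MAX
#   COUNTS_FILE  one "sample repeat_count" pair per line (assumed format)
#   NORMAL_MAX   largest repeat count still considered normal at this locus
# Prints each flagged sample followed by the criteria it meets.
flag_outliers() {
    sort -k2,2n "$1" | awk -v normal_max="$2" '
    BEGIN { nmax = normal_max + 0 }
    {
        # Input is sorted by count, so x[1..n] is in ascending order.
        name[NR] = $1
        x[NR] = $2 + 0
        sum += x[NR]
    }
    END {
        n = NR
        if (n == 0) exit
        mean = sum / n
        for (i = 1; i <= n; i++) ss += (x[i] - mean) ^ 2
        sd = sqrt(ss / n)   # population SD (ddof = 0), sample included

        # 99th percentile by linear interpolation between closest ranks
        # (the NumPy default convention), on the 1-based sorted array.
        h = (n - 1) * 0.99
        lo = int(h)
        p99 = x[lo + 1]
        if (lo + 2 <= n) p99 += (h - lo) * (x[lo + 2] - x[lo + 1])

        for (i = 1; i <= n; i++) {
            why = ""
            if (x[i] > nmax) why = why " above_normal"
            if (x[i] > p99) why = why " above_p99"
            if (sd > 0 && (x[i] - mean) / sd >= 4) why = why " z>=4"
            if (why != "") print name[i] why
        }
    }'
}
```

For example, with 19 samples at 10 repeats and `S20` at 100 repeats, `flag_outliers counts.txt 30` prints `S20 above_normal above_p99 z>=4`. When the tested sample is included in the mean and SD, its Z-score can never exceed sqrt(n - 1). As a result, the Z-score ≥ 4 criterion can only be met in cohorts of at least 17 samples.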
Launch `launch_results.sh`:

```bash
nohup ./launch_results.sh samples.list &
```

Dependencies:
- `config.sh`
- `samples.list`
- `patho.csv`
- `getResults.py`
- `launch_str_outliers.sh`
- `str_outliers.py`
Another tool, ExpansionHunter DeNovo, will be added to the pipeline.