A comprehensive re-analysis of ChIP-seq data from Barutcu et al. (2016) investigating RUNX1's role as a transcriptional regulator in breast cancer cells, implemented as a reproducible Nextflow pipeline.
This project implements a reproducible Nextflow pipeline for ChIP-seq analysis, focusing on RUNX1 transcription factor binding in MCF-7 breast cancer cells. The analysis includes quality control, peak calling, motif discovery, and integration with gene expression data to identify direct RUNX1 targets.
Key Findings:
- Identified 6,462 high-confidence RUNX1 binding sites
- Discovered RUNX1's role as a master regulator coordinating multiple cancer hallmarks
- Validated direct transcriptional regulation of key breast cancer genes
- Revealed extensive involvement in cell cycle, DNA damage response, and metabolic pathways
- Nextflow >= 22.10.0
- Singularity/Apptainer or Docker
- Python 3.9 or higher
git clone https://github.com/yourusername/chipseq-runx1-analysis.git
cd chipseq-runx1-analysisnextflow run main.nf -profile singularity --samplesheet samplesheet.csvchipseq-runx1-analysis/
├── envs # Conda environments
│ └── base_env.yml # Environment specification
├── modules/ # Custom Nextflow modules
│ ├── bedtools_intersect/
│ ├── bedtools_remove/
│ ├── bowtie2_align/
│ ├── bowtie2_build/
│ ├── deeptools_bamcoverage/
│ ├── deeptools_computematrix/
│ ├── deeptools_multibwsummary/
│ ├── deeptools_plotcorrelation/
│ ├── deeptools_plotprofile/
│ ├── fastqc/
│ ├── homer_annotatepeaks/
│ ├── homer_findmotifsgenome/
│ ├── homer_findpeaks/
│ ├── homer_maketagdir/
│ ├── homer_pos2bed/
│ ├── multiqc/
│ ├── samtools_flagstat/
│ ├── samtools_index/
│ ├── samtools_sort/
│ └── trimmomatic/
├── results/ # Analysis outputs
│ ├── ### RAW DATA & INTERMEDIATE FILES ###
│ ├── GSE75070_MCF7_shRUNX1_shNS_RNAseq_log2_foldchange.txt # Gene expression data
│ ├── IP_rep1_peaks.bed # Peak calls from replicate 1
│ ├── IP_rep2_peaks.bed # Peak calls from replicate 2
│ ├── genes_for_enrichr.txt # Gene list for enrichment analysis
│ ├── correlation_matrix.tab # Sample correlation data
│ ├── ### QUALITY CONTROL & VALIDATION ###
│ ├── supp_figure_S2A_table.png # Alignment statistics comparison
│ ├── supp_figure_S2B_summary.png # QC metrics summary
│ ├── supp_figure_S2C_venn.png # Peak overlap between replicates
│ ├── correlation_heatmap.png # Sample correlation visualization
│ ├── ### GENOMIC VISUALIZATIONS ###
│ ├── figure_2D_MALAT1.png # Genome browser view - MALAT1 promoter
│ ├── figure_2E_PIDD1.png # Genome browser view - PIDD1 promoter
│ ├── profile_plot.png # Signal coverage across gene bodies
│ ├── ### MOTIF & BINDING ANALYSIS ###
│ ├── known_homer_motif.png # Top enriched motifs from HOMER
│ ├── ### EXPRESSION INTEGRATION ###
│ ├── figure_2F_accurate.png # ChIP-seq + RNA-seq integration plot
│ ├── ### FUNCTIONAL ENRICHMENT ###
│ ├── MSigDB_Hallmark_2020_bar_graph.png # Cancer hallmark pathways
│ ├── GO_Biological_Process_2025_bar_graph.png # Gene ontology terms
│ ├── KEGG_2021_Human_bar_graph.png # KEGG pathway enrichment
│ └── enrichment_top_pathways_figure.png # Comprehensive pathway summary
├── scripts/
│ ├── AlignmentStatistics.py
│ ├── AnnotatePeaksAnalysis.py
│ ├── DisplayENRICHRresults.py
│ ├── OverlapChIPvsRNA.py
│ ├── PeakOverlap.py
│ ├── PeaksStatistics.py
│ └── SpearmanCorrelation.py
└── workflows/
│ ├── main.nf # Main Nextflow pipeline
│ └── nextflow.config # Nextflow configuration
- Quality Control & Preprocessing
- FastQC: Quality assessment of raw and trimmed reads
- Trimmomatic: Adapter trimming and quality filtering
- MultiQC: Aggregated QC report generation
- Genome Alignment
- Bowtie2: Alignment to hg38 reference genome
- SAMtools: BAM processing and statistics
- Peak Calling & Analysis
- HOMER: Peak calling with factor-style parameters
- BEDTools: Peak intersection and blacklist filtering
- deepTools: Correlation analysis and signal visualization
- Motif & Functional Analysis
- HOMER: De novo motif discovery and enrichment
- GREAT/Enrichr: Gene ontology and pathway analysis
- IGV: Genomic visualization of specific loci
| Sample | Total Reads (M) | Mapping Rate | Duplication Rate | GC Content |
|---|---|---|---|---|
| INPUT_rep1 | ~30 | 89.1% | 1-4% | 43-47% |
| INPUT_rep2 | ~10.7 | 74.3% | 1-4% | 43-47% |
| IP_rep1 | ~30 | 89.1% | 12-13% | 43-47% |
| IP_rep2 | ~29 | 74.3% | 12-13% | 43-47% |
- Promoter enrichment: Strong binding at transcription start sites
- Motif validation: Canonical RUNX motif highly enriched (p < 1e-50)
- Target genes: Direct regulation of 14.7-18.7% of differentially expressed genes
Top Pathways Regulated by RUNX1:
- E2F Targets & Cell Cycle Control
- DNA Damage Response & Repair
- Metabolic Reprogramming (Glycolysis, TCA Cycle)
- Estrogen Response Pathways
- mTORC1 Signaling
The pipeline generates comprehensive visualizations including:
- Signal coverage profiles across gene bodies
- Correlation heatmaps between replicates
- Motif enrichment plots
- Genome browser tracks for specific loci
- Pathway enrichment bar graphs
Profiles
- singularity: Uses Singularity containers for reproducibility
- docker: Uses Docker containers
- local: Local execution with Conda/Mamba Parameters Key customizable parameters in nextflow.config:
- genome: Reference genome path
- blacklist: ENCODE blacklist regions
- gtf: Gene annotation file
- ucsc_genes: Gene coordinates for profiling
Barutcu, A. R., Hong, D., Lajoie, B. R., McCord, R. P., van Wijnen, A. J., Lian, J. B., Stein, J. L., Dekker, J., Imbalzano, A. N., & Stein, G. S. (2016). RUNX1 contributes to higher-order chromatin organization and gene regulation in breast cancer cells. Biochimica et biophysica acta, 1859(11), 1389–1397. https://doi.org/10.1016/j.bbagrm.2016.08.003
@article{barutcu2016,
title={RUNX1 contributes to higher-order chromatin organization and gene regulation in breast cancer cells},
author={Barutcu, A Rasim and Hong, Deli and Lajoie, Bryan R and McCord, Rachel P and van Wijnen, Andre J and Lian, Jane B and Stein, Janet L and Dekker, Job and Imbalzano, Anthony N and Stein, Gary S},
journal={Biochimica et Biophysica Acta (BBA)-Gene Regulatory Mechanisms},
volume={1859},
number={12},
pages={1389--1397},
year={2016},
publisher={Elsevier}
}