This repository contains code for the paper SCMBench: Benchmarking Domain-specific and Foundation Models for Single-cell Multi-omics Data Integration.
.
├── SCMBench # Main Python package
├── data # Data files
├── evaluation # Method evaluation pipelines
├── methods # Tools included in the benchmarking
├── environments # Reproducible Python or R environment
├── pyproject.toml # Python package metadata
├── LICENSE
└── README.md
git clone https://github.com/Susanxuan/SCMBench.git
- The Python/R versions and packages needed for each method are summarized in environments; run bash [env]_installation.sh to install the requirements for the corresponding methods.
| Virtual Envs | Python/R Version | Methods |
|---|---|---|
| env1_installation.sh | python=3.8 | cobolt;GLUE;Harmony;MMD-MA;MOFA;Pamona;PCA;scJoint;scMDC;scMoMaT;UCE;UnionCom |
| env2_installation.sh | python=3.11 | Geneformer;scFoundation;scVI;TotalVI |
| env3_installation.sh | python=3.10 | scGPT |
| env4_installation.sh | R=4.3.3 | bindSC;Deepmaps;iNMF;liger;Seurat4;Seurat5 |
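
To find which installation script covers a given method, the table above can be mirrored as a small lookup. This is only a sketch: the dictionary restates the table, and `env_script_for` is a hypothetical helper that is not part of the SCMBench package.

```python
# Mapping copied from the table above: installation script -> methods it covers.
ENV_METHODS = {
    "env1_installation.sh": [
        "cobolt", "GLUE", "Harmony", "MMD-MA", "MOFA", "Pamona",
        "PCA", "scJoint", "scMDC", "scMoMaT", "UCE", "UnionCom",
    ],
    "env2_installation.sh": ["Geneformer", "scFoundation", "scVI", "TotalVI"],
    "env3_installation.sh": ["scGPT"],
    "env4_installation.sh": ["bindSC", "Deepmaps", "iNMF", "liger", "Seurat4", "Seurat5"],
}


def env_script_for(method: str) -> str:
    """Return the installation script that lists `method` (case-insensitive match)."""
    wanted = method.strip().lower()
    for script, methods in ENV_METHODS.items():
        if wanted in (name.lower() for name in methods):
            return script
    raise KeyError(f"no environment listed for method {method!r}")


if __name__ == "__main__":
    # Print the script to pass to `bash`, run from the environments directory.
    print(env_script_for("scGPT"))
```

The returned file name is what would be passed to bash inside environments.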
- Data Preprocessing: follow this.
- Run Python scripts, e.g., scJoint:
cd methods/scJoint
mkdir -p ../results/scJoint-output
python run_scJoint.py --input-rna ../data/download/10x-Multiome-Pbmc10k-small-RNA.h5ad --input-atac 10x-Multiome-Pbmc10k-small-ATAC.h5ad --output-rna ../results/scJoint-output/rna.csv --output-atac ../results/scJoint-output/atac.csv -r ../results/scJoint-output/run_info.yaml
- Run R scripts, e.g., Deepmaps:
Tested approach: install R and the related packages within conda (e.g. this).
cd methods/Deepmaps
mkdir -p ../results/Deepmaps-output
Rscript run_Deepmaps.R --input-rna ../data/download/10x-Multiome-Pbmc10k-small-RNA.h5ad --input-atac 10x-Multiome-Pbmc10k-small-ATAC.h5ad --output-rna ../results/Deepmaps-output/rna.csv --output-atac ../results/Deepmaps-output/atac.csv -r ../results/Deepmaps-output/run_info.yaml

Preprocess all the data to the same .h5ad format with the same keys, following here.
For each new dataset we include, upload the preprocessing scripts and update the preprocessed data/link here (store the preprocessed data somewhere it can be downloaded directly).
Currently included datasets:
├─ 10x-Multiome-Pbmc10k
├─ Chen-2019
├─ Ma-2020
├─ Muto-2021
├─ Yao-2021
└─ Triple
Note: Most tools can be used directly by installing them individually via pip install, if they provide that option. For scJoint and UCE, however, we provide edited packages in methods/ that are used by their corresponding run_[METHOD].py scripts.
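
The scJoint and Deepmaps invocations shown earlier use the same set of flags, so the argument list can be assembled programmatically. The sketch below assumes that other run_[METHOD] scripts accept the same flags, which this README only shows for those two methods; `build_run_command` and its parameters are hypothetical names used for illustration.

```python
import shlex
from pathlib import PurePosixPath


def build_run_command(method, input_rna, input_atac,
                      results_root="../results", interpreter="python"):
    """Assemble the argument list for a run_[METHOD] script.

    Assumption: the script takes the flags shown for scJoint and Deepmaps.
    Use interpreter="Rscript" for the R-based methods.
    """
    out_dir = PurePosixPath(results_root) / f"{method}-output"
    suffix = ".R" if interpreter == "Rscript" else ".py"
    return [
        interpreter, f"run_{method}{suffix}",
        "--input-rna", str(input_rna),
        "--input-atac", str(input_atac),
        "--output-rna", str(out_dir / "rna.csv"),
        "--output-atac", str(out_dir / "atac.csv"),
        "-r", str(out_dir / "run_info.yaml"),
    ]


if __name__ == "__main__":
    cmd = build_run_command(
        "scJoint",
        "../data/download/10x-Multiome-Pbmc10k-small-RNA.h5ad",
        "10x-Multiome-Pbmc10k-small-ATAC.h5ad",
    )
    # Print a shell-quoted command line; it could also be passed to subprocess.run.
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps each argument separate, so paths containing spaces need no manual quoting if the list is handed to subprocess.run.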
Currently included algorithms:
Paired:
Unpaired:
Paired:
Unpaired:
For downstream evaluation, please read evaluation/README.md.
Currently included downstream tasks:
- Multi-Omic Integration Accuracy (MAP, MNI, ASW, ARI)
- Bio-Conservation:
  - Biomarker Detection (JSI)
  - DARs Detection (JSI)
  - Enriched Motifs Detection (JSI)
  - Trajectory Conservation
- Batch effect & detecting over-correction (iLISI, kBET, Graph connectivity, Batch-ASW)
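
Several of the tasks above are scored with JSI. Assuming JSI denotes the Jaccard similarity index (the abbreviation is not expanded in this README), the score for two feature sets can be sketched as follows. The function name and the gene symbols are illustrative only; see evaluation/README.md for the actual pipeline.

```python
def jaccard_index(features_a, features_b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    a, b = set(features_a), set(features_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty sets score 0.0.
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Hypothetical marker lists, e.g. detected before and after integration.
    markers_reference = ["CD3D", "CD3E", "IL7R"]
    markers_integrated = ["CD3D", "IL7R", "MS4A1"]
    print(jaccard_index(markers_reference, markers_integrated))  # 2 shared / 4 total = 0.5
```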
- Bedtools.
NotImplementedError: "intersectBed" does not appear to be installed or on the path, so this method is disabled.
Download the bedtools binary from the official website (here) and put it in the conda bin directory.
- GPU support for torchbiggraph. Check this.
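
For the Bedtools issue above, a quick way to confirm that the binary is visible before launching a run is to look it up on the PATH. This is a minimal sketch using only the standard library; `tool_on_path` is a hypothetical helper name.

```python
import shutil


def tool_on_path(name: str = "intersectBed") -> bool:
    """Return True if an executable called `name` can be found on the PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    if tool_on_path():
        print("intersectBed found at", shutil.which("intersectBed"))
    else:
        print("intersectBed not found; place the bedtools binary in the conda bin directory")
```

Run it inside the activated conda environment, since that is the PATH the benchmark scripts will see.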
The online server for this package is under development; it will be freely available without any registration requirement.