ml4bio/SCMBench

SCMBench

This repository contains code for the paper SCMBench: Benchmarking Domain-specific and Foundation Models for Single-cell Multi-omics Data Integration.

Directory structure

.
├── SCMBench                # Main Python package
├── data                    # Data files
├── evaluation              # Method evaluation pipelines
├── methods                 # Tools included in the benchmarking
├── environments            # Reproducible Python or R environment
├── pyproject.toml          # Python package metadata
├── LICENSE
└── README.md

Installation

1. Clone our repository

git clone https://github.com/Susanxuan/SCMBench.git

2. Set up for each method

  • The environments directory lists the Python/R version and packages each method needs. Run the matching [env]_installation.sh script with bash to install the requirements for the methods you plan to use.

Virtual env            Python/R version   Methods
env1_installation.sh   python=3.8         cobolt; GLUE; Harmony; MMD-MA; MOFA; Pamona; PCA; scJoint; scMDC; scMoMaT; UCE; UnionCom
env2_installation.sh   python=3.11        Geneformer; scFoundation; scVI; TotalVI
env3_installation.sh   python=3.10        scGPT
env4_installation.sh   R=4.3.3            bindSC; Deepmaps; iNMF; liger; Seurat4; Seurat5
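
To find which installation script covers a given method, the table above can be expressed as a small lookup. This is only a convenience sketch: the mapping is transcribed from the table, and the names ENV_METHODS and env_script_for are hypothetical helpers, not part of the SCMBench package.

```python
# Mapping transcribed from the environment table; not read from the repository.
ENV_METHODS = {
    "env1_installation.sh": [
        "cobolt", "GLUE", "Harmony", "MMD-MA", "MOFA", "Pamona", "PCA",
        "scJoint", "scMDC", "scMoMaT", "UCE", "UnionCom",
    ],
    "env2_installation.sh": ["Geneformer", "scFoundation", "scVI", "TotalVI"],
    "env3_installation.sh": ["scGPT"],
    "env4_installation.sh": [
        "bindSC", "Deepmaps", "iNMF", "liger", "Seurat4", "Seurat5",
    ],
}


def env_script_for(method: str) -> str:
    """Return the installation script listed for `method` (case-insensitive)."""
    wanted = method.lower()
    for script, methods in ENV_METHODS.items():
        if any(name.lower() == wanted for name in methods):
            return script
    raise KeyError(f"No environment listed for method: {method}")


if __name__ == "__main__":
    # Prints the script to run with bash from the environments directory.
    print(env_script_for("scGPT"))
```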

Quick start

  1. Preprocess the data, following the Data preprocessing section below.

  2. Run a Python script, e.g., scJoint:

cd methods/scJoint
mkdir -p ../results/scJoint-output
python run_scJoint.py --input-rna ../data/download/10x-Multiome-Pbmc10k-small-RNA.h5ad --input-atac 10x-Multiome-Pbmc10k-small-ATAC.h5ad --output-rna ../results/scJoint-output/rna.csv --output-atac ../results/scJoint-output/atac.csv -r ../results/scJoint-output/run_info.yaml

  3. Run an R script, e.g., Deepmaps.
    Tested approach: install R and the related packages within conda (e.g. this).

cd methods/Deepmaps
mkdir -p ../results/Deepmaps-output
Rscript run_Deepmaps.R --input-rna ../data/download/10x-Multiome-Pbmc10k-small-RNA.h5ad --input-atac 10x-Multiome-Pbmc10k-small-ATAC.h5ad --output-rna ../results/Deepmaps-output/rna.csv --output-atac ../results/Deepmaps-output/atac.csv -r ../results/Deepmaps-output/run_info.yaml
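
When launching several methods in a row, it can help to assemble these command lines programmatically. The sketch below only builds the argument list; it uses the flags that appear in the two examples above (--input-rna, --input-atac, --output-rna, --output-atac, -r). That other run_[METHOD] scripts accept the same flags is an assumption, and build_run_command is a hypothetical helper, not part of the repository.

```python
from pathlib import PurePosixPath


def build_run_command(method, input_rna, input_atac, out_dir, interpreter="python"):
    """Build the argument list for a run_[METHOD] script.

    Assumes the script follows the flag layout shown in the scJoint and
    Deepmaps examples; check each script's own help before relying on it.
    """
    out = PurePosixPath(out_dir)
    # R scripts end in .R, Python scripts in .py, as in the examples above.
    suffix = "R" if interpreter == "Rscript" else "py"
    return [
        interpreter, f"run_{method}.{suffix}",
        "--input-rna", str(input_rna),
        "--input-atac", str(input_atac),
        "--output-rna", str(out / "rna.csv"),
        "--output-atac", str(out / "atac.csv"),
        "-r", str(out / "run_info.yaml"),
    ]


if __name__ == "__main__":
    cmd = build_run_command(
        "scJoint",
        "../data/download/10x-Multiome-Pbmc10k-small-RNA.h5ad",
        "10x-Multiome-Pbmc10k-small-ATAC.h5ad",
        "../results/scJoint-output",
    )
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True, cwd="methods/scJoint") once the output directory exists.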

Data preprocessing

Preprocess all data into the same .h5ad format with the same keys, following the instructions here.

When adding a new dataset, upload its preprocessing scripts and update the preprocessed data or its link here (store the preprocessed data somewhere it can be downloaded directly).

Current included datasets:

  • 10x-Multiome-Pbmc10k
  • Chen-2019
  • Ma-2020
  • Muto-2021
  • Yao-2021
  • Triple

Algorithms

Note: Most tools can be installed individually via pip install, if they provide that option. For scJoint and UCE, however, we provide edited packages in methods/, which their corresponding run_[METHOD].py scripts use.

Current included algorithms:

Statistical-Based:

Paired:

Unpaired:

Deep Learning-based:

Paired:

Unpaired:

Foundation Models:

Downstream tasks

For downstream evaluation, please read evaluation/README.md.

Currently included downstream tasks:

  • Multi-Omic Integration Accuracy (MAP, MNI, ASW, ARI)
  • Bio-Conservation:
    • Biomarker Detection (JSI)
    • DARs Detection (JSI)
    • Enriched Motifs Detection (JSI)
    • Trajectory Conservation
  • Batch effect & detecting over-correction (iLISI, kBET, Graph connectivity, Batch-ASW)
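
Several of the bio-conservation tasks above report JSI. Assuming JSI stands for the Jaccard similarity index between two detected feature sets (for example, marker genes found before and after integration), it can be sketched as follows. This illustrates the metric only; it is not the code in evaluation/, and the gene names are made-up example inputs.

```python
def jaccard_index(set_a, set_b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(set_a), set(set_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    reference_markers = ["CD3D", "CD8A", "MS4A1"]
    integrated_markers = ["CD3D", "MS4A1", "NKG7"]
    # Two shared genes out of four distinct genes.
    print(jaccard_index(reference_markers, integrated_markers))  # 0.5
```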

Solution to potential install issues

  • Bedtools. Error: NotImplementedError: "intersectBed" does not appear to be installed or on the path, so this method is disabled. Fix: download the bedtools binary from the official website (here) and place it in the bin directory of your conda environment.
  • GPU support for torchbiggraph. Check this.
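
Before starting a long run, a quick check can confirm that the bedtools executable named in the error message is visible on the PATH. The sketch below uses only the Python standard library; missing_tools is a hypothetical helper name.

```python
import shutil


def missing_tools(tools=("intersectBed",)):
    """Return the executables from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required executables were found.")
```

Run it inside the activated conda environment so that the environment's bin directory is part of the PATH being searched.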

Online Version - SCMBench server

The online server for this package is under development and will be freely available without registration.
