Preprint article: Pathway-enhanced Transformer-based robust model for quantifying cell types of origin of cell-free transcriptome
DOI: 10.1101/2024.02.28.582494
- OS: Linux/UNIX/Windows
- Python Version: >= 3.10.12
- Library:
  - torch >= 2.0.0
  - scanpy >= 1.9.3
- The root directory of this repository contains all the code for Deconformer.
- The `resource` subdirectory contains reference data required by Deconformer, as well as code for generating simulated data.
- The `model_weights` subdirectory contains pre-trained model files for Deconformer, including the adult, fetal, and pregnancy models.
- The `inference_results` subdirectory is where Deconformer saves its inference results. It also includes an example of a Deconformer inference output.
- The `example_input` subdirectory contains an example input file of a cfRNA expression matrix.
- The `gene_model` subdirectory contains the code for Deconformer-gene, a variant of Deconformer in which gene embeddings are used instead of pathway embeddings.
- The `Analysis` subdirectory contains the downstream analysis and visualization code based on the inference results of Deconformer, as presented in the article.
- The `Dockerfile` subdirectory contains the files used for building the Docker image.
To make Deconformer easier to use, we have packaged the model weights, dependencies, and inference scripts into a Docker image and published it on Docker Hub.
If Docker is not installed, follow the installation guide in the official Docker documentation.
Pull the image:

    docker pull 2303162150/deconformer

Assume you have a gene expression matrix in a tsv file `exp.tsv`, with genes as rows and samples as columns. When using Docker, a local directory needs to be synchronized with (mounted into) the container. The input expression matrix and the output prediction results must both be in this directory or its subdirectories. Run the following command to create a container and run the Deconformer inference script:

    docker run --rm \
      -v $workdir:/workspace \
      2303162150/deconformer $model_name $exp_tsv $out_tsv

Replace `$workdir`, `$exp_tsv`, `$out_tsv`, and `$model_name` in the command with the values for your setup:

- `$workdir` is the local working directory synchronized with the container. The paths for `$exp_tsv` and `$out_tsv` should be relative to this directory.
- `$exp_tsv` is the tsv file of the expression matrix.
- `$out_tsv` is the tsv file of the inference result.
- `$model_name` is the name of the trained model. You can choose from the following three models:
  - `adult_model`: 60 basic cell types;
  - `fetal_model`: 27 types of cells + 3 types of trophoblast cells + 4 types of fetal cells;
  - `preg_model`: 60 types of cells + early and late stages of SCT, EVT, VCT, totaling six types of trophoblasts.
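When many runs need to be scripted, the `docker run` command above can be assembled from Python. The following is a minimal sketch, not part of the repository: `build_docker_command` is a hypothetical helper, and the example paths are placeholders. It only checks the two constraints stated above (a known model name, and input/output paths relative to the working directory).

```python
import shlex
from pathlib import Path

# Model names listed in this README.
MODEL_NAMES = ("adult_model", "fetal_model", "preg_model")


def build_docker_command(workdir, model_name, exp_tsv, out_tsv):
    """Return the `docker run` argument list for one Deconformer inference run.

    `exp_tsv` and `out_tsv` must be relative to `workdir`.
    """
    if model_name not in MODEL_NAMES:
        raise ValueError(f"unknown model {model_name!r}; expected one of {MODEL_NAMES}")
    for rel in (exp_tsv, out_tsv):
        if Path(rel).is_absolute():
            raise ValueError(f"{rel!r} must be relative to the working directory")
    return [
        "docker", "run", "--rm",
        "-v", f"{workdir}:/workspace",
        "2303162150/deconformer", model_name, exp_tsv, out_tsv,
    ]


# Placeholder paths for illustration.
cmd = build_docker_command("/data/cfrna", "adult_model", "exp.tsv", "results/pred.tsv")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Returning a list rather than a single string keeps the arguments safe to pass to `subprocess.run` without shell quoting.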
If your device supports CUDA, Deconformer uses the GPU for inference by default; otherwise, it uses the CPU. Inference with a pre-trained model does not require a GPU: about 200 cfRNA samples can be inferred within 10 minutes on a laptop without a dedicated GPU.
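The GPU-or-CPU fallback described above is usually written in PyTorch with `torch.cuda.is_available()`. The sketch below shows that common idiom; it is an assumption about the general approach, not a copy of what `deconformer_predict.py` does, and `pick_device` is a hypothetical name.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # listed in the requirements above (torch >= 2.0.0)
    except ImportError:
        # Fallback so this sketch still runs where PyTorch is not installed.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"inference would run on: {device}")
```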
Run inference with the prediction script:

    python deconformer_predict.py saved_model_path expression_profile

- `saved_model_path`: the path to the directory that holds the pre-trained model parameters and mask matrices (for example, the adult model: `./model_weights/adult_model/`).
- `expression_profile`: an expression profile of the cfRNA samples in CSV format, for which you want to infer the origin fractions, with gene names as rows and sample names as columns.
`prefix_deconformer_re.txt`: a txt file named with the prefix of your sample expression profile file followed by `_deconformer_re`, where the rows are sample names and the columns are cell type names. It is saved by default in the `./inference_results/` directory.
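To illustrate the output layout described above, here is a small sketch that derives the expected result path and lists the largest fractions per sample. Several points are assumptions rather than documented behavior: that "prefix" means the input file name without its extension, that the result file is tab-separated, and that its first column holds the sample names. The function names and the demo values are hypothetical.

```python
import csv
import io
from pathlib import Path


def expected_output_path(expression_profile, results_dir="./inference_results"):
    """Build the result path: <input file stem>_deconformer_re.txt in results_dir."""
    # Assumption: the prefix is the input file name without its extension.
    stem = Path(expression_profile).stem
    return Path(results_dir) / f"{stem}_deconformer_re.txt"


def top_cell_types(handle, k=3, delimiter="\t"):
    """Return {sample: [(cell_type, fraction), ...]} with the k largest fractions."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    cell_types = header[1:]  # assumption: first column holds the sample names
    top = {}
    for row in reader:
        if not row:
            continue  # skip empty lines
        sample = row[0]
        values = [float(v) for v in row[1:]]
        ranked = sorted(zip(cell_types, values), key=lambda p: p[1], reverse=True)
        top[sample] = ranked[:k]
    return top


# Made-up values for illustration only; not real Deconformer output.
demo = io.StringIO(
    "sample\ttype_A\ttype_B\ttype_C\n"
    "S1\t0.50\t0.30\t0.20\n"
    "S2\t0.10\t0.70\t0.20\n"
)
print(expected_output_path("cohort_A.csv"))
print(top_cell_types(demo, k=2))
```

For a real result file, pass an open file handle instead of the `io.StringIO` demo, and change `delimiter` if the file uses a different separator.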
Generate simulated data:

    python deconformer_simulate.py

Train a model:

    python deconformer_train.py ann_simulated_data gmt_file project_name

- `ann_simulated_data`: an AnnData object that stores the simulated cfRNA data.
- `gmt_file`: a gmt file that contains pathway information. This study uses the GSEA C5 GOBP pathways; users may choose other pathway data.
- `project_name`: a user-defined project name. The model parameters and mask matrix produced by training are automatically saved in a folder named after this project.
- `model_checkpoint_epoch_n.pt`: a pt file that saves the model parameters of the nth epoch.
- `mask_gene_n_pathway_m.txt`: a mask matrix created from the given gmt file, where n is the number of pathways and m is the number of genes.
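To make the idea of a pathway mask concrete, the sketch below parses GMT text (each line holds a set name, a description, and then the member genes, all tab-separated) and builds a 0/1 membership matrix. This is an illustrative sketch, not the code in `deconformer_train.py`: the function names, the pathway-by-gene orientation, and the handling of genes missing from the expression matrix are all assumptions, and the pathway and gene names are made up.

```python
import io


def read_gmt(handle):
    """Parse GMT text into {pathway_name: [gene, ...]}."""
    pathways = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name = fields[0]
        genes = [g for g in fields[2:] if g]  # fields[1] is the description
        pathways[name] = genes
    return pathways


def build_mask(pathways, genes):
    """Return a pathway-by-gene 0/1 matrix as a list of rows."""
    column = {gene: j for j, gene in enumerate(genes)}
    mask = []
    for members in pathways.values():
        row = [0] * len(genes)
        for gene in members:
            j = column.get(gene)
            if j is not None:  # ignore genes absent from the expression matrix
                row[j] = 1
        mask.append(row)
    return mask


# Toy GMT content with made-up names, for illustration only.
demo_gmt = io.StringIO(
    "PATHWAY_A\tna\tGENE1\tGENE2\n"
    "PATHWAY_B\tna\tGENE2\tGENE3\tGENE9\n"
)
pathways = read_gmt(demo_gmt)
genes = ["GENE1", "GENE2", "GENE3", "GENE4"]
mask = build_mask(pathways, genes)
for name, row in zip(pathways, mask):
    print(name, row)
```

Each row marks which genes belong to one pathway; a gene listed in the gmt file but absent from the gene list (here `GENE9`) is simply dropped.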
