ChemGLaM is a large language model for Compound-Protein Interaction Prediction.
.
├── cache : Directory for protein embeddings
├── chemglam : Source code for ChemGLaM
├── config : Config examples
├── data : Directory for datasets
├── figures : Figures for README.md
├── logs : Output Directory
├── script : Script examples
├── Dockerfile : Docker file
├── LICENSE : License file
├── predict.py : Main script for prediction
├── prediction_demo.ipynb : Example of inference for a single compound-protein pair
├── README.md : This file
├── setup.py : setup file
└── train.py : Main script for fine-tuning
To set up the environment for this project, follow these steps:
- Create a new conda environment named chemglam with Python 3.11:
conda create -n chemglam -y python=3.11
- Activate the environment:
conda activate chemglam
- Install the required dependencies:
pip install -e .
- Install RDKit (version 2024.9.2) from the conda-forge channel:
conda install -c conda-forge rdkit=2024.9.2
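After the steps above, a quick check from inside the activated environment can confirm that the interpreter and RDKit are the ones you expect. The following is a minimal sketch, not part of ChemGLaM itself; the function name `check_environment` is hypothetical, and it only reports what it finds rather than enforcing any version.

```python
import importlib
import sys


def check_environment(modules=("rdkit",)):
    """Report the Python version and the version of each importable module.

    A module that cannot be imported is reported as None.
    """
    report = {"python": ".".join(str(part) for part in sys.version_info[:3])}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            # Not every module exposes __version__, so fall back to "unknown".
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `rdkit` is reported as `None`, the conda-forge install step probably ran in a different environment than the one currently active.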
The following command is an example of fine-tuning ChemGLaM on a CPI dataset.
python train.py -c /path/to/config.json
Setting the "evidential" argument to true in the config file runs evidential deep learning (EDL) for classification tasks.
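One way to produce an EDL config is to copy an existing config and switch the option on. The sketch below assumes that "evidential" is a top-level key in the JSON config, which this README does not state explicitly; the helper name `make_edl_config` is hypothetical and all other keys are passed through unchanged.

```python
import json
from pathlib import Path


def make_edl_config(base_config_path, edl_config_path):
    """Copy a JSON config and set "evidential" to true in the copy."""
    config = json.loads(Path(base_config_path).read_text())
    # Assumption: "evidential" lives at the top level of the config.
    config["evidential"] = True  # written to the file as JSON `true`
    Path(edl_config_path).write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, `make_edl_config("/path/to/config.json", "/path/to/edl_config.json")` leaves the original file untouched and writes a second file that differs only in the "evidential" entry.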
python train.py -c /path/to/edl_config.json
Coming Soon.
To run inference, set --checkpoint_path to the path of the fine-tuned model and set deterministic_eval: true for reproducible results.
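Before launching a long inference run, it can help to confirm that the config carries both settings. The sketch below assumes that checkpoint_path and deterministic_eval are top-level keys in the JSON config; since the text writes --checkpoint_path in flag form, it may instead be a command-line argument, in which case only the second check applies. The function name `check_inference_config` is hypothetical.

```python
import json
from pathlib import Path


def check_inference_config(config_path):
    """Return a list of human-readable problems found in an inference config."""
    config = json.loads(Path(config_path).read_text())
    problems = []

    # Assumption: the checkpoint location is stored under "checkpoint_path".
    checkpoint = config.get("checkpoint_path")
    if not checkpoint:
        problems.append("checkpoint_path is missing")
    elif not Path(checkpoint).exists():
        problems.append(f"checkpoint_path does not exist: {checkpoint}")

    # Assumption: reproducibility is controlled by "deterministic_eval".
    if config.get("deterministic_eval") is not True:
        problems.append("deterministic_eval is not true")

    return problems
```

An empty list means both settings look usable; anything else is worth fixing before starting the run.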
python predict.py -c /path/to/inference_config.json
You can also run the scripts with Docker. First, build the Docker image with the following commands.
cd ChemGLaM_huggingface
docker build --no-cache -t chemglam .
After building the Docker image, you can run a script inside the container. The example below runs the training script with a specific configuration; replace python train.py -c config/benchmark/bindingdb_cv0.json with the command you want to run.
docker run --gpus all -it --rm -u `id -u`:`id -g` \
-v $(pwd):/workspace chemglam \
python train.py -c config/benchmark/bindingdb_cv0.json
- Takuto Koyama: koyama.takuto.82j[at]st.kyoto-u.ac.jp
bioRxiv
@article{koyama2024chemglam,
title={ChemGLaM: Chemical Genomics Language Models for Compound-Protein Interaction Prediction},
author={Koyama, Takuto and Tsumura, Hayato and Matsumoto, Shigeyuki and Okita, Ryunosuke and Kojima, Ryosuke and Okuno, Yasushi},
journal={bioRxiv},
pages={2024--02},
year={2024},
publisher={Cold Spring Harbor Laboratory}
}
