jiaor17/EPT

An Equivariant Pretrained Transformer for Unified 3D Molecular Representation Learning

Source code for "An Equivariant Pretrained Transformer for Unified 3D Molecular Representation Learning".

Setup

The environment can be set up with conda as follows:

conda env create -f env.yml
conda activate EPT

Download and Preprocess Data

Pretraining Data

Assets for downloading pretraining datasets are listed as follows.

One can preprocess the above raw data into LMDB format via the following scripts.

# GEOM
python -m scripts.process_data.process_GEOM \
    --base_path <rdkit_dir> --dataset qm9 \
    --out_dir ./processed/GEOM --using_hrdrogen
python -m scripts.process_data.process_GEOM \
    --base_path <rdkit_dir> --dataset drugs \
    --out_dir ./processed/GEOM --using_hrdrogen
# PCQM4Mv2
python -m scripts.process_data.process_GEOM \
    --sdf_file <sdf_dir> \
    --out_dir ./processed/PCQM4M-v2 --using_hrdrogen
# PDB
python -m scripts.process_data.process_PDB_monomer \
	--pdb_dir <pdb_dir> \
	--out_dir ./processed/PDB
# PDBBind
python -m scripts.process_data.process_PDBBind \
	--data_dir <data_dir> \
	--out_dir ./processed/PDBBind
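
The preprocessing steps above can also be driven from one small script. The following is a minimal sketch, not part of the repository: the function names `build_preprocess_commands` and `run_all`, their parameters, and the `dry_run` switch are hypothetical, while the module names and flags are copied verbatim from the commands above. It assumes it is run from the repository root.

```python
import shlex
import subprocess
import sys


def build_preprocess_commands(rdkit_dir, sdf_file, pdb_dir, pdbbind_dir,
                              out_root="./processed"):
    """Return the five preprocessing commands listed above as argument lists.

    Module names and flags are copied verbatim from the commands above;
    the function and parameter names are hypothetical.
    """
    python = [sys.executable, "-m"]
    geom = "scripts.process_data.process_GEOM"
    return [
        python + [geom, "--base_path", rdkit_dir, "--dataset", "qm9",
                  "--out_dir", f"{out_root}/GEOM", "--using_hrdrogen"],
        python + [geom, "--base_path", rdkit_dir, "--dataset", "drugs",
                  "--out_dir", f"{out_root}/GEOM", "--using_hrdrogen"],
        python + [geom, "--sdf_file", sdf_file,
                  "--out_dir", f"{out_root}/PCQM4M-v2", "--using_hrdrogen"],
        python + ["scripts.process_data.process_PDB_monomer",
                  "--pdb_dir", pdb_dir, "--out_dir", f"{out_root}/PDB"],
        python + ["scripts.process_data.process_PDBBind",
                  "--data_dir", pdbbind_dir, "--out_dir", f"{out_root}/PDBBind"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    printed = []
    for cmd in commands:
        line = shlex.join(cmd)
        print(line)
        printed.append(line)
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(cmd, check=True)
    return printed
```

Calling `run_all(build_preprocess_commands(...))` with the default `dry_run=True` only prints the commands, which makes it easy to check the paths before starting a long preprocessing run.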

Downstream Data

  • LBA: The raw data of LBA can be accessed via this link and processed by
python -m scripts.process_data.process_LBA \
    --base_path <splits_dir> \
    --out_dir ./processed/LBA/<id>
  • MSP: The raw data of MSP can be accessed via this link and processed by
python -m scripts.process_data.process_MSP \
    --base_path <splits_dir> \
    --out_dir ./processed/MSP
  • MPP: The processed QM9 data can be obtained via the following script. Raw data will be downloaded automatically.
python -m scripts.process_data.process_QM9 \
    --out_dir ./processed/QM9 \
    --using_hydrogen --download

All preprocessed datasets can be acquired from this link.

Pretrain on Multi-Domain Dataset

GPU=0,1,2,3,4,5,6,7 bash execute/pretrain.sh configs/PreTrain/pretrain.yaml

The pretrained checkpoint is available at this Google Drive link.

Finetune on Downstream Tasks

# LBA
GPU=0 SPLIT=<ID30 or ID60> bash execute/finetune_lba.sh ./configs/LBA <pretrained_ckpt>
# MSP
GPU=0 bash execute/finetune_msp.sh ./configs/MSP <pretrained_ckpt>
# MPP
GPU=0 PROP=<property (e.g. homo)> bash execute/finetune_qm9.sh ./configs/QM9 <pretrained_ckpt>
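
When several QM9 properties need to be fine-tuned, the MPP command above can be repeated once per property. The following is a minimal sketch under stated assumptions: `finetune_qm9_jobs` and `launch` are hypothetical helpers, not part of the repository, and the only property name taken from this README is `homo`; any other name must match what `execute/finetune_qm9.sh` accepts.

```python
import os
import shlex
import subprocess


def finetune_qm9_jobs(properties, pretrained_ckpt, gpu="0"):
    """Build one (environment, command) pair per QM9 property.

    GPU and PROP are the environment variables used by the MPP command
    above; the helper itself is hypothetical.
    """
    jobs = []
    for prop in properties:
        env = {"GPU": str(gpu), "PROP": prop}
        cmd = ["bash", "execute/finetune_qm9.sh", "./configs/QM9", pretrained_ckpt]
        jobs.append((env, cmd))
    return jobs


def launch(jobs, dry_run=True):
    """Print each job as a shell line; run it only when dry_run is False."""
    shown = []
    for env, cmd in jobs:
        prefix = " ".join(f"{key}={shlex.quote(value)}" for key, value in env.items())
        line = f"{prefix} {shlex.join(cmd)}"
        print(line)
        shown.append(line)
        if not dry_run:
            # Extend the current environment so PATH, CUDA settings, etc. survive.
            subprocess.run(cmd, env={**os.environ, **env}, check=True)
    return shown
```

For example, `launch(finetune_qm9_jobs(["homo"], "<pretrained_ckpt>"))` prints the job without running it. Jobs run sequentially on the same GPU, which keeps the sketch simple at the cost of wall-clock time.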

Evaluation on Downstream Tasks

# LBA
python -m evaluation.eval_lba --config ./configs/LBA/<split>/test.yaml --ckpt ./ckpts/LBA/<split> --gpu 0
# MSP
python -m evaluation.eval_msp --config ./configs/MSP --ckpt ./ckpts/MSP --gpu 0
# MPP
python -m evaluation.eval_qm9 --config ./configs/QM9/test.yaml --ckpt ./ckpts/QM9 --gpu 0 dataset.train.property=<prop> dataset.test.property=<prop>
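
The trailing `dataset.train.property=<prop>` and `dataset.test.property=<prop>` arguments in the MPP command read as dotted-key overrides of the YAML config. How the repository parses them is not shown here, so the following is only an illustrative sketch of the general idea: `apply_overrides` is a hypothetical function, and the nested dictionary stands in for a loaded config.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of config with dotted key=value overrides applied.

    Illustrative only: the repository's own parser may differ, for
    example in how it converts value types. Values stay strings here.
    """
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Create intermediate sections that do not exist yet.
            node = node.setdefault(name, {})
        node[leaf] = value
    return result


if __name__ == "__main__":
    base = {"dataset": {"train": {"property": None}, "test": {"property": None}}}
    updated = apply_overrides(
        base, ["dataset.train.property=homo", "dataset.test.property=homo"]
    )
    print(updated["dataset"]["train"]["property"])
```

Working on a deep copy leaves the base config untouched, so the same loaded file can be reused for several properties.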
