Source code for "An Equivariant Pretrained Transformer for Unified 3D Molecular Representation Learning".
One can set up the environment from the provided conda environment file as follows.
conda env create -f env.yml
conda activate EPT

Assets for downloading pretraining datasets are listed as follows.
One can preprocess the above raw data into LMDB format via the following scripts.
# GEOM
python -m scripts.process_data.process_GEOM \
--base_path <rdkit_dir> --dataset qm9 \
--out_dir ./processed/GEOM --using_hydrogen
python -m scripts.process_data.process_GEOM \
--base_path <rdkit_dir> --dataset drugs \
--out_dir ./processed/GEOM --using_hydrogen
# PCQM4Mv2
python -m scripts.process_data.process_GEOM \
--sdf_file <sdf_dir> \
--out_dir ./processed/PCQM4M-v2 --using_hydrogen
# PDB
python -m scripts.process_data.process_PDB_monomer \
--pdb_dir <pdb_dir> \
--out_dir ./processed/PDB
# PDBBind
python -m scripts.process_data.process_PDBBind \
--data_dir <data_dir> \
--out_dir ./processed/PDBBind

- LBA: One can access the raw data of LBA via this link and process the data by
python -m scripts.process_data.process_LBA \
--base_path <splits_dir> \
--out_dir ./processed/LBA/<id>

- MSP: One can access the raw data of MSP via this link and process the data by
python -m scripts.process_data.process_MSP \
--base_path <splits_dir> \
--out_dir ./processed/MSP

- MPP: One can acquire the processed QM9 data via the following script. Raw data will be downloaded automatically.
python -m scripts.process_data.process_QM9 \
--out_dir ./processed/QM9 \
--using_hydrogen --download

All preprocessed datasets can be acquired from this link.
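
Before launching pretraining or fine-tuning, it can help to confirm that the output directories named in the preprocessing commands above actually exist and are non-empty. The following is a minimal sketch, not part of the repository: the helper name `missing_processed_dirs` is hypothetical, and the directory names are simply copied from the `--out_dir` arguments above. It does not validate the LMDB contents.

```python
from pathlib import Path

# Directory names copied from the --out_dir arguments in the commands above.
EXPECTED_DIRS = ["GEOM", "PCQM4M-v2", "PDB", "PDBBind", "LBA", "MSP", "QM9"]


def missing_processed_dirs(root="./processed", expected=EXPECTED_DIRS):
    """Return the expected sub-directories that are absent or empty under root.

    Hypothetical helper for illustration; it only checks the file system
    layout and does not open or verify any LMDB file.
    """
    root = Path(root)
    missing = []
    for name in expected:
        path = root / name
        # A directory counts as missing if it does not exist or has no entries.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_processed_dirs()
    if absent:
        print("Missing or empty:", ", ".join(absent))
    else:
        print("All expected directories are present.")
```

Pass a shorter `expected` list if only some datasets are needed, for example only `["QM9"]` for MPP fine-tuning.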
GPU=0,1,2,3,4,5,6,7 bash execute/pretrain.sh configs/PreTrain/pretrain.yaml

The pretrained checkpoint is available on this Google Drive.
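
The `GPU` variable in the pretraining command above is a comma-separated list of device indices. How `execute/pretrain.sh` consumes it is not shown here, so the sketch below is only an assumption about a typical pattern: parse the list, count the devices, and expose them through `CUDA_VISIBLE_DEVICES`. The function name `parse_gpu_list` is hypothetical.

```python
import os


def parse_gpu_list(spec):
    """Parse a comma-separated GPU spec such as "0,1,2,3" into a list of ints.

    Hypothetical helper; raises ValueError on empty, non-integer, or
    duplicated entries.
    """
    ids = [int(tok) for tok in spec.split(",") if tok.strip()]
    if not ids:
        raise ValueError("GPU spec is empty")
    if len(set(ids)) != len(ids):
        raise ValueError(f"duplicate device index in {spec!r}")
    return ids


if __name__ == "__main__":
    # Assumption: fall back to device 0 when GPU is not set.
    gpus = parse_gpu_list(os.environ.get("GPU", "0"))
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpus)
    print(f"{len(gpus)} device(s) requested: {gpus}")
```

With `GPU=0,1,2,3,4,5,6,7` this yields eight device indices, which matches the eight-GPU command above; with `GPU=0`, as in the fine-tuning commands, it yields one.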
# LBA
GPU=0 SPLIT=<ID30 or ID60> bash execute/finetune_lba.sh ./configs/LBA <pretrained_ckpt>
# MSP
GPU=0 bash execute/finetune_msp.sh ./configs/MSP <pretrained_ckpt>
# MPP
GPU=0 PROP=<property (e.g. homo)> bash execute/finetune_qm9.sh ./configs/QM9 <pretrained_ckpt>

# LBA
python -m evaluation.eval_lba --config ./configs/LBA/<split>/test.yaml --ckpt ./ckpts/LBA/<split> --gpu 0
# MSP
python -m evaluation.eval_msp --config ./configs/MSP --ckpt ./ckpts/MSP --gpu 0
# MPP
python -m evaluation.eval_qm9 --config ./configs/QM9/test.yaml --ckpt ./ckpts/QM9 --gpu 0 dataset.train.property=<prop> dataset.test.property=<prop>
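
The trailing `dataset.train.property=<prop>` and `dataset.test.property=<prop>` arguments in the MPP evaluation command look like dotted-key overrides applied on top of the YAML config. The repository's actual parsing code is not shown here, so the following is only a sketch of how such overrides are commonly merged into a nested dictionary. The function name `apply_overrides` and the example config layout are hypothetical, and values are kept as plain strings.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of config with "a.b.c=value" overrides applied.

    Hypothetical helper: intermediate keys are created as needed and the
    value is stored as a string without type conversion.
    """
    result = copy.deepcopy(config)
    for item in overrides:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            child = node.get(part)
            # Replace missing or non-dict entries with a fresh nested dict.
            if not isinstance(child, dict):
                child = {}
                node[part] = child
            node = child
        node[leaf] = value
    return result


if __name__ == "__main__":
    # Assumed config layout, inferred only from the override keys above.
    base = {"dataset": {"train": {"property": None}, "test": {"property": None}}}
    cfg = apply_overrides(
        base, ["dataset.train.property=homo", "dataset.test.property=homo"]
    )
    print(cfg["dataset"]["train"]["property"], cfg["dataset"]["test"]["property"])
```

The input dictionary is deep-copied, so the base config loaded from `test.yaml` would stay unchanged and can be reused across several properties.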