
SHARP: Generating synthesizable molecules via fragment-based hierarchical action-space reinforcement learning for Pareto optimization

This repository contains SHARP, a novel generative model for molecules that targets Pareto optimization using fragment-based hierarchical action-space reinforcement learning. Technical details and a thorough analysis can be found in our paper, SHARP: Generating synthesizable molecules via fragment-based hierarchical action-space reinforcement learning for Pareto optimization, by Jeonghyeon Kim, Seongok Ryu, Hahnbeom Park and Chaok Seok. If you have any questions, feel free to open an issue or reach out at jeonghyeonkim86@gmail.com.

Installation guide (Linux only)

Install Anaconda (https://www.anaconda.com/download) if it is not already installed.
Run the installation commands below in a terminal. After cloning, run all remaining commands from the repository's main directory.
Clone this repository:

git clone https://github.com/seoklab/SHARP.git
cd SHARP

Create and activate the conda environment:

conda env create -f environment.yml
conda activate sharp

Download the model parameters and fragment libraries from Google Drive, place the downloaded folder in the main directory, and rename it to "data/".
Install Vina-GPU 2.1, following its installation guide carefully. After compiling it, update "opencl_binary_path" in the docking config so that it points to the location of your compiled Vina-GPU 2.1 OpenCL binaries.
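If the docking config follows AutoDock Vina's plain-text "key = value" format (an assumption; check the config shipped with the repository), the path can also be set from Python. The helper set_config_value and the example paths below are hypothetical.

```python
import re
from pathlib import Path


def set_config_value(config_path, key, value):
    """Set "key = value" in a plain-text, Vina-style config (assumed format).

    Every existing line for `key` is replaced; if there is none, one is appended.
    """
    path = Path(config_path)
    lines = path.read_text().splitlines()
    # Match "key = ..." or "key=..." at line start, but not longer keys sharing the prefix.
    pattern = re.compile(rf"^\s*{re.escape(key)}\s*=")
    found = False
    for i, line in enumerate(lines):
        if pattern.match(line):
            lines[i] = f"{key} = {value}"
            found = True
    if not found:
        lines.append(f"{key} = {value}")
    path.write_text("\n".join(lines) + "\n")


# Example (hypothetical paths):
# set_config_value("path/to/docking_config.txt", "opencl_binary_path",
#                  "/path/to/Vina-GPU-2.1/opencl_binaries")
```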

The full installation takes about 10 minutes on a typical computer.

Usage guide

Run the commands below in a terminal from the main directory, in a CUDA-enabled environment. Generating 100 molecules takes about 30 minutes on average.
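To launch a run from Python (for example, inside a larger pipeline), a minimal sketch can wrap the same command line used throughout this guide. build_sample_command and run_sample are hypothetical helpers; using sys.executable makes the call use the interpreter of the active conda environment.

```python
import subprocess
import sys
from pathlib import Path


def build_sample_command(config_file):
    """Return the argv for `python script/sample.py --config_file <config_file>`."""
    # sys.executable is the interpreter of the active environment (e.g. the "sharp" env).
    return [sys.executable, "script/sample.py", "--config_file", str(config_file)]


def run_sample(config_file, repo_root="."):
    """Run sample.py from the repository root; raise on a missing config or non-zero exit."""
    config_path = Path(repo_root) / config_file
    if not config_path.is_file():
        raise FileNotFoundError(f"config not found: {config_path}")
    subprocess.run(build_sample_command(config_file), cwd=repo_root, check=True)


# Example: run_sample("configs/denovo.json", repo_root="/path/to/SHARP")
```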

Input preparation

Because docking is performed with AutoDock Vina-GPU, you need to prepare the input protein as a PDBQT file, together with a docking config file. Also check the parameter paths in config.py and in each JSON config file.
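Before launching a run, it can help to confirm that the input files referenced by a JSON config actually exist. This sketch only checks the path keys named in this guide (model_path, receptor_pdb, reference_pdb); the function name is hypothetical, and relative paths are resolved against the current directory, so run it from the main directory.

```python
import json
from pathlib import Path

# Input-path keys listed in this README; extend if your config has others.
INPUT_PATH_KEYS = ("model_path", "receptor_pdb", "reference_pdb")


def missing_inputs(config_file, keys=INPUT_PATH_KEYS):
    """Return {key: path} for configured input paths that do not exist on disk."""
    with open(config_file) as fh:
        config = json.load(fh)
    return {
        key: config[key]
        for key in keys
        if key in config and not Path(config[key]).exists()
    }


# Example: print(missing_inputs("configs/denovo.json"))  # {} means all inputs were found
```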

De novo generation

Run the following command:

python script/sample.py --config_file configs/denovo.json

The variables you need to change in the JSON file are as follows:

output_csv: Path to save the generated molecules.
num_molecules: The number of molecules to generate.
model_path: Path to the pretrained model weights.
receptor_pdb: Path to the receptor PDB file for docking.
reference_pdb: Path to the reference PDB file for reward calculation.
reference_ligand_name: Name of the reference ligand in the reference PDB file.
max_action: The maximum number of generation steps.
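Rather than editing configs/denovo.json in place, you can derive a new config that overrides only the variables listed above. The shipped configs may contain further fields, so this hypothetical helper copies everything else unchanged and rejects keys that are not already present, which catches typos such as "num_molecule". The values in the usage comment are placeholders.

```python
import json


def write_config(base_file, out_file, **overrides):
    """Copy a JSON config to out_file, replacing only the given keys."""
    with open(base_file) as fh:
        config = json.load(fh)
    unknown = set(overrides) - set(config)
    if unknown:
        # Refuse keys the base config does not define (likely typos).
        raise KeyError(f"keys not present in {base_file}: {sorted(unknown)}")
    config.update(overrides)
    with open(out_file, "w") as fh:
        json.dump(config, fh, indent=2)
    return config


# Example (placeholder values):
# write_config("configs/denovo.json", "configs/my_denovo.json",
#              output_csv="results/denovo.csv", num_molecules=100)
# then: python script/sample.py --config_file configs/my_denovo.json
```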

Fragment growing

Run the following command:

python script/sample.py --config_file configs/frag.json

The variables you need to change in the JSON file are as follows:

output_csv: Path to save the generated molecules.
num_molecules: The number of molecules to generate.
model_path: Path to the pretrained model weights.
scaffold_smiles: A list containing the SMILES string of the scaffold to grow from. The attachment point should be marked with [*].
receptor_pdb: Path to the receptor PDB file for docking.
reference_pdb: Path to the reference PDB file for reward calculation.
reference_ligand_name: Name of the reference ligand in the reference PDB file.
max_action: The maximum number of generation steps.
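A quick sanity check of scaffold_smiles before a run might look like this. It is plain string counting, not SMILES parsing, so it only verifies the list length and the presence of the literal [*] markers described above (labelled dummy atoms such as [1*] are not counted). The helpers are hypothetical; n_fragments=2 would fit the two-fragment list used for linker design.

```python
def attachment_points(smiles):
    """Count literal "[*]" attachment markers in a SMILES string."""
    return smiles.count("[*]")


def check_scaffolds(scaffold_smiles, n_fragments, min_points=1):
    """Raise ValueError unless scaffold_smiles is a list of n_fragments SMILES,
    each carrying at least min_points "[*]" markers."""
    if not isinstance(scaffold_smiles, list) or len(scaffold_smiles) != n_fragments:
        raise ValueError(f"expected a list of {n_fragments} SMILES, got {scaffold_smiles!r}")
    for smi in scaffold_smiles:
        if attachment_points(smi) < min_points:
            raise ValueError(f"{smi!r} has fewer than {min_points} [*] attachment point(s)")


# Fragment growing: one scaffold with an attachment point.
check_scaffolds(["[*]c1ccccc1"], n_fragments=1)
```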

Linker design

Run the following command:

python script/sample.py --config_file configs/linker.json

The variables you need to change in the JSON file are as follows:

output_csv: Path to save the generated molecules.
num_molecules: The number of molecules to generate.
model_path: Path to the pretrained model weights.
scaffold_smiles: A list of two SMILES strings for the fragments to be connected. The attachment points should be marked with [*].
receptor_pdb: Path to the receptor PDB file for docking.
reference_pdb: Path to the reference PDB file for reward calculation.
reference_ligand_name: Name of the reference ligand in the reference PDB file.

Scaffold hopping

Run the following command:

python script/sample.py --config_file configs/scaffold.json

The variables you need to change in the JSON file are as follows:

output_csv: Path to save the generated molecules.
num_molecules: The number of molecules to generate.
model_path: Path to the pretrained model weights.
scaffold_smiles: A list containing the SMILES string of the scaffold to start from.
receptor_pdb: Path to the receptor PDB file for docking.
reference_pdb: Path to the reference PDB file for reward calculation.
reference_ligand_name: Name of the reference ligand in the reference PDB file.
max_action: The maximum number of generation steps.
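To try several starting scaffolds, one option is to write one config per scaffold, each with its own output_csv, and run sample.py on each file in turn. This sketch assumes scaffold_smiles holds a single SMILES per run, as described above; write_batch_configs and the output file names are hypothetical.

```python
import json
from pathlib import Path


def write_batch_configs(base_file, scaffolds, out_dir):
    """Write one config per scaffold SMILES, each with its own output_csv.

    Returns the list of written config paths.
    """
    with open(base_file) as fh:
        base = json.load(fh)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, smi in enumerate(scaffolds):
        # Copy the base config, replacing only the scaffold and the output location.
        config = dict(base, scaffold_smiles=[smi], output_csv=str(out_dir / f"scaffold_{i}.csv"))
        path = out_dir / f"scaffold_{i}.json"
        path.write_text(json.dumps(config, indent=2))
        paths.append(path)
    return paths


# Example (hypothetical scaffolds):
# for p in write_batch_configs("configs/scaffold.json", ["c1ccccc1", "C1CCNCC1"], "batch"):
#     print(f"python script/sample.py --config_file {p}")
```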

Sidechain decoration

Run the following command:

python script/sample.py --config_file configs/sidechain.json

The variables you need to change in the JSON file are as follows:

output_csv: Path to save the generated molecules.
num_molecules: The number of molecules to generate.
model_path: Path to the pretrained model weights.
scaffold_smiles: A list containing the SMILES string of the scaffold to decorate. The attachment points should be marked with [*].
receptor_pdb: Path to the receptor PDB file for docking.
reference_pdb: Path to the reference PDB file for reward calculation.
reference_ligand_name: Name of the reference ligand in the reference PDB file.
max_action: The maximum number of generation steps.
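This README does not document the columns of output_csv. For a quick post-hoc look at trade-offs between objectives in any of the modes above, a generic non-dominated (Pareto) filter over the CSV could look like the sketch below. The column names docking_score (lower is better) and qed (higher is better) are hypothetical placeholders, and this is not SHARP's internal optimization procedure. The pairwise comparison is quadratic in the number of rows, which is fine for a few hundred molecules.

```python
import csv


def load_rows(csv_path, numeric_keys):
    """Read a CSV into dicts, converting the given columns to float."""
    with open(csv_path, newline="") as fh:
        rows = list(csv.DictReader(fh))
    for row in rows:
        for key in numeric_keys:
            row[key] = float(row[key])
    return rows


def dominates(a, b, minimize, maximize):
    """True if a is no worse than b on every objective and strictly better on one."""
    no_worse = all(a[k] <= b[k] for k in minimize) and all(a[k] >= b[k] for k in maximize)
    strictly_better = any(a[k] < b[k] for k in minimize) or any(a[k] > b[k] for k in maximize)
    return no_worse and strictly_better


def pareto_front(rows, minimize=(), maximize=()):
    """Keep rows that no other row dominates (ties are all kept)."""
    return [r for r in rows if not any(dominates(o, r, minimize, maximize) for o in rows)]


# Example (hypothetical column names):
# rows = load_rows("results/denovo.csv", ["docking_score", "qed"])
# front = pareto_front(rows, minimize=("docking_score",), maximize=("qed",))
```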

License

All code, including the neural network weights, is licensed under the MIT License.
