# SHARP: Generating synthesizable molecules via fragment-based hierarchical action-space reinforcement learning for Pareto optimization
This repository presents a novel molecular generative model for Pareto optimization that leverages fragment-based hierarchical action-space reinforcement learning. Technical details and a thorough analysis can be found in our paper, "SHARP: Generating synthesizable molecules via fragment-based hierarchical action-space reinforcement learning for Pareto optimization", written by Jeonghyeon Kim, Seongok Ryu, Hahnbeom Park and Chaok Seok. If you have any questions, feel free to open an issue or reach out at jeonghyeonkim86@gmail.com.
## Installation guide (Linux only)
- Install [Anaconda](https://www.anaconda.com/download) if you have not installed it yet.
- To install, run the commands below in a terminal. After `git clone`, run the remaining commands from the main directory of the cloned repository.
- Clone this repository
$ git clone https://github.com/seoklab/SHARP.git
- Create a conda environment using the following commands
conda env create -f environment.yaml
conda activate sharp
- Download the parameters and fragment libraries from Google Drive, place the downloaded folder in the main directory, and rename it to `data/`.
- Install Vina-GPU 2.1, following its installation guide carefully. After compiling it, update `opencl_binary_path` in the docking config.
Total installation takes about 10 minutes on a typical computer.
Run the commands below in a terminal from the main directory, in a CUDA-enabled environment. The average runtime for generating 100 molecules is about 30 minutes.
Because docking uses AutoDock Vina-GPU, you need to prepare a PDBQT file of the input protein and a docking config file. Also check the parameter paths in `config.py` and in each JSON file.
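Since a wrong path only surfaces once sampling has started, a quick pre-flight check can help. The following is a minimal sketch, not part of the repository: the helper names `missing_paths` and `check_config_file` are hypothetical, and it assumes the path-valued keys are `model_path`, `receptor_pdb`, and `reference_pdb`, as named in the config descriptions in this README, and that relative paths are resolved from the main directory.

```python
from __future__ import annotations

import json
from pathlib import Path

# Path-valued keys, taken from the config descriptions in this README.
PATH_KEYS = ("model_path", "receptor_pdb", "reference_pdb")


def missing_paths(config: dict, root: Path = Path(".")) -> list[str]:
    """Return 'key: path' entries for path-valued keys that do not exist under root."""
    missing = []
    for key in PATH_KEYS:
        value = config.get(key)
        if value is None:
            continue  # this config does not use the key
        if not (root / value).exists():
            missing.append(f"{key}: {value}")
    return missing


def check_config_file(config_file: str) -> list[str]:
    """Load a JSON config and report its missing paths (hypothetical helper)."""
    with open(config_file) as handle:
        return missing_paths(json.load(handle))


# Example, run from the main directory:
#   for problem in check_config_file("configs/denovo.json"):
#       print("missing", problem)
```

An empty result only means the files exist; it does not verify that the receptor file is a valid PDBQT or that the docking config is correct.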
Run generation with the de novo configuration using the following command.
python script/sample.py --config_file configs/denovo.json
The variables you need to change in the JSON file are as follows.
- `output_csv`: Path to save the generated molecules.
- `num_molecules`: The number of molecules to generate.
- `model_path`: Path to the pretrained model weights.
- `receptor_pdb`: Path to the receptor PDB file for docking.
- `reference_pdb`: Path to the reference PDB file for reward calculation.
- `reference_ligand_name`: Name of the reference ligand in the reference PDB file.
- `max_action`: The maximum number of generation steps.
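Rather than editing the shipped JSON by hand, the variables listed above can be overridden programmatically and saved as a separate config. This is a minimal sketch using only the standard library; `write_config` is a hypothetical helper, and the file names and values in the usage comment are placeholders, not defaults from the repository.

```python
import json
from pathlib import Path


def write_config(template_file, out_file, **overrides):
    """Copy a JSON config, replace the given keys, and save it under a new name."""
    config = json.loads(Path(template_file).read_text())
    config.update(overrides)  # keys not overridden keep their template values
    Path(out_file).write_text(json.dumps(config, indent=2))
    return config


# Example, run from the main directory (paths and values are placeholders):
#   write_config(
#       "configs/denovo.json",
#       "configs/my_denovo.json",
#       output_csv="results/my_denovo.csv",
#       num_molecules=100,
#   )
# Then: python script/sample.py --config_file configs/my_denovo.json
```

Writing to a new file keeps the original config intact, so the shipped settings remain available for comparison.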
Run generation with the fragment configuration, which grows a molecule from a scaffold, using the following command.
python script/sample.py --config_file configs/frag.json
The variables you need to change in the JSON file are as follows.
- `output_csv`: Path to save the generated molecules.
- `num_molecules`: The number of molecules to generate.
- `model_path`: Path to the pretrained model weights.
- `scaffold_smiles`: A list containing the SMILES string of the scaffold to grow from. The attachment point should be marked with `[*]`.
- `receptor_pdb`: Path to the receptor PDB file for docking.
- `reference_pdb`: Path to the reference PDB file for reward calculation.
- `reference_ligand_name`: Name of the reference ligand in the reference PDB file.
- `max_action`: The maximum number of generation steps.
Run generation with the linker configuration, which connects two fragments, using the following command.
python script/sample.py --config_file configs/linker.json
The variables you need to change in the JSON file are as follows.
- `output_csv`: Path to save the generated molecules.
- `num_molecules`: The number of molecules to generate.
- `model_path`: Path to the pretrained model weights.
- `scaffold_smiles`: A list of two SMILES strings for the fragments to be connected. The attachment points should be marked with `[*]`.
- `receptor_pdb`: Path to the receptor PDB file for docking.
- `reference_pdb`: Path to the reference PDB file for reward calculation.
- `reference_ligand_name`: Name of the reference ligand in the reference PDB file.
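The `scaffold_smiles` requirement for the linker task (two SMILES strings, each with a marked attachment point) can be checked before launching a run. The sketch below is a string-level check only and is not part of the repository: `check_linker_fragments` is a hypothetical helper, it does not parse the SMILES chemically, and the example fragments are illustrative.

```python
ATTACHMENT = "[*]"  # attachment-point marker described in this README


def check_linker_fragments(scaffold_smiles):
    """Return a list of problems; an empty list means the input looks usable."""
    problems = []
    if len(scaffold_smiles) != 2:
        problems.append(f"expected 2 fragments, got {len(scaffold_smiles)}")
    for index, smiles in enumerate(scaffold_smiles):
        if ATTACHMENT not in smiles:
            problems.append(f"fragment {index} has no {ATTACHMENT} attachment point")
    return problems


# Illustrative fragments: a phenyl ring and a carboxylic acid, each with one marker.
print(check_linker_fragments(["c1ccccc1[*]", "[*]C(=O)O"]))
```

How many attachment points SHARP accepts per fragment is not stated here, so the check only requires at least one marker in each string.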
Run generation with the scaffold configuration, which starts from a given scaffold, using the following command.
python script/sample.py --config_file configs/scaffold.json
The variables you need to change in the JSON file are as follows.
- `output_csv`: Path to save the generated molecules.
- `num_molecules`: The number of molecules to generate.
- `model_path`: Path to the pretrained model weights.
- `scaffold_smiles`: A list containing the SMILES string of the scaffold to start from.
- `receptor_pdb`: Path to the receptor PDB file for docking.
- `reference_pdb`: Path to the reference PDB file for reward calculation.
- `reference_ligand_name`: Name of the reference ligand in the reference PDB file.
- `max_action`: The maximum number of generation steps.
Run generation with the sidechain configuration, which decorates a scaffold, using the following command.
python script/sample.py --config_file configs/sidechain.json
The variables you need to change in the JSON file are as follows.
- `output_csv`: Path to save the generated molecules.
- `num_molecules`: The number of molecules to generate.
- `model_path`: Path to the pretrained model weights.
- `scaffold_smiles`: A list containing the SMILES string of the scaffold to decorate. The attachment points should be marked with `[*]`.
- `receptor_pdb`: Path to the receptor PDB file for docking.
- `reference_pdb`: Path to the reference PDB file for reward calculation.
- `reference_ligand_name`: Name of the reference ligand in the reference PDB file.
- `max_action`: The maximum number of generation steps.
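All five tasks share the same entry point and differ only in the config file, so they can be launched in sequence from one script. This is a minimal sketch that assumes it is run from the main directory; `build_command` and `run_all` are hypothetical helpers, and the dry-run default only prints the commands.

```python
import subprocess
import sys

# Config names as they appear in the commands of this README.
CONFIGS = ["denovo", "frag", "linker", "scaffold", "sidechain"]


def build_command(name):
    """Build the sampling command for one config as an argument list."""
    return [sys.executable, "script/sample.py", "--config_file", f"configs/{name}.json"]


def run_all(names=CONFIGS, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for name in names:
        command = build_command(name)
        print(" ".join(command))
        if not dry_run:
            # check=True stops the batch at the first failing run.
            subprocess.run(command, check=True)


# Example: run_all(dry_run=False)
```

Using `sys.executable` keeps the runs inside the active conda environment rather than whichever `python` happens to be first on the path.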
All code, including the neural network weights, is licensed under the MIT license.