Hi,
Thank you for sharing your great work and open-source code!
I have a question about the data preprocessing pipeline used to generate the model inputs, specifically the separate protein.pdb and ligand.mol2 files.
The released version of your model expects these two files as separate inputs, but most of the sources I work with (e.g., PDBBind, PubChem, or RCSB entries) provide only the complex structure, with the protein, the ligand, and sometimes cofactors combined in a single .pdb file.
It would be extremely helpful if you could share the script or function that performs this separation.
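To make the request concrete, here is a rough sketch of what I have been trying on my side. It is only my assumption of what such a step might look like, not your pipeline. It relies on the fixed-column layout of the PDB format, treats ATOM records as protein, drops waters, and groups the remaining HETATM records by residue. The names `split_complex` and `WATER_NAMES` are placeholders I made up. It does not write a .mol2 file, since that needs bond orders and atom types from a tool such as Open Babel or RDKit, and it cannot tell which HETATM group you treat as the ligand rather than a cofactor.

```python
from collections import defaultdict

# Residue names commonly used for water (assumed list, may be incomplete).
WATER_NAMES = {"HOH", "WAT", "H2O", "DOD"}


def split_complex(pdb_text, ligand_resname=None):
    """Split PDB text into protein lines and grouped HETATM lines.

    Returns (protein_lines, hetero_groups), where hetero_groups maps
    (residue name, chain ID, residue number) to the HETATM lines of
    that residue. Waters are skipped. If ligand_resname is given,
    only HETATM residues with that name are kept.
    """
    protein_lines = []
    hetero_groups = defaultdict(list)

    for line in pdb_text.splitlines():
        record = line[:6].strip()          # columns 1-6: record type
        if record == "ATOM":
            protein_lines.append(line)
        elif record == "HETATM":
            resname = line[17:20].strip()  # columns 18-20: residue name
            chain = line[21:22]            # column 22: chain ID
            resseq = line[22:26].strip()   # columns 23-26: residue number
            if resname in WATER_NAMES:
                continue
            if ligand_resname is not None and resname != ligand_resname:
                continue
            hetero_groups[(resname, chain, resseq)].append(line)

    return protein_lines, dict(hetero_groups)
```

With this, I can write `protein_lines` to protein.pdb and one HETATM group to a ligand .pdb file, but I still have to guess which group is the ligand and how you convert it to .mol2, which is why the original script would help.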
Thank you very much!