TemplateGenerator._match_residue() is too slow #426

@epretti

The residue matching that template generators perform when loading or checking force field files uses NetworkX. This is fine for small molecules, but performance degrades for larger ones. I have been experimenting with this and found it very easy to construct pathological cases with unacceptable performance: for example, NetworkX can take over a minute to distinguish the graph of the 119-atom peptide DPETGTWG (chignolin[2-9]) from the graph of the same peptide with its two final residues swapped in place. [1]
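For anyone who wants to experiment, here is a minimal sketch of the kind of element-aware graph matching and timing described above. It is not the `TemplateGenerator` code: the helper names `build_graph` and `graphs_match` are hypothetical, the toy molecules are heavy-atom skeletons only, and the assumption is simply that atoms are nodes carrying an `element` attribute and bonds are edges.

```python
import time

import networkx as nx
from networkx.algorithms import isomorphism


def build_graph(elements, bonds):
    """Build an undirected graph with one node per atom and one edge per bond.

    Hypothetical helper for illustration; not part of any existing API.
    """
    graph = nx.Graph()
    for index, element in enumerate(elements):
        graph.add_node(index, element=element)
    graph.add_edges_from(bonds)
    return graph


def graphs_match(graph_1, graph_2):
    """Return (matched, seconds) for an isomorphism test that compares elements."""
    matcher = isomorphism.GraphMatcher(
        graph_1,
        graph_2,
        node_match=isomorphism.categorical_node_match("element", None),
    )
    start = time.perf_counter()
    matched = matcher.is_isomorphic()
    return matched, time.perf_counter() - start


# Heavy atoms only: ethanol (C-C-O) versus dimethyl ether (C-O-C).
ethanol = build_graph(["C", "C", "O"], [(0, 1), (1, 2)])
ether = build_graph(["C", "O", "C"], [(0, 1), (1, 2)])
# Same molecule as ethanol, but with the atoms listed in a different order.
ethanol_reordered = build_graph(["O", "C", "C"], [(0, 1), (1, 2)])

for name, other in [("reordered ethanol", ethanol_reordered), ("dimethyl ether", ether)]:
    matched, seconds = graphs_match(ethanol, other)
    print(f"ethanol vs {name}: matched={matched} ({seconds:.6f} s)")
```

These toy graphs match instantly; reproducing the slowdown would require building the two peptide graphs described above and passing them to the same function.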

Per a recent discussion, if we end up using single-residue templates for biopolymers and deferring to SystemGenerator to handle the fact that OpenMM normally reads them as multi-residue chains, we will need to match large molecules. Even if we don't, we should be able to handle small cases like the one above much faster than we currently do, so I'm opening this issue since I'd like to address this at some point in the future.

Footnotes

  1. If you try to reproduce this and can't, note that the slowdown is oddly sensitive to the ordering of the atoms in the graphs you are trying to match, so it might or might not appear depending on how you construct the molecule.
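Because the behavior depends on atom ordering, one way to probe it is to re-test the same molecule under several random orderings. The sketch below is only an assumption about how one might do that: `shuffled_copy` is a hypothetical helper that rebuilds a graph with its atoms renumbered and its bonds inserted in a random order, copying node attributes but not edge attributes.

```python
import random

import networkx as nx


def shuffled_copy(graph, seed):
    """Return an isomorphic copy of `graph` with a random atom numbering.

    Hypothetical helper for illustration. Node attributes are copied;
    edge attributes are not.
    """
    rng = random.Random(seed)
    old_nodes = list(graph.nodes)
    rng.shuffle(old_nodes)
    # Map each original node to its new index in the shuffled ordering.
    new_index = {old: new for new, old in enumerate(old_nodes)}

    shuffled = nx.Graph()
    for old in old_nodes:
        shuffled.add_node(new_index[old], **graph.nodes[old])

    # Also randomize the order in which bonds are inserted.
    edges = [(new_index[a], new_index[b]) for a, b in graph.edges]
    rng.shuffle(edges)
    shuffled.add_edges_from(edges)
    return shuffled


# Toy heavy-atom chain N-C-C-O standing in for a real molecule graph.
chain = nx.path_graph(4)
nx.set_node_attributes(chain, {0: "N", 1: "C", 2: "C", 3: "O"}, "element")

variants = [shuffled_copy(chain, seed) for seed in range(5)]
```

Timing the match between the original graph and each variant (and between pairs of variants) over a range of seeds should show whether a given pair of molecules hits the slow path only for particular orderings.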
