ChemFG-Tool: The functional group identification toolkit for ChemDFM-R

ChemFG-Tool is a lightweight functional group identification toolkit developed for, and used in, building ChemDFM-R (the ChemDFM-R paper is available at https://arxiv.org/abs/2507.21990). It provides utilities for identifying functional groups in both molecules and chemical reactions, with an emphasis on broad coverage, reduced overlap, and better discrimination of complex composite functional groups.

This toolkit is built on the functional group identification utilities of the Python package thermo. The original thermo implementation supports 83 functional groups, but its coverage is limited, and some groups produce overlapping or ambiguous matches. This makes it difficult to reliably distinguish complex composite functional groups from their simpler components.

Through careful analysis and redesign of the functional group definitions and matching logic, ChemFG-Tool largely avoids these issues and provides more comprehensive, precise, and structure-aware identification. Specifically, compared with thermo, ChemFG-Tool:

Expands the set of supported functional groups from 83 to 241
Reduces overlapping and redundant matches, and explicitly distinguishes composite functional groups from their simpler constituents
Localizes each identified functional group at the atom level, reporting the positions of the matched atoms
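To make "distinguishing composite groups from their constituents" concrete, here is one generic way such a rule could work as a post-processing step. This is a hypothetical sketch, not ChemFG-Tool's actual logic: it assumes each match is a tuple of atom indices (the format used in the usage examples) and drops any match whose atoms form a strict subset of a match of a different group. The function name and the sample group names and indices are illustrative only.

```python
def suppress_constituents(matches):
    """Hypothetical post-processing: drop matches contained in a larger match of another group.

    `matches` maps a functional group name to a tuple of matches, where each
    match is a tuple of atom indices.
    """
    kept = {}
    for name, places in matches.items():
        survivors = []
        for place in places:
            atoms = set(place)
            # A match counts as a constituent if another group has a strictly larger match covering it.
            covered = any(
                atoms < set(other_place)
                for other_name, other_places in matches.items()
                if other_name != name
                for other_place in other_places
            )
            if not covered:
                survivors.append(place)
        if survivors:
            kept[name] = tuple(survivors)
    return kept


# Illustrative input: the carbonyl at atoms (1, 2) lies inside the ester match (1, 2, 3).
example = {"carbonyl": ((1, 2), (7, 8)), "ester": ((1, 2, 3),), "alkene": ((5, 6),)}
result = suppress_constituents(example)
print(result)
```

The carbonyl at atoms (7, 8) survives because no larger match covers it; only the occurrence absorbed into the ester is removed.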

Repository Structure

`functional_group_list.tsv`

A tab-separated file defining all supported functional groups. The first row is a header; each subsequent row has three columns:

Class: Chemical category of the functional group
Function Name: Name of the identification function in `get_functional_group.py`
Functional Group Name: Human-readable name
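The usage examples read this file with `split('\t')`. A slightly sturdier loader, sketched here with the standard `csv` module, skips the header row and blank lines and names the three columns. `Rule` and `load_rules` are hypothetical helpers, not part of the repository, and the sample row (class "hydrocarbon", name "alkane") is made up for illustration; only the function name `is_alkane` appears in this README.

```python
import csv
import io
from typing import NamedTuple


class Rule(NamedTuple):
    group_class: str    # "Class" column
    function_name: str  # "Function Name" column
    name: str           # "Functional Group Name" column


def load_rules(lines):
    """Parse the rule table from an iterable of text lines, skipping the header row."""
    # QUOTE_NONE keeps quote characters literal, matching a plain split on tabs.
    reader = csv.reader(lines, delimiter='\t', quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the header row
    return [Rule(*(cell.strip() for cell in row)) for row in reader if row]


# Made-up sample in the same layout as functional_group_list.tsv.
sample = io.StringIO(
    "Class\tFunction Name\tFunctional Group Name\n"
    "hydrocarbon\tis_alkane\talkane\n"
    "\n"
)
example_rules = load_rules(sample)
print(example_rules)
```

For the real file, open it with `open('./functional_group_list.tsv', newline='', encoding='utf-8')` and pass the file object to `load_rules`.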

`get_functional_group.py`

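The core implementation of functional group identification. Each function named in `functional_group_list.tsv` takes an RDKit molecule and returns the positions where the corresponding functional group occurs.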

Usage Examples

The following examples illustrate typical usage.

Example 1: Molecular Functional Group Identification

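```python
from rdkit import Chem
import get_functional_group as fg

# Load the rule table, skipping blank lines and the header row.
# Each rule is (class, function name, functional group name).
with open('./functional_group_list.tsv', 'r', encoding='utf-8') as f:
    rules = [row.strip().split('\t') for row in f if row.strip()][1:]

# Replace the placeholder with the SMILES string of the molecule to analyze.
molecule = Chem.MolFromSmiles("<MOLECULE_PLACEHOLDER>")
if molecule is None:
    raise ValueError("Could not parse the input SMILES")

# Output includes both functional group names and their positions
functional_groups = {}
for group, function_name, name in rules:
    identify = getattr(fg, function_name)  # look up the identification function by name
    places = identify(molecule)
    if places:
        functional_groups[name] = places

print(functional_groups)
```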

Example output (conceptual):

```python
{
    "alkene/olefin": ((0, 1),),
    "hydrazone": ((7, 8, 9),),
    "thiolester": ((3, 4, 5),)
}
```
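Because each value in the result is a tuple of matches, and each match appears to be a tuple of atom indices, the dictionary can be inverted into per-atom labels, for example to highlight atoms in a drawing. A minimal sketch assuming that output shape; `atoms_to_groups` is a hypothetical helper, not part of `get_functional_group.py`.

```python
from collections import defaultdict


def atoms_to_groups(functional_groups):
    """Invert {group name: ((atom indices), ...)} into {atom index: [group names]}."""
    per_atom = defaultdict(list)
    for name, places in functional_groups.items():
        for place in places:
            for atom_idx in place:
                # Record each group once per atom, even if several matches share the atom.
                if name not in per_atom[atom_idx]:
                    per_atom[atom_idx].append(name)
    return dict(per_atom)


# The conceptual output shown above.
example = {
    "alkene/olefin": ((0, 1),),
    "hydrazone": ((7, 8, 9),),
    "thiolester": ((3, 4, 5),),
}
per_atom = atoms_to_groups(example)
print(per_atom[8])  # ['hydrazone']
```

Atoms that belong to no identified group (here 2 and 6) simply do not appear as keys.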

Example 2: Reaction Functional Group Identification

```python
from rdkit import Chem
from rdkit.Chem import AllChem
import get_functional_group as fg

# Load the rules as in Example 1, then drop the `is_alkane` rule.
with open('./functional_group_list.tsv', 'r', encoding='utf-8') as f:
    rules = [row.strip().split('\t') for row in f if row.strip()][1:]
rules = [rule for rule in rules if rule[1] != 'is_alkane']

def get_fg(molecule, center):
    """Return names of functional groups whose matches contain any atom listed in `center`."""
    functional_groups = []
    # Reaction templates are parsed as SMARTS, so sanitize before matching.
    Chem.SanitizeMol(molecule)
    for group, function_name, name in rules:
        places = getattr(fg, function_name)(molecule)
        # `places` holds tuples of atom indices; `center` is compared with them directly.
        if any(atom in place for atom in center for place in places):
            functional_groups.append(name)
    return functional_groups

reaction = {
    "reaction_smiles": "[CH2:1]1[O:35][CH2:34][CH2:33][CH2:36]1.[CH:2]([C:3]#[N:4])([CH3:5])[S:12](=[O:13])(=[O:14])[CH2:15][CH2:16][C:17]([F:18])([F:19])[F:20].[CH:6](=[CH:7][C:8]([F:9])([F:10])[F:11])[CH2:30][O:29][S:26]([c:25]1[cH:24][cH:23][c:22]([CH3:21])[cH:32][cH:31]1)(=[O:27])=[O:28]>>[CH3:1][C:2]([C:3]#[N:4])([CH2:5][CH:6]=[CH:7][C:8]([F:9])([F:10])[F:11])[S:12](=[O:13])(=[O:14])[CH2:15][CH2:16][C:17]([F:18])([F:19])[F:20]",
    "reaction_center": [1, 2, 5, 6]
}
rxn = AllChem.ReactionFromSmarts(reaction["reaction_smiles"])
reactants = rxn.GetReactants()
products = rxn.GetProducts()

reactants_fg = [get_fg(r, reaction["reaction_center"]) for r in reactants]
products_fg = [get_fg(p, reaction["reaction_center"]) for p in products]

print({"reactants": reactants_fg, "products": products_fg})
```

Example output:

```python
{
    "reactants": [
        ["ether"],
        ["nitrile", "sulfone"],
        ["alkene/olefin", "fluoro"]
    ],
    "products": [
        ["alkene/olefin", "nitrile"]
    ]
}
```
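In this example, `reaction_center` appears to list atom-map numbers: 1, 2, 5 and 6 are exactly the mapped atoms whose connectivity changes between reactants and product. The identification functions, however, return atom indices, and `get_fg` compares the two directly. Atom indices restart at 0 in every molecule, so a single index list cannot address atoms across several reactants. If map numbers are intended, one option is to translate them into per-molecule indices before the overlap test. A minimal sketch under that assumption; `center_atom_indices` and `touches_center` are hypothetical helpers, and `mol` is assumed to be an RDKit `Mol` (which provides `GetAtoms()`, `Atom.GetIdx()` and `Atom.GetAtomMapNum()`).

```python
def center_atom_indices(mol, center_map_nums):
    """Return the indices of atoms in `mol` whose atom-map number is in `center_map_nums`.

    `mol` is assumed to be an RDKit Mol, or any object whose GetAtoms() yields
    atoms that provide GetIdx() and GetAtomMapNum().
    """
    wanted = set(center_map_nums)
    return {atom.GetIdx() for atom in mol.GetAtoms() if atom.GetAtomMapNum() in wanted}


def touches_center(places, center_idx):
    """True if any match (a tuple of atom indices) shares an atom with the center."""
    return any(idx in center_idx for place in places for idx in place)
```

Inside `get_fg`, one would compute `center_idx = center_atom_indices(molecule, center)` once and test `touches_center(places, center_idx)` instead of comparing `center` with `places`. With this change, the identified groups may differ from the example output above.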

Citation

If you use ChemFG-Tool in your research, please cite the ChemDFM-R paper:

```bibtex
@misc{zhao2025chemdfmrchemicalreasoningllm,
      title={ChemDFM-R: A Chemical Reasoning LLM Enhanced with Atomized Chemical Knowledge},
      author={Zihan Zhao and Bo Chen and Ziping Wan and Lu Chen and Xuanze Lin and Shiyang Yu and Situo Zhang and Da Ma and Zichen Zhu and Danyang Zhang and Huayang Wang and Zhongyang Dai and Liyang Wen and Xin Chen and Kai Yu},
      year={2025},
      eprint={2507.21990},
      archivePrefix={arXiv},
      primaryClass={cs.CE},
      url={https://arxiv.org/abs/2507.21990},
}
```

Contact

If you have any questions or further requests, please contact Zihan Zhao and Lu Chen.
