ChemFG-Tool is a lightweight functional group identification toolkit developed and used during the development of ChemDFM-R (see the ChemDFM-R paper at https://arxiv.org/abs/2507.21990). It provides utilities for identifying functional groups in both molecules and chemical reactions, with an emphasis on broad coverage, reduced overlap, and better discrimination of complex composite functional groups.
This toolkit is built upon the functional group identification utilities provided by the Python package thermo. While the original thermo implementation supports 83 functional groups, its coverage is limited and some functional groups may be identified with overlapping or ambiguous matches, making it difficult to reliably distinguish complex composite functional groups.
Through careful analysis and redesign of functional group definitions and matching logic, ChemFG-Tool largely avoids these issues and achieves a more comprehensive, precise, and structure-aware functional group identification. Specifically, compared to thermo, ChemFG-Tool:
- Expands the supported functional group set from 83 to 241
- Reduces cross-over and redundant identifications and explicitly distinguishes composite functional groups from their simpler constituents
- Provides functional group localization, including atom-level positional information
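As a rough illustration of the second point, the sketch below drops a match for a simpler group when all of its atoms lie inside a match for a composite group. This is not ChemFG-Tool's actual matching logic, which is not described here. The function name `suppress_subsumed`, the composite table, and the atom indices are all hypothetical.

```python
def suppress_subsumed(matches, composites):
    """Drop simpler-group matches that are fully contained in a composite match.

    matches:    dict mapping group name -> tuple of atom-index tuples
    composites: dict mapping composite group name -> set of simpler group names
                (hypothetical table, for illustration only)
    """
    result = {name: list(places) for name, places in matches.items()}
    for composite, simpler_names in composites.items():
        composite_atoms = [set(place) for place in matches.get(composite, ())]
        for simple in simpler_names:
            if simple not in result:
                continue
            # Keep only matches that are not a subset of any composite match
            result[simple] = [
                place for place in result[simple]
                if not any(set(place) <= atoms for atoms in composite_atoms)
            ]
    # Remove groups that have no remaining matches
    return {name: tuple(places) for name, places in result.items() if places}


# Hypothetical raw matches: the ester at atoms 1-4 also triggers the
# simpler "ether" and "carbonyl" patterns on a subset of the same atoms.
raw_matches = {
    "ester": ((1, 2, 3, 4),),
    "ether": ((1, 3, 4), (7, 8, 9)),
    "carbonyl": ((1, 2),),
}
cleaned = suppress_subsumed(raw_matches, {"ester": {"ether", "carbonyl"}})
print(cleaned)
```

In this toy input, the standalone ether at atoms 7 to 9 survives, while the ether and carbonyl matches inside the ester are removed.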
functional_group_list.tsv is a tab-separated file defining all supported functional groups, with the following columns:
- Class: Chemical category of the functional group
- Function Name: Python function used for identification
- Functional Group Name: Human-readable name
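A minimal sketch of reading a file with these three columns into named records is shown below. It assumes a single header row followed by one rule per line. The sample rows are made up for illustration (only `is_alkane` appears in the usage examples), and `Rule` and `load_rules` are hypothetical helper names, not part of ChemFG-Tool.

```python
import csv
import io
from typing import NamedTuple


class Rule(NamedTuple):
    group_class: str    # "Class" column
    function_name: str  # "Function Name" column
    display_name: str   # "Functional Group Name" column


def load_rules(handle):
    """Parse tab-separated rules from an open text handle, skipping the header."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    return [Rule(*row[:3]) for row in reader if len(row) >= 3]


# Made-up sample content standing in for the real file
SAMPLE_TSV = (
    "Class\tFunction Name\tFunctional Group Name\n"
    "hydrocarbon\tis_alkane\talkane\n"
    "hydrocarbon\tis_alkene\talkene/olefin\n"
)
rules = load_rules(io.StringIO(SAMPLE_TSV))
for rule in rules:
    print(rule.function_name, "->", rule.display_name)
```

With the real file, the same function could be called as `load_rules(open('./functional_group_list.tsv', newline=''))`.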
The get_functional_group module (imported as fg in the examples below) is the core implementation of functional group identification.
The following examples illustrate typical usage, first for a single molecule and then for a reaction.
from rdkit import Chem
import get_functional_group as fg

# Load the rule table, skipping the header row.
# Each rule is [class, function name, functional group name].
with open('./functional_group_list.tsv', 'r') as f:
    rules = [row.strip().split('\t') for row in f][1:]

# Placeholder for molecule input: replace with a real SMILES string
molecule = Chem.MolFromSmiles("<MOLECULE_PLACEHOLDER>")

# Output includes both functional group names and their positions
functional_groups = {}
for group, function_name, name in rules:
    # Look up the identification function by name instead of using eval
    places = getattr(fg, function_name)(molecule)
    if places:
        functional_groups[name] = places
print(functional_groups)

Example output (conceptual):
{
    "alkene/olefin": ((0, 1),),
    "hydrazone": ((7, 8, 9),),
    "thiolester": ((3, 4, 5),)
}
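Because each entry carries atom positions, the result can be inverted to ask which functional groups a given atom belongs to. The sketch below assumes the output has the shape shown in the conceptual example above (group name mapped to a tuple of atom-index tuples). `groups_by_atom` is a hypothetical helper, not part of ChemFG-Tool.

```python
from collections import defaultdict


def groups_by_atom(functional_groups):
    """Invert {group name: ((atom, ...), ...)} into {atom index: [group names]}."""
    atom_to_groups = defaultdict(list)
    for name, places in functional_groups.items():
        for place in places:
            for atom_index in place:
                atom_to_groups[atom_index].append(name)
    return dict(atom_to_groups)


# Same shape as the conceptual output above
example = {
    "alkene/olefin": ((0, 1),),
    "hydrazone": ((7, 8, 9),),
    "thiolester": ((3, 4, 5),),
}
atom_labels = groups_by_atom(example)
print(atom_labels.get(8, []))
```

Atoms that belong to no identified group are simply absent from the returned dictionary.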
from rdkit import Chem
from rdkit.Chem import AllChem
import get_functional_group as fg

# Load the rule table, skipping the header row
with open('./functional_group_list.tsv', 'r') as f:
    rules = [row.strip().split('\t') for row in f][1:]
# Exclude the alkane rule from reaction-level identification
rules = [rule for rule in rules if rule[1] != 'is_alkane']

def get_fg(molecule, center):
    """Return the names of functional groups that contain a reaction-center atom."""
    functional_groups = []
    Chem.SanitizeMol(molecule)
    for group, function_name, name in rules:
        # Look up the identification function by name instead of using eval
        places = getattr(fg, function_name)(molecule)
        if any(atom in place for place in places for atom in center):
            functional_groups.append(name)
    return functional_groups

reaction = {
    "reaction_smiles": "[CH2:1]1[O:35][CH2:34][CH2:33][CH2:36]1.[CH:2]([C:3]#[N:4])([CH3:5])[S:12](=[O:13])(=[O:14])[CH2:15][CH2:16][C:17]([F:18])([F:19])[F:20].[CH:6](=[CH:7][C:8]([F:9])([F:10])[F:11])[CH2:30][O:29][S:26]([c:25]1[cH:24][cH:23][c:22]([CH3:21])[cH:32][cH:31]1)(=[O:27])=[O:28]>>[CH3:1][C:2]([C:3]#[N:4])([CH2:5][CH:6]=[CH:7][C:8]([F:9])([F:10])[F:11])[S:12](=[O:13])(=[O:14])[CH2:15][CH2:16][C:17]([F:18])([F:19])[F:20]",
    "reaction_center": [1, 2, 5, 6]
}

rxn = AllChem.ReactionFromSmarts(reaction["reaction_smiles"])
reactants = rxn.GetReactants()
products = rxn.GetProducts()

reactants_fg = [get_fg(r, reaction["reaction_center"]) for r in reactants]
products_fg = [get_fg(p, reaction["reaction_center"]) for p in products]
print({"reactants": reactants_fg, "products": products_fg})

Example output:
{
    "reactants": [
        ["ether"],
        ["nitrile", "sulfone"],
        ["alkene/olefin", "fluoro"]
    ],
    "products": [
        ["alkene/olefin", "nitrile"]
    ]
}
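The filtering step in the reaction example keeps a functional group only when at least one of its matched positions contains a reaction-center atom. The overlap test can be sketched on its own, without RDKit, as below. `groups_at_center` is a hypothetical helper, and the match positions in the sample input are invented for illustration.

```python
def groups_at_center(functional_groups, center):
    """Return names of groups with at least one match touching a center atom.

    functional_groups: dict mapping group name -> tuple of atom-index tuples
    center:            iterable of atom identifiers marking the reaction center
    """
    center_atoms = set(center)
    return [
        name
        for name, places in functional_groups.items()
        # A non-empty intersection means the match touches the center
        if any(center_atoms.intersection(place) for place in places)
    ]


# Invented match positions, for illustration only
sample_groups = {
    "nitrile": ((2, 3, 4),),
    "sulfone": ((12, 13, 14),),
    "fluoro": ((18,), (19,), (20,)),
}
print(groups_at_center(sample_groups, [1, 2, 5, 6]))
```

Converting the center to a set once avoids rescanning the center list for every matched position, which matters little for small molecules but keeps the intent of the check explicit.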
If you use ChemFG-Tool in your research, please cite the ChemDFM-R paper:
@misc{zhao2025chemdfmrchemicalreasoningllm,
title={ChemDFM-R: A Chemical Reasoning LLM Enhanced with Atomized Chemical Knowledge},
author={Zihan Zhao and Bo Chen and Ziping Wan and Lu Chen and Xuanze Lin and Shiyang Yu and Situo Zhang and Da Ma and Zichen Zhu and Danyang Zhang and Huayang Wang and Zhongyang Dai and Liyang Wen and Xin Chen and Kai Yu},
year={2025},
eprint={2507.21990},
archivePrefix={arXiv},
primaryClass={cs.CE},
url={https://arxiv.org/abs/2507.21990},
}

If you have any questions or further requests, please contact Zihan Zhao and Lu Chen.