Transformer-Based Approach for Automated Functional Group Replacement in Chemical Compounds
- URL: http://arxiv.org/abs/2601.07930v1
- Date: Mon, 12 Jan 2026 19:01:11 GMT
- Title: Transformer-Based Approach for Automated Functional Group Replacement in Chemical Compounds
- Authors: Bo Pan, Zhiping Zhang, Kevin Spiekermann, Tianchi Chen, Xiang Yu, Liying Zhang, Liang Zhao
- Abstract summary: We develop a novel two-stage transformer model for functional group removal and replacement. Unlike one-shot approaches that generate entire molecules in a single pass, our method generates the functional group to be removed and the group to be appended sequentially.
- Score: 12.414301421345227
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Functional group replacement is a pivotal approach in cheminformatics to enable the design of novel chemical compounds with tailored properties. Traditional methods for functional group removal and replacement often rely on rule-based heuristics, which can be limited in their ability to generate diverse and novel chemical structures. Recently, transformer-based models have shown promise in improving the accuracy and efficiency of molecular transformations, but existing approaches typically focus on single-step modeling, lacking the guarantee of structural similarity. In this work, we seek to advance the state of the art by developing a novel two-stage transformer model for functional group removal and replacement. Unlike one-shot approaches that generate entire molecules in a single pass, our method generates the functional group to be removed and the group to be appended sequentially, ensuring strict substructure-level modifications. Using a matched molecular pairs (MMPs) dataset derived from ChEMBL, we trained an encoder-decoder transformer model with SMIRKS-based representations to capture transformation rules effectively. Extensive evaluations demonstrate our method's ability to generate chemically valid transformations, explore diverse chemical spaces, and maintain scalability across varying search sizes.
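As a concrete illustration of the SMIRKS-based representation, the following minimal RDKit sketch applies a single hand-written SMIRKS rule of the kind the model learns to emit. This is not the paper's code: the rule, the input molecule, and the use of RDKit are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): applying one SMIRKS transformation
# with RDKit. The rule and molecule are illustrative; in the paper, the
# two-stage transformer proposes the removed and appended groups.
from rdkit import Chem
from rdkit.Chem import AllChem

# Hypothetical rule: replace a carboxylic acid with the primary amide.
smirks = "[C:1](=[O:2])[OX2H1]>>[C:1](=[O:2])[NH2]"
rxn = AllChem.ReactionFromSmarts(smirks)

mol = Chem.MolFromSmiles("c1ccccc1C(=O)O")  # benzoic acid
for (product,) in rxn.RunReactants((mol,)):
    Chem.SanitizeMol(product)         # fix valences/aromaticity after editing
    print(Chem.MolToSmiles(product))  # NC(=O)c1ccccc1 (benzamide)
```

Because the reaction rewrites only the matched substructure, the remainder of the molecule is untouched, which is the substructure-level guarantee the two-stage formulation enforces.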
Related papers
- Retrieval-Augmented Foundation Models for Matched Molecular Pair Transformations to Recapitulate Medicinal Chemistry Intuition [11.475465740098683]
We propose a variable-to-variable formulation of analog generation and train a foundation model on large-scale MMP transformations. We develop prompting mechanisms that let users specify preferred transformation patterns during generation. Experiments on general chemical corpora and patent-specific datasets demonstrate improved diversity, novelty, and controllability. (See the MMP fragmentation sketch after this entry.)
arXiv Detail & Related papers (2026-02-18T18:27:21Z)
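For readers unfamiliar with matched molecular pairs, the sketch below uses RDKit's built-in MMP fragmenter to enumerate single-cut fragmentations; molecules sharing a fragment under such cuts form MMPs. This illustrates the general data construction, not this paper's pipeline.

```python
# Minimal sketch: single-cut MMP-style fragmentation with RDKit.
from rdkit import Chem
from rdkit.Chem import rdMMPA

mol = Chem.MolFromSmiles("c1ccccc1C(=O)O")  # benzoic acid
# resultsAsMols=False returns SMILES strings; attachment points are [*:1] tags.
for core, chains in rdMMPA.FragmentMol(mol, maxCuts=1, resultsAsMols=False):
    print(repr(core), "->", chains)
```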
- Circuits, Features, and Heuristics in Molecular Transformers [0.056179939237156]
We present a mechanistic analysis of autoregressive transformers trained on drug-like small molecules. We identify computational patterns consistent with low-level syntactic parsing and more abstract chemical validity constraints.
arXiv Detail & Related papers (2025-12-10T15:35:22Z)
- MoRE: Batch-Robust Multi-Omics Representations from Frozen Pre-trained Transformers [0.0]
We present MoRE (Multi-Omics Representation Embedding), a framework that repurposes frozen pre-trained transformers to align heterogeneous assays into a shared latent space. Specifically, MoRE attaches lightweight, modality-specific adapters and a task-adaptive fusion layer to the frozen backbone. We benchmark MoRE against established baselines, including scGPT, scVI, and Harmony with Scrublet, evaluating integration fidelity, rare population detection, and modality transfer. (See the adapter-pattern sketch after this entry.)
arXiv Detail & Related papers (2025-11-25T15:04:06Z)
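MoRE's recipe of lightweight adapters over a frozen backbone is a standard parameter-efficient pattern; below is a generic PyTorch sketch of that pattern. Module names, sizes, and modality keys are illustrative assumptions, not MoRE's actual code.

```python
# Generic sketch of the adapter-on-frozen-backbone pattern (not MoRE's code).
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Residual bottleneck adapter: the only trainable part per modality."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(torch.relu(self.down(x)))

backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
    num_layers=4,
)
for p in backbone.parameters():
    p.requires_grad = False  # freeze the pre-trained backbone

# One adapter per modality; hypothetical assay names.
adapters = nn.ModuleDict({m: BottleneckAdapter(256) for m in ["rna", "atac"]})

x = torch.randn(2, 128, 256)      # (batch, tokens, dim) for one assay
z = adapters["rna"](backbone(x))  # only adapter weights receive gradients
```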
- GraphXForm: Graph transformer for computer-aided molecular design [73.1842164721868]
We present GraphXForm, a decoder-only graph transformer architecture, which is pretrained on existing compounds. We evaluate it on various drug design tasks, demonstrating superior objective scores compared to state-of-the-art molecular design approaches.
arXiv Detail & Related papers (2024-11-03T19:45:15Z)
- Structure Language Models for Protein Conformation Generation [66.42864253026053]
Traditional physics-based simulation methods often struggle with sampling equilibrium conformations. Deep generative models have shown promise in generating protein conformations as a more efficient alternative. We introduce Structure Language Modeling as a novel framework for efficient protein conformation generation.
arXiv Detail & Related papers (2024-10-24T03:38:51Z)
- Molecule Design by Latent Prompt Transformer [76.2112075557233]
This work explores the challenging problem of molecule design by framing it as a conditional generative modeling task.
We propose a novel generative model comprising three components: (1) a latent vector with a learnable prior distribution; (2) a molecule generation model based on a causal Transformer, which uses the latent vector as a prompt; and (3) a property prediction model that predicts a molecule's target properties and/or constraint values using the latent prompt. (See the latent-prompting sketch after this entry.)
arXiv Detail & Related papers (2024-02-27T03:33:23Z)
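The "latent vector as a prompt" idea above can be sketched generically: project a latent code into prefix embeddings and prepend them to the token embeddings of a causally masked transformer. All names and sizes below are illustrative assumptions, not the paper's implementation.

```python
# Generic sketch of latent prompting for a causal transformer.
import torch
import torch.nn as nn

d_model, vocab, n_prefix = 256, 512, 4
embed = nn.Embedding(vocab, d_model)
to_prefix = nn.Linear(64, n_prefix * d_model)  # latent dim 64 -> prefix tokens
decoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
    num_layers=2,
)

tokens = torch.randint(0, vocab, (2, 10))  # (batch, seq) molecule token ids
z = torch.randn(2, 64)                     # latent prompt
prefix = to_prefix(z).view(2, n_prefix, d_model)
h = torch.cat([prefix, embed(tokens)], dim=1)  # prepend latent prompt

L = h.size(1)
causal = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
out = decoder(h, mask=causal)  # causally masked features, conditioned on z
```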
- Molecular De Novo Design through Transformer-based Reinforcement Learning [38.803770968809225]
We introduce a method to fine-tune a Transformer-based generative model for molecular de novo design.
Our proposed method exhibits superior performance in generating compounds predicted to be active against various biological targets.
Our approach can be used for scaffold hopping, library expansion starting from a single molecule, and generating compounds with high predicted activity against biological targets. (See the policy-gradient sketch after this entry.)
arXiv Detail & Related papers (2023-10-09T02:51:01Z)
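The fine-tuning described above is REINFORCE-style policy-gradient training of a sequence generator. The sketch below shows one reward-weighted update; the model's sampling API and the reward function are placeholder assumptions, not the paper's implementation.

```python
# Generic REINFORCE-style update for a molecular generator (sketch only).
import torch

def reinforce_step(model, optimizer, reward_fn, batch_size: int) -> float:
    """Maximize reward-weighted log-likelihood of sampled molecules."""
    # `sample_with_log_probs` is an assumed API returning per-token log-probs
    # of shape (batch, seq) and the decoded SMILES strings.
    log_probs, smiles = model.sample_with_log_probs(batch_size)
    rewards = torch.tensor([reward_fn(s) for s in smiles])
    baseline = rewards.mean()  # simple variance-reduction baseline
    loss = -((rewards - baseline) * log_probs.sum(dim=-1)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```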
- Learning Modulated Transformation in GANs [69.95217723100413]
We equip the generator in generative adversarial networks (GANs) with a plug-and-play module, termed the modulated transformation module (MTM).
MTM predicts spatial offsets under the control of latent codes, based on which the convolution operation can be applied at variable locations.
Notably, for human generation on the challenging TaiChi dataset, we improve the FID of StyleGAN3 from 21.36 to 13.60, demonstrating the efficacy of learning modulated geometry transformations.
arXiv Detail & Related papers (2023-08-29T17:51:22Z)
- Molecular Attributes Transfer from Non-Parallel Data [57.010952598634944]
We formulate molecular optimization as a style transfer problem and present a novel generative model that automatically learns the internal differences between two groups of non-parallel data.
Experiments on two molecular optimization tasks, toxicity modification and synthesizability improvement, demonstrate that our model significantly outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2021-11-30T06:10:22Z)
- Geometric Transformer for End-to-End Molecule Properties Prediction [92.28929858529679]
We introduce a Transformer-based architecture for molecule property prediction, which is able to capture the geometry of the molecule.
We augment the classical positional encoder with an initial encoding of the molecule's geometry, as well as a learned gated self-attention mechanism. (See the geometry-biased attention sketch after this entry.)
arXiv Detail & Related papers (2021-10-26T14:14:40Z)
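The geometry-aware, gated attention described in the last entry can be sketched as a distance-based additive bias on the attention logits, modulated by a learned per-head gate. This is a generic pattern, not the paper's exact layer; all names and sizes are illustrative assumptions.

```python
# Generic sketch: self-attention with a gated interatomic-distance bias.
import torch
import torch.nn as nn

class GeometryBiasedAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.heads, self.scale = heads, (dim // heads) ** -0.5
        self.qkv = nn.Linear(dim, 3 * dim)
        self.dist_bias = nn.Linear(1, heads)          # per-head bias from distance
        self.gate = nn.Parameter(torch.zeros(heads))  # learned per-head gate
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, dists: torch.Tensor) -> torch.Tensor:
        # x: (B, N, dim) atom features; dists: (B, N, N) pairwise distances
        B, N, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(B, N, self.heads, -1).transpose(1, 2)
        k = k.view(B, N, self.heads, -1).transpose(1, 2)
        v = v.view(B, N, self.heads, -1).transpose(1, 2)
        logits = (q @ k.transpose(-2, -1)) * self.scale         # (B, H, N, N)
        bias = self.dist_bias(dists.unsqueeze(-1)).permute(0, 3, 1, 2)
        logits = logits + torch.sigmoid(self.gate)[None, :, None, None] * bias
        attn = logits.softmax(dim=-1)
        return self.out((attn @ v).transpose(1, 2).reshape(B, N, -1))

layer = GeometryBiasedAttention(dim=128)
x, d = torch.randn(2, 17, 128), torch.rand(2, 17, 17) * 10.0
print(layer(x, d).shape)  # torch.Size([2, 17, 128])
```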
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.