FragmentGPT: A Unified GPT Model for Fragment Growing, Linking, and Merging in Molecular Design
- URL: http://arxiv.org/abs/2509.11044v2
- Date: Tue, 23 Sep 2025 16:41:27 GMT
- Title: FragmentGPT: A Unified GPT Model for Fragment Growing, Linking, and Merging in Molecular Design
- Authors: Xuefeng Liu, Songhao Jiang, Qinan Huang, Tinson Xu, Ian Foster, Mengdi Wang, Hening Lin, Rick Stevens
- Abstract summary: FragmentGPT generates linkers to combine disconnected molecular fragments into chemically and pharmacologically viable candidates. It also learns to resolve structural redundancies, such as duplicated fragments, through intelligent merging. Experiments and ablation studies on real-world cancer datasets demonstrate its ability to generate chemically valid, high-quality molecules.
- Score: 42.429674313921545
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fragment-Based Drug Discovery (FBDD) is a popular approach in early drug development, but designing effective linkers to combine disconnected molecular fragments into chemically and pharmacologically viable candidates remains challenging. Further complexity arises when fragments contain structural redundancies, like duplicate rings, which cannot be addressed by simply adding or removing atoms or bonds. To address these challenges in a unified framework, we introduce FragmentGPT, which integrates two core components: (1) a novel chemically-aware, energy-based bond cleavage pre-training strategy that equips the GPT-based model with fragment growing, linking, and merging capabilities, and (2) a novel Reward Ranked Alignment with Expert Exploration (RAE) algorithm that combines expert imitation learning for diversity enhancement, data selection and augmentation for Pareto and composite score optimality, and Supervised Fine-Tuning (SFT) to align the learner policy with multi-objective goals. Conditioned on fragment pairs, FragmentGPT generates linkers that connect diverse molecular subunits while simultaneously optimizing for multiple pharmaceutical goals. It also learns to resolve structural redundancies, such as duplicated fragments, through intelligent merging, enabling the synthesis of optimized molecules. FragmentGPT facilitates controlled, goal-driven molecular assembly. Experiments and ablation studies on real-world cancer datasets demonstrate its ability to generate chemically valid, high-quality molecules tailored for downstream drug discovery tasks.
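The abstract mentions data selection for "Pareto and composite score optimality" without giving details here. The following is a minimal sketch of what such a selection step could look like, not the authors' RAE implementation: it keeps the non-dominated candidates and ranks them by a weighted sum. The `Candidate` class, the objective names `qed` and `binding`, the weights, and the toy scores are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    # Hypothetical container: a label plus per-objective scores (higher is better).
    name: str
    scores: Dict[str, float]


def dominates(a: Candidate, b: Candidate, keys: List[str]) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    at_least_as_good = all(a.scores[k] >= b.scores[k] for k in keys)
    strictly_better = any(a.scores[k] > b.scores[k] for k in keys)
    return at_least_as_good and strictly_better


def pareto_front(cands: List[Candidate], keys: List[str]) -> List[Candidate]:
    """Keep candidates that no other candidate dominates (input order preserved)."""
    return [c for c in cands if not any(dominates(o, c, keys) for o in cands)]


def composite_score(c: Candidate, weights: Dict[str, float]) -> float:
    """Weighted sum of objectives; the weights are an illustrative assumption."""
    return sum(w * c.scores[k] for k, w in weights.items())


def select_for_sft(cands: List[Candidate], weights: Dict[str, float], top_k: int) -> List[Candidate]:
    """Filter to the Pareto front, then keep the top_k by composite score."""
    front = pareto_front(cands, list(weights))
    ranked = sorted(front, key=lambda c: composite_score(c, weights), reverse=True)
    return ranked[:top_k]


# Toy, made-up scores purely for demonstration.
candidates = [
    Candidate("mol_a", {"qed": 0.8, "binding": 0.5}),
    Candidate("mol_b", {"qed": 0.6, "binding": 0.9}),
    Candidate("mol_c", {"qed": 0.5, "binding": 0.4}),  # dominated by mol_a
    Candidate("mol_d", {"qed": 0.7, "binding": 0.7}),
]

selected = select_for_sft(candidates, {"qed": 0.5, "binding": 0.5}, top_k=2)
print([c.name for c in selected])  # ['mol_b', 'mol_d']
```

In this sketch the Pareto filter removes candidates that are worse on every objective, and the composite score only breaks ties among the remaining trade-offs. How the paper actually combines the two criteria is not specified in the abstract.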
Related papers
- DrugR: Optimizing Molecular Drugs through LLM-based Explicit Reasoning [24.70952870676648]
DrugR is a large language model that introduces explicit, step-by-step pharmacological reasoning into the optimization process.
Our approach integrates domain-specific continual pretraining, supervised fine-tuning via reverse data engineering, and self-balanced multi-granular reinforcement learning.
Experimental results demonstrate that DrugR achieves comprehensive enhancement across multiple properties without compromising structural similarity or target binding affinity.
arXiv Detail & Related papers (2026-02-09T02:26:25Z)
- ReACT-Drug: Reaction-Template Guided Reinforcement Learning for de novo Drug Design [0.34155322317700576]
We introduce ReACT-Drug, a fully integrated, target-agnostic molecular design framework based on Reinforcement Learning.
This architecture highlights the potential of integrating structural biology, deep representation learning, and chemical rules to automate and accelerate rational drug design.
arXiv Detail & Related papers (2025-12-24T05:29:35Z)
- MolChord: Structure-Sequence Alignment for Protein-Guided Drug Design [25.550555350063366]
MolChord aims to align protein and molecule structures with their textual descriptions and sequential representations.
We leverage an autoregressive model that unifies text, small molecules, and proteins as the molecule generator, alongside a diffusion-based structure encoder.
We curate a property-aware dataset by integrating preference data and refine the alignment process using Direct Preference Optimization.
arXiv Detail & Related papers (2025-10-31T17:35:53Z)
- GenMol: A Drug Discovery Generalist with Discrete Diffusion [43.29814519270451]
Generalist Molecular generative model (GenMol) is a versatile framework that uses only a single discrete diffusion model to handle diverse drug discovery scenarios.
GenMol generates Sequential Attachment-based Fragment Embedding sequences through non-autoregressive bidirectional parallel decoding.
GenMol significantly outperforms the previous GPT-based model in de novo generation and fragment-constrained generation.
arXiv Detail & Related papers (2025-01-10T18:30:05Z)
- RFL: Simplifying Chemical Structure Recognition with Ring-Free Language [66.47173094346115]
We propose a novel Ring-Free Language (RFL) to describe chemical structures in a hierarchical form.
RFL allows complex molecular structures to be decomposed into multiple parts, ensuring both uniqueness and conciseness.
We propose a universal Molecular Skeleton Decoder (MSD), which comprises a skeleton generation module that progressively predicts the molecular skeleton and individual rings.
arXiv Detail & Related papers (2024-12-10T15:29:32Z)
- Molecule Generation with Fragment Retrieval Augmentation [41.95947899013865]
Fragment Retrieval-Augmented Generation (f-RAG) is based on a pre-trained molecular generative model that proposes additional fragments to complete and generate a new molecule.
To extrapolate beyond the existing fragments, f-RAG updates the fragment vocabulary with generated fragments via an iterative refinement process.
arXiv Detail & Related papers (2024-11-18T21:43:52Z)
- Drug Discovery with Dynamic Goal-aware Fragments [76.10700304803177]
We propose a molecular generative framework for drug discovery, named Goal-aware fragment Extraction, Assembly, and Modification (GEAM).
GEAM consists of three modules, each responsible for goal-aware fragment extraction, fragment assembly, and fragment modification.
We experimentally demonstrate that GEAM effectively discovers drug candidates through the generative cycle of the three modules.
arXiv Detail & Related papers (2023-10-02T01:30:42Z)
- Functional-Group-Based Diffusion for Pocket-Specific Molecule Generation and Elaboration [63.23362798102195]
We propose D3FG, a functional-group-based diffusion model for pocket-specific molecule generation and elaboration.
D3FG decomposes molecules into two categories of components: functional groups defined as rigid bodies and linkers as mass points.
In the experiments, our method can generate molecules with more realistic 3D structures, competitive affinities toward the protein targets, and better drug properties.
arXiv Detail & Related papers (2023-05-30T06:41:20Z)
- Equivariant 3D-Conditional Diffusion Models for Molecular Linker Design [82.23006955069229]
We propose DiffLinker, an E(3)-equivariant 3D-conditional diffusion model for molecular linker design.
Our model places missing atoms in between and designs a molecule incorporating all the initial fragments.
We demonstrate that DiffLinker outperforms other methods on the standard datasets generating more diverse and synthetically-accessible molecules.
arXiv Detail & Related papers (2022-10-11T09:13:37Z)
- MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization [51.00815310242277]
Generative models and reinforcement learning approaches have had initial success but still face difficulties in simultaneously optimizing multiple drug properties.
We propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework that uses the input molecule as an initial guess and samples molecules from the target distribution.
arXiv Detail & Related papers (2020-10-05T20:18:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.