FragmentFlow: Scalable Transition State Generation for Large Molecules
- URL: http://arxiv.org/abs/2602.02310v2
- Date: Wed, 11 Feb 2026 17:18:34 GMT
- Title: FragmentFlow: Scalable Transition State Generation for Large Molecules
- Authors: Ron Shprints, Peter Holderrieth, Juno Nam, Rafael Gómez-Bombarelli, Tommi Jaakkola
- Abstract summary: Transition states (TSs) are central to understanding and quantitatively predicting chemical reactivity and reaction mechanisms. Recent generative modeling approaches have enabled chemically meaningful TS prediction for relatively small molecules. We introduce FragmentFlow: a divide-and-conquer approach that trains a generative model to predict TS geometries for the reactive core atoms.
- Score: 7.730033348957774
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transition states (TSs) are central to understanding and quantitatively predicting chemical reactivity and reaction mechanisms. Although traditional TS generation methods are computationally expensive, recent generative modeling approaches have enabled chemically meaningful TS prediction for relatively small molecules. However, these methods fail to generalize to practically relevant reaction substrates because of distribution shifts induced by increasing molecular sizes. Furthermore, TS geometries for larger molecules are not available at scale, making it infeasible to train generative models from scratch on such molecules. To address these challenges, we introduce FragmentFlow: a divide-and-conquer approach that trains a generative model to predict TS geometries for the reactive core atoms, which define the reaction mechanism. The full TS structure is then reconstructed by re-attaching substituent fragments to the predicted core. By operating on reactive cores, whose size and composition remain relatively invariant across molecular contexts, FragmentFlow mitigates distribution shifts in generative modeling. Evaluated on a new curated dataset of reactions involving reactants with up to 33 heavy atoms, FragmentFlow correctly identifies 90% of TSs while requiring 30% fewer saddle-point optimization steps than classical initialization schemes. These results point toward scalable TS generation for high-throughput reactivity studies.
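The "divide" step described in the abstract can be illustrated with a small graph routine. This is only a sketch under stated assumptions, not the authors' implementation: the function name `split_core_and_fragments` is hypothetical, the reactive core atoms are assumed to be given as a list of indices, and the molecule is reduced to a plain bond list. The routine removes the core atoms and returns each remaining connected substituent fragment together with the bonds that attach it to the core, which is the bookkeeping a later re-attachment step would need.

```python
from collections import deque


def split_core_and_fragments(num_atoms, bonds, core):
    """Return substituent fragments left after removing the core atoms.

    Each fragment is a pair: (sorted atom indices, sorted list of
    (core_atom, fragment_atom) attachment bonds).
    NOTE: hypothetical helper for illustration; the core is assumed given.
    """
    core = set(core)
    neighbors = {i: set() for i in range(num_atoms)}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set(core)  # core atoms are never part of a fragment
    fragments = []
    for start in range(num_atoms):
        if start in seen:
            continue
        # Breadth-first search restricted to non-core atoms.
        queue = deque([start])
        seen.add(start)
        atoms, attachments = [], []
        while queue:
            atom = queue.popleft()
            atoms.append(atom)
            for nb in neighbors[atom]:
                if nb in core:
                    # Bond crossing the core boundary: record the attachment point.
                    attachments.append((nb, atom))
                elif nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        fragments.append((sorted(atoms), sorted(attachments)))
    return fragments


# Toy graph: atoms 0-1-2 form the assumed core; 3-4 hang off atom 0, 5 off atom 2.
bonds = [(0, 1), (1, 2), (0, 3), (3, 4), (2, 5)]
fragments = split_core_and_fragments(6, bonds, core=[0, 1, 2])
print(fragments)
```

In this toy case the two fragments are atoms {3, 4} attached through the bond (0, 3) and atom {5} attached through the bond (2, 5). How the predicted core geometry and the fragment geometries are combined afterwards is not specified in the abstract and is left out here.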
Related papers
- Synthesizable Molecular Generation via Soft-constrained GFlowNets with Rich Chemical Priors [42.095574458478616]
We propose S3-GFN, which generates synthesizable SMILES molecules via simple soft regularization of a sequence-based GFlowNet. Our experiments show that S3-GFN learns to generate synthesizable molecules with higher rewards in diverse tasks.
arXiv Detail & Related papers (2026-02-04T01:27:42Z)
- Generating transition states of chemical reactions via distance-geometry-based flow matching [45.432950944168205]
We propose TS-DFM, a flow matching framework that predicts transition states from reactants and products. On the benchmark dataset Transition1X, TS-DFM outperforms the previous state-of-the-art method React-OT by 30% in structural accuracy.
arXiv Detail & Related papers (2025-11-21T13:15:25Z)
- Flow matching for reaction pathway generation [1.8420084274819617]
MolGEN is a conditional flow-matching framework that learns an optimal transport path to transport Gaussian priors to target chemical distributions. On benchmarks used by TSDiff and OA-ReactDiff, MolGEN surpasses TS geometry accuracy and barrier-height prediction while reducing sampling to sub-second. MolGEN also supports open-ended product generation with competitive top-k accuracy and avoids mass/electron-balance violations common to sequence models.
arXiv Detail & Related papers (2025-07-14T17:54:47Z)
- Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model, which randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z)
- RGFN: Synthesizable Molecular Generation Using GFlowNets [51.33672611338754]
We propose Reaction-GFlowNet, an extension of the GFlowNet framework that operates directly in the space of chemical reactions.
RGFN allows out-of-the-box synthesizability while maintaining comparable quality of generated candidates.
We demonstrate the effectiveness of the proposed approach across a range of oracle models, including pretrained proxy models and GPU-accelerated docking.
arXiv Detail & Related papers (2024-06-01T13:11:11Z)
- React-OT: Optimal Transport for Generating Transition State in Chemical Reactions [45.99250641377074]
We develop React-OT, an optimal transport approach for generating unique Transition State structures from reactants and products.
React-OT generates highly accurate TS structures with a median structural root mean square deviation (RMSD) of 0.053 Å and a median barrier height error of 1.06 kcal/mol, requiring only 0.4 seconds per reaction.
arXiv Detail & Related papers (2024-04-20T17:31:45Z)
- Accurate transition state generation with an object-aware equivariant elementary reaction diffusion model [9.878043289026731]
Transition state (TS) search is key in chemistry for elucidating reaction mechanisms and exploring reaction networks.
Here, we develop an object-aware SE(3) equivariant diffusion model that satisfies all physical symmetries and constraints for generating sets of structures in an elementary reaction.
Given a reactant and product, this model generates a TS structure in seconds instead of the hours required when performing quantum chemistry-based optimizations.
arXiv Detail & Related papers (2023-04-12T22:21:36Z)
- MARS: A Motif-based Autoregressive Model for Retrosynthesis Prediction [54.75583184356392]
We propose a novel end-to-end graph generation model for retrosynthesis prediction.
It sequentially identifies the reaction center, generates the synthons, and adds motifs to the synthons to generate reactants.
Experiments on a benchmark dataset show that the proposed model significantly outperforms previous state-of-the-art algorithms.
arXiv Detail & Related papers (2022-09-27T06:29:35Z)
- Learning Graph Models for Retrosynthesis Prediction [90.15523831087269]
Retrosynthesis prediction is a fundamental problem in organic synthesis.
This paper introduces a graph-based approach that capitalizes on the idea that the graph topology of precursor molecules is largely unaltered during a chemical reaction.
Our model achieves a top-1 accuracy of 53.7%, outperforming previous template-free and semi-template-based methods.
arXiv Detail & Related papers (2020-06-12T09:40:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.