A Pre-trained Reaction Embedding Descriptor Capturing Bond Transformation Patterns
- URL: http://arxiv.org/abs/2601.03689v1
- Date: Wed, 07 Jan 2026 08:24:08 GMT
- Title: A Pre-trained Reaction Embedding Descriptor Capturing Bond Transformation Patterns
- Authors: Weiqi Liu, Fenglei Cao, Yuan Qi, Li-Cheng Xu,
- Abstract summary: This study introduces RXNEmb, a novel reaction-level descriptor derived from RXNGraphormer. We demonstrate its utility by data-driven re-clustering of the USPTO-50k dataset. RXNEmb serves as a powerful, interpretable tool for reaction fingerprinting and analysis.
- Score: 4.8838428804671326
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rise of data-driven reaction prediction models, effective reaction descriptors are crucial for bridging the gap between real-world chemistry and digital representations. However, general-purpose, reaction-wise descriptors remain scarce. This study introduces RXNEmb, a novel reaction-level descriptor derived from RXNGraphormer, a model pre-trained to distinguish real reactions from fictitious ones with erroneous bond changes, thereby learning intrinsic bond formation and cleavage patterns. We demonstrate its utility by data-driven re-clustering of the USPTO-50k dataset, yielding a classification that more directly reflects bond-change similarities than rule-based categories. Combined with dimensionality reduction, RXNEmb enables visualization of reaction space diversity. Furthermore, attention weight analysis reveals the model's focus on chemically critical sites, providing mechanistic insight. RXNEmb serves as a powerful, interpretable tool for reaction fingerprinting and analysis, paving the way for more data-centric approaches in reaction analysis and discovery.
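The re-clustering idea in the abstract can be illustrated with a minimal sketch. It assumes each reaction is already represented by a fixed-length embedding vector; the toy vectors below are hand-made placeholders, not RXNEmb output. The abstract does not say which clustering algorithm the authors used, so plain k-means with a deterministic farthest-point initialization is an assumption chosen for illustration, and the names `kmeans` and `toy_embeddings` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(vectors, k, max_iter=100):
    """Cluster vectors into k groups; returns (labels, centroids).

    Initialization is deterministic: the first centroid is the first
    vector, and each further centroid is the vector farthest from the
    centroids chosen so far.
    """
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels, centroids


# Placeholder "reaction embeddings": two hand-made groups of 3-d vectors.
toy_embeddings = [
    [1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [1.1, 0.0, 0.1],
    [0.0, 1.0, 0.9], [0.1, 0.9, 1.0], [0.0, 1.1, 1.0],
]
labels, centroids = kmeans(toy_embeddings, k=2)
print(labels)
```

In a real setting the input would be the high-dimensional embeddings of a reaction dataset, and the resulting cluster labels could be compared against rule-based reaction classes.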
Related papers
- Reaction Prediction via Interaction Modeling of Symmetric Difference Shingle Sets [4.597922051722059]
We propose ReaDISH, a novel machine learning model for chemical reaction prediction. It learns permutation-invariant representations while incorporating interaction-aware features. It shows enhanced robustness with an average improvement of 8.76% on $R^2$ under permutations.
arXiv Detail & Related papers (2025-11-09T12:29:16Z) - RxnCaption: Reformulating Reaction Diagram Parsing as Visual Prompt Guided Captioning [51.393018266721576]
We propose the RxnCaption framework for the task of chemical Reaction Diagram Parsing (RxnDP). Our framework reformulates the traditional coordinate-prediction-driven parsing process into an image captioning problem. We introduce a strategy termed "BBox and Index as Visual Prompt" (BIVP), which uses our state-of-the-art molecular detector, MolYOLO, to pre-draw molecular bounding boxes and indices directly onto the input image.
arXiv Detail & Related papers (2025-11-04T09:08:44Z) - Interpretable Deep Learning for Polar Mechanistic Reaction Prediction [43.95903801494905]
We introduce PMechRP (Polar Mechanistic Reaction Predictor), a system that trains machine learning models on the PMechDB dataset. We train and compare a range of machine learning models, including transformer-based, graph-based and two-step Siamese architectures. Our best-performing approach was a hybrid model, which combines a 5-ensemble of Chemformer models with a two-step Siamese framework.
arXiv Detail & Related papers (2025-04-22T02:31:23Z) - Chemical knowledge-informed framework for privacy-aware retrosynthesis learning [72.39098405805318]
Current machine learning-based retrosynthesis gathers reaction data from multiple sources into one single edge to train prediction models. This paradigm poses considerable privacy risks as it necessitates broad data availability across organizational boundaries. In the present study, we introduce the chemical knowledge-informed framework (CKIF), a privacy-preserving approach for learning retrosynthesis models.
arXiv Detail & Related papers (2025-02-26T13:13:24Z) - Learning Chemical Reaction Representation with Reactant-Product Alignment [50.28123475356234]
RAlign is a novel chemical reaction representation learning model for various organic reaction-related tasks. By integrating atomic correspondence between reactants and products, our model discerns the molecular transformations that occur during the reaction. We introduce a reaction-center-aware attention mechanism that enables the model to concentrate on key functional groups.
arXiv Detail & Related papers (2024-11-26T17:41:44Z) - log-RRIM: Yield Prediction via Local-to-global Reaction Representation Learning and Interaction Modeling [6.310759215182946]
log-RRIM is an innovative graph transformer-based framework designed for predicting chemical reaction yields. A key feature of log-RRIM is its integration of a cross-attention mechanism that focuses on the interplay between reagents and reaction centers. log-RRIM shows superior performance in our experiments, especially for medium to high-yielding reactions.
arXiv Detail & Related papers (2024-10-20T18:35:56Z) - ReactAIvate: A Deep Learning Approach to Predicting Reaction Mechanisms and Unmasking Reactivity Hotspots [4.362338454684645]
We develop an interpretable attention-based GNN that achieved near-unity and 96% accuracy for reaction step classification.
Our model adeptly identifies key atom(s) even from out-of-distribution classes.
This generalizability allows for the inclusion of new reaction types in a modular fashion, and will thus be of value to experts for understanding the reactivity of new molecules.
arXiv Detail & Related papers (2024-07-14T05:53:18Z) - Beyond Major Product Prediction: Reproducing Reaction Mechanisms with Machine Learning Models Trained on a Large-Scale Mechanistic Dataset [10.968137261042715]
Mechanistic understanding of organic reactions can facilitate reaction development, impurity prediction, and in principle, reaction discovery.
While several machine learning models have sought to address the task of predicting reaction products, their extension to predicting reaction mechanisms has been impeded by the lack of a corresponding mechanistic dataset.
We construct such a dataset by imputing intermediates between experimentally reported reactants and products using expert reaction templates and train several machine learning models on the resulting dataset of 5,184,184 elementary steps.
arXiv Detail & Related papers (2024-03-07T15:26:23Z) - Retrosynthesis prediction enhanced by in-silico reaction data augmentation [66.5643280109899]
We present RetroWISE, a framework that employs a base model inferred from real paired data to perform in-silico reaction generation and augmentation.
On three benchmark datasets, RetroWISE achieves the best overall performance against state-of-the-art models.
arXiv Detail & Related papers (2024-01-31T07:40:37Z) - On the importance of catalyst-adsorbate 3D interactions for relaxed energy predictions [98.70797778496366]
We investigate whether it is possible to predict a system's relaxed energy in the OC20 dataset while ignoring the relative position of the adsorbate.
We find that while removing binding site information impairs accuracy as expected, modified models are able to predict relaxed energies with remarkably decent MAE.
arXiv Detail & Related papers (2023-10-10T14:57:04Z) - AI-driven Hypergraph Network of Organic Chemistry: Network Statistics and Applications in Reaction Classification [0.0]
We use a standard reactions dataset to construct a hypernetwork and report its statistics.
We also compute each statistic for an equivalent directed graph representation of reactions to draw parallels and highlight differences.
We conclude that the hypernetwork representation is flexible, preserves reaction context, and uncovers hidden insights.
arXiv Detail & Related papers (2022-08-02T14:12:03Z) - Energy-based View of Retrosynthesis [70.66156081030766]
We propose a framework that unifies sequence- and graph-based methods as energy-based models.
We present a novel dual variant within the framework that performs consistent training over Bayesian forward- and backward-prediction.
This model improves state-of-the-art performance by 9.6% for template-free approaches where the reaction type is unknown.
arXiv Detail & Related papers (2020-07-14T18:51:06Z)