Related papers: Contextual Molecule Representation Learning from Chemical Reaction Knowledge

URL: http://arxiv.org/abs/2402.13779v1
Date: Wed, 21 Feb 2024 12:58:40 GMT
Title: Contextual Molecule Representation Learning from Chemical Reaction Knowledge
Authors: Han Tang, Shikun Feng, Bicheng Lin, Yuyan Ni, Jingjing Liu, Wei-Ying Ma, Yanyan Lan
Abstract summary: We introduce REMO, a self-supervised learning framework that takes advantage of well-defined atom-combination rules in common chemistry. REMO pre-trains graph/Transformer encoders on 1.7 million known chemical reactions in the literature.
Score: 24.501564702095937
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In recent years, self-supervised learning has emerged as a powerful tool for harnessing abundant unlabelled data for representation learning and has been broadly adopted in diverse areas. However, when applied to molecular representation learning (MRL), prevailing techniques such as masked sub-unit reconstruction often fall short because of the high degree of freedom in the possible combinations of atoms within molecules, which brings insurmountable complexity to the masking-reconstruction paradigm. To tackle this challenge, we introduce REMO, a self-supervised learning framework that takes advantage of well-defined atom-combination rules in common chemistry. Specifically, REMO pre-trains graph/Transformer encoders on 1.7 million known chemical reactions from the literature. We propose two pre-training objectives: Masked Reaction Centre Reconstruction (MRCR) and Reaction Centre Identification (RCI). REMO offers a novel solution to MRL by exploiting the underlying shared patterns in chemical reactions as context for pre-training, which effectively infers meaningful representations of common chemistry knowledge. Such contextual representations can then be used to support diverse downstream molecular tasks, such as affinity prediction and drug-drug interaction prediction, with minimal fine-tuning. Extensive experimental results on MoleculeACE, ACNet, drug-drug interaction (DDI), and reaction type classification show that across all tested downstream tasks, REMO outperforms the standard baseline of single-molecule masked modeling used in current MRL. Remarkably, REMO is the first deep learning model to surpass fingerprint-based methods on activity cliff benchmarks.
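Both pre-training objectives named above (MRCR and RCI) depend on knowing which atoms make up a reaction's centre. A common way to define the reaction centre, assumed here and not necessarily REMO's exact procedure, is from an atom-mapped reaction: the atoms incident to any bond that is broken, formed, or changes order. The minimal sketch below uses plain Python dictionaries instead of a cheminformatics toolkit; the bond-dictionary format and the function names `reaction_centre`, `mask_reaction_centre`, and `rci_labels` are hypothetical illustrations.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical format: each bond is keyed by the unordered pair of atom-map
# numbers it joins; the value is its bond order (e.g. 1, 2, 3, or 1.5).
BondMap = Dict[FrozenSet[int], float]


def reaction_centre(reactant_bonds: BondMap, product_bonds: BondMap) -> Set[int]:
    """Return atom-map numbers of atoms whose bonding changes.

    A bond counts as changed if it is broken (reactants only), formed
    (products only), or has a different order on the two sides.
    """
    centre: Set[int] = set()
    for bond in set(reactant_bonds) | set(product_bonds):
        if reactant_bonds.get(bond) != product_bonds.get(bond):
            centre.update(bond)
    return centre


def mask_reaction_centre(
    atoms: Dict[int, str], centre: Set[int], mask_token: str = "[MASK]"
) -> Tuple[List[str], Dict[int, str]]:
    """Masked-reconstruction-style input: hide reaction-centre atoms.

    Returns the token sequence ordered by atom-map number and the
    reconstruction targets (original symbols at masked positions).
    """
    masked: List[str] = []
    targets: Dict[int, str] = {}
    for idx in sorted(atoms):
        if idx in centre:
            masked.append(mask_token)
            targets[idx] = atoms[idx]
        else:
            masked.append(atoms[idx])
    return masked, targets


def rci_labels(atoms: Dict[int, str], centre: Set[int]) -> List[int]:
    """Identification-style per-atom binary labels (1 = in reaction centre)."""
    return [1 if idx in centre else 0 for idx in sorted(atoms)]
```

As a usage example, take Fischer esterification of acetic acid (atoms 1 to 4) with methanol (atoms 5 and 6), with hydrogens left implicit. The C2-O4 bond breaks and the C2-O6 bond forms, so `reaction_centre` returns `{2, 4, 6}`, and `mask_reaction_centre` hides exactly those three atoms. Keeping the centre computation separate from the masking step lets the same centre set feed both a reconstruction target and an identification label.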

Related papers

Uni-Mol3: A Multi-Molecular Foundation Model for Advancing Organic Reaction Modeling [36.36866930946212]
This paper introduces Uni-Mol3, a novel deep learning framework that employs a hierarchical pipeline for multi-molecular reaction modeling. At its core, Uni-Mol3 adopts a multi-scale molecular tokenizer (Mol-Tokenizer) that encodes 3D structures of molecules and other features into discrete tokens. With prompt-aware downstream fine-tuning, Uni-Mol3 demonstrates exceptional performance in diverse organic reaction tasks.
arXiv Detail & Related papers (2025-07-30T02:38:52Z)
ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data [53.78763789036172]
We present ChemActor, a fully fine-tuned large language model (LLM) that acts as a chemical executor to convert between unstructured experimental procedures and structured action sequences. This framework integrates a data selection module, which selects data based on distribution divergence, with a general-purpose LLM to generate machine-executable actions from a single molecule input. Experiments on reaction-to-description (R2D) and description-to-action (D2A) tasks demonstrate that ChemActor achieves state-of-the-art performance, outperforming the baseline model by 10%.
arXiv Detail & Related papers (2025-06-30T05:11:19Z)
Interpretable Deep Learning for Polar Mechanistic Reaction Prediction [43.95903801494905]
We introduce PMechRP (Polar Mechanistic Reaction Predictor), a system that trains machine learning models on the PMechDB dataset. We train and compare a range of machine learning models, including transformer-based, graph-based, and two-step Siamese architectures. Our best-performing approach was a hybrid model that combines a 5-ensemble of Chemformer models with a two-step Siamese framework.
arXiv Detail & Related papers (2025-04-22T02:31:23Z)
Chemical knowledge-informed framework for privacy-aware retrosynthesis learning [60.93245342663455]
Current machine learning-based retrosynthesis methods gather reaction data from multiple sources into a single edge to train prediction models. This paradigm poses considerable privacy risks because it requires broad data availability across organizational boundaries. In the present study, we introduce the chemical knowledge-informed framework (CKIF), a privacy-preserving approach for learning retrosynthesis models.
arXiv Detail & Related papers (2025-02-26T13:13:24Z)
Learning Chemical Reaction Representation with Reactant-Product Alignment [50.28123475356234]
This paper introduces a novel chemical reaction representation learning model tailored to a variety of organic-reaction-related tasks. By integrating atomic correspondence between reactants and products, our model discerns the molecular transformations that occur during the reaction, thereby improving its comprehension of the reaction mechanism. We have designed an adapter structure to incorporate reaction conditions into the chemical reaction representation, allowing the model to handle diverse reaction conditions and adapt to various datasets and downstream tasks, e.g., reaction performance prediction.
arXiv Detail & Related papers (2024-11-26T17:41:44Z)
Text-Augmented Multimodal LLMs for Chemical Reaction Condition Recommendation [50.639325453203504]
MM-RCR is a text-augmented multimodal LLM that learns a unified reaction representation from SMILES, reaction graphs, and a textual corpus for chemical reaction condition recommendation (RCR). Our results demonstrate that MM-RCR achieves state-of-the-art performance on two open benchmark datasets.
arXiv Detail & Related papers (2024-07-21T12:27:26Z)
ReactAIvate: A Deep Learning Approach to Predicting Reaction Mechanisms and Unmasking Reactivity Hotspots [4.362338454684645]
We develop an interpretable attention-based GNN that achieved near-unity and 96% accuracy for reaction step classification. Our model adeptly identifies key atom(s) even from out-of-distribution classes. This generalizability allows new reaction types to be included in a modular fashion, and will thus be of value to experts seeking to understand the reactivity of new molecules.
arXiv Detail & Related papers (2024-07-14T05:53:18Z)
Substrate Scope Contrastive Learning: Repurposing Human Bias to Learn Atomic Representations [14.528429119352328]
We introduce a novel pre-training strategy, substrate scope contrastive learning, which learns atomic representations tailored to chemical reactivity. We focus on 20,798 aryl halides in the CAS Content Collection spanning thousands of publications to learn a representation of aryl halide reactivity. This work not only presents a chemistry-tailored neural network pre-training strategy to learn reactivity-aligned atomic representations, but also marks a first-of-its-kind approach to benefit from the human bias in substrate scope design.
arXiv Detail & Related papers (2024-02-19T02:21:20Z)
MolCAP: Molecular Chemical reActivity pretraining and prompted-finetuning enhanced molecular representation learning [3.179128580341411]
MolCAP is a graph pretraining Transformer based on chemical reactivity (IMR) knowledge with prompted finetuning. Prompted by MolCAP, even basic graph neural networks are capable of achieving surprising performance that outperforms previous models.
arXiv Detail & Related papers (2023-06-13T13:48:06Z)
From Cloze to Comprehension: Retrofitting Pre-trained Masked Language Model to Pre-trained Machine Reader [130.45769668885487]
Pre-trained Machine Reader (PMR) is a novel method for retrofitting masked language models (MLMs) to pre-trained machine reading comprehension (MRC) models without acquiring labeled data. To build the proposed PMR, we constructed a large volume of general-purpose and high-quality MRC-style training data. PMR has the potential to serve as a unified model for tackling various extraction and classification tasks in the MRC formulation.
arXiv Detail & Related papers (2022-12-09T10:21:56Z)
Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation. We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria. Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z)
Improving Molecular Contrastive Learning via Faulty Negative Mitigation and Decomposed Fragment Contrast [17.142976840521264]
We propose iMolCLR, an improvement of Molecular Contrastive Learning of Representations with graph neural networks (GNNs). Experiments have shown that the proposed strategies significantly improve the performance of GNN models. iMolCLR intrinsically embeds scaffolds and functional groups that can be used to reason about molecule similarities.
arXiv Detail & Related papers (2022-02-18T18:33:27Z)
Improving Molecular Representation Learning with Metric Learning-enhanced Optimal Transport [49.237577649802034]
We develop a novel optimal transport-based algorithm termed MROT to enhance the generalization capability of molecular representation learning models on molecular regression problems. MROT significantly outperforms state-of-the-art models, showing promising potential for accelerating the discovery of new substances.
arXiv Detail & Related papers (2022-02-13T04:56:18Z)
Chemical-Reaction-Aware Molecule Representation Learning [88.79052749877334]
We propose using chemical reactions to assist in learning molecule representations. Our approach is shown to be effective at 1) keeping the embedding space well-organized and 2) improving the generalization ability of molecule embeddings. Experimental results demonstrate that our method achieves state-of-the-art performance in a variety of downstream tasks.
arXiv Detail & Related papers (2021-09-21T00:08:43Z)
Modern Hopfield Networks for Few- and Zero-Shot Reaction Prediction [3.885603826656419]
Computer-assisted synthesis planning (CASP) to realize physical molecules is still in its infancy and lacks a performance level that would enable large-scale molecule discovery. We propose a novel reaction prediction approach that uses a deep learning architecture with modern Hopfield networks (MHNs) that is optimized by contrastive learning. We show that our MHN contrastive learning approach enables few- and zero-shot learning for reaction prediction which, in contrast to previous methods, can deal with rare, single, or even no training example(s) for a reaction.
arXiv Detail & Related papers (2021-04-07T17:35:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.