Contextual Molecule Representation Learning from Chemical Reaction Knowledge
- URL: http://arxiv.org/abs/2402.13779v1
- Date: Wed, 21 Feb 2024 12:58:40 GMT
- Title: Contextual Molecule Representation Learning from Chemical Reaction Knowledge
- Authors: Han Tang, Shikun Feng, Bicheng Lin, Yuyan Ni, Jingjing Liu, Wei-Ying Ma, Yanyan Lan
- Abstract summary: We introduce REMO, a self-supervised learning framework that takes advantage of well-defined atom-combination rules in common chemistry.
REMO pre-trains graph/Transformer encoders on 1.7 million known chemical reactions in the literature.
- Score: 24.501564702095937
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, self-supervised learning has emerged as a powerful tool to
harness abundant unlabelled data for representation learning and has been
broadly adopted in diverse areas. However, when applied to molecular
representation learning (MRL), prevailing techniques such as masked sub-unit
reconstruction often fall short, due to the high degree of freedom in the
possible combinations of atoms within molecules, which brings insurmountable
complexity to the masking-reconstruction paradigm. To tackle this challenge, we
introduce REMO, a self-supervised learning framework that takes advantage of
well-defined atom-combination rules in common chemistry. Specifically, REMO
pre-trains graph/Transformer encoders on 1.7 million known chemical reactions
in the literature. We propose two pre-training objectives: Masked Reaction
Centre Reconstruction (MRCR) and Reaction Centre Identification (RCI). REMO
offers a novel solution to MRL by exploiting the underlying shared patterns in
chemical reactions as context for pre-training, which effectively
infers meaningful representations of common chemistry knowledge. Such
contextual representations can then be utilized to support diverse downstream
molecular tasks with minimal finetuning, such as affinity prediction and
drug-drug interaction prediction. Extensive experimental results on
MoleculeACE, ACNet, drug-drug interaction (DDI), and reaction type
classification show that across all tested downstream tasks, REMO outperforms
the standard baseline of single-molecule masked modeling used in current MRL.
Remarkably, REMO is the first deep learning model to surpass fingerprint-based methods on activity cliff benchmarks.
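To make the two pre-training objectives concrete, here is a minimal PyTorch sketch of how Masked Reaction Centre Reconstruction (MRCR) and Reaction Centre Identification (RCI) losses could sit on top of a molecule encoder. The encoder interface, tensor shapes, and head designs are illustrative assumptions, not the authors' implementation.
```python
# Illustrative sketch only; the encoder interface and shapes are assumed,
# not taken from the REMO paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class REMOHeads(nn.Module):
    def __init__(self, hidden_dim: int, num_atom_types: int):
        super().__init__()
        # MRCR: predict the identity of masked reaction-centre atoms.
        self.mrcr_head = nn.Linear(hidden_dim, num_atom_types)
        # RCI: binary per-atom classification -- is this atom a reaction centre?
        self.rci_head = nn.Linear(hidden_dim, 1)

    def forward(self, atom_states, atom_labels, centre_mask, is_centre):
        # atom_states: (num_atoms, hidden_dim) contextual encodings of the
        #   reactant atoms, produced by a graph/Transformer encoder that also
        #   sees the other molecules in the reaction (the "context").
        # atom_labels: (num_atoms,) long, ground-truth atom types.
        # centre_mask: (num_atoms,) bool, True where a reaction-centre atom
        #   was masked out of the input.
        # is_centre:   (num_atoms,) float {0,1}, reaction-centre indicator.
        mrcr_logits = self.mrcr_head(atom_states[centre_mask])
        mrcr_loss = F.cross_entropy(mrcr_logits, atom_labels[centre_mask])
        rci_logits = self.rci_head(atom_states).squeeze(-1)
        rci_loss = F.binary_cross_entropy_with_logits(rci_logits, is_centre)
        return mrcr_loss + rci_loss
```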
Related papers
- Learning Chemical Reaction Representation with Reactant-Product Alignment [50.28123475356234]
This paper introduces modelname, a novel chemical reaction representation learning model tailored for a variety of organic-reaction-related tasks.
By integrating atomic correspondence between reactants and products, our model discerns the molecular transformations that occur during the reaction, thereby enhancing the comprehension of the reaction mechanism.
We have designed an adapter structure to incorporate reaction conditions into the chemical reaction representation, allowing the model to handle diverse reaction conditions and adapt to various datasets and downstream tasks, e.g., reaction performance prediction.
arXiv Detail & Related papers (2024-11-26T17:41:44Z)
- Text-Augmented Multimodal LLMs for Chemical Reaction Condition Recommendation [50.639325453203504]
MM-RCR is a text-augmented multimodal LLM that learns a unified reaction representation from SMILES, reaction graphs, and textual corpora for reaction condition recommendation (RCR).
Our results demonstrate that MM-RCR achieves state-of-the-art performance on two open benchmark datasets.
arXiv Detail & Related papers (2024-07-21T12:27:26Z)
- ReactAIvate: A Deep Learning Approach to Predicting Reaction Mechanisms and Unmasking Reactivity Hotspots [4.362338454684645]
We develop an interpretable attention-based GNN that achieves near-unity and 96% accuracy for reaction step classification.
Our model adeptly identifies key atom(s) even from out-of-distribution classes.
This generalizability allows new reaction types to be included in a modular fashion and will thus be of value to experts seeking to understand the reactivity of new molecules.
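Identifying "key atoms" with an attention-based GNN typically means reading out per-atom attention scores. A minimal, generic sketch follows; the readout module below is an illustrative assumption, not ReactAIvate's architecture.
```python
# Generic sketch: score atoms with a learned attention vector and flag the
# highest-scoring atoms as candidate reactivity hotspots.
import torch
import torch.nn as nn

class AtomAttentionReadout(nn.Module):
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)  # per-atom attention logit

    def forward(self, atom_states: torch.Tensor, top_k: int = 1):
        # atom_states: (num_atoms, hidden_dim) from any GNN encoder.
        weights = torch.softmax(self.score(atom_states).squeeze(-1), dim=0)
        graph_repr = weights @ atom_states             # attention-pooled readout
        hotspots = torch.topk(weights, top_k).indices  # most-attended atoms
        return graph_repr, hotspots
```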
arXiv Detail & Related papers (2024-07-14T05:53:18Z)
- Substrate Scope Contrastive Learning: Repurposing Human Bias to Learn Atomic Representations [14.528429119352328]
We introduce a novel pre-training strategy, substrate scope contrastive learning, which learns atomic representations tailored to chemical reactivity.
We focus on 20,798 aryl halides in the CAS Content Collection spanning thousands of publications to learn a representation of aryl halide reactivity.
This work not only presents a chemistry-tailored neural network pre-training strategy to learn reactivity-aligned atomic representations, but also marks a first-of-its-kind approach to benefit from the human bias in substrate scope design.
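The central trick — treating reactive sites that chemists grouped into the same substrate scope as positive pairs — maps directly onto a standard InfoNCE objective. A minimal sketch, assuming batch-wise pairing and a fixed temperature (both illustrative choices, not details from the paper):
```python
# Minimal InfoNCE sketch of substrate scope contrastive learning: atomic
# embeddings of reactive sites that appear in the same substrate scope
# (e.g., the same publication's scope table) are pulled together.
import torch
import torch.nn.functional as F

def scope_contrastive_loss(anchor, positive, temperature: float = 0.1):
    # anchor, positive: (batch, dim) embeddings of reactive-site atoms drawn
    # from the same substrate scope; other rows in the batch act as negatives.
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.T / temperature  # (batch, batch)
    targets = torch.arange(anchor.size(0))      # matching rows are positives
    return F.cross_entropy(logits, targets)
```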
arXiv Detail & Related papers (2024-02-19T02:21:20Z)
- MolCAP: Molecular Chemical reActivity pretraining and prompted-finetuning enhanced molecular representation learning [3.179128580341411]
MolCAP is a graph pretraining Transformer based on chemical reactivity (IMR) knowledge with prompted finetuning.
With MolCAP prompting, even basic graph neural networks achieve surprising performance, outperforming previous models.
arXiv Detail & Related papers (2023-06-13T13:48:06Z)
- From Cloze to Comprehension: Retrofitting Pre-trained Masked Language Model to Pre-trained Machine Reader [130.45769668885487]
Pre-trained Machine Reader (PMR) is a novel method for retrofitting masked language models (MLMs) to pre-trained machine reading comprehension (MRC) models without acquiring labeled data.
To build the proposed PMR, we constructed a large volume of general-purpose and high-quality MRC-style training data.
PMR has the potential to serve as a unified model for tackling various extraction and classification tasks in the MRC formulation.
arXiv Detail & Related papers (2022-12-09T10:21:56Z)
- Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
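The retrieval step at the heart of such a framework is easy to sketch: rank exemplars by Tanimoto similarity over Morgan fingerprints and hand the top matches to a generator. The function below is an illustrative RDKit-based assumption; the paper's own retrieval and fusion modules are not reproduced here.
```python
# Sketch of the retrieval step: pick the exemplar molecules most similar to a
# partial design, which a pre-trained generator could then condition on.
# The generator itself is left abstract; only the retrieval logic is shown.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def retrieve_exemplars(query_smiles: str, exemplar_smiles: list[str], k: int = 5):
    query_fp = AllChem.GetMorganFingerprintAsBitVect(
        Chem.MolFromSmiles(query_smiles), 2, nBits=2048)
    scored = []
    for smi in exemplar_smiles:
        fp = AllChem.GetMorganFingerprintAsBitVect(
            Chem.MolFromSmiles(smi), 2, nBits=2048)
        scored.append((DataStructs.TanimotoSimilarity(query_fp, fp), smi))
    # Return the k most similar exemplars to condition the generator on.
    return [smi for _, smi in sorted(scored, reverse=True)[:k]]
```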
arXiv Detail & Related papers (2022-08-23T17:01:16Z)
- Improving Molecular Contrastive Learning via Faulty Negative Mitigation and Decomposed Fragment Contrast [17.142976840521264]
We propose iMolCLR, an improvement of Molecular Contrastive Learning of Representations with graph neural networks (GNNs).
Experiments have shown that the proposed strategies significantly improve the performance of GNN models.
iMolCLR intrinsically embeds scaffolds and functional groups, enabling it to reason about molecular similarity.
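One way to express "faulty negative mitigation" in code is to down-weight the repulsion of negatives that are chemically similar to the anchor. A minimal sketch, assuming a Tanimoto-style similarity as the weight; the paper's exact formulation is not reproduced.
```python
# Sketch of similarity-weighted contrastive loss: negatives that are
# chemically similar to the anchor (faulty negatives) are repelled less.
import torch
import torch.nn.functional as F

def weighted_ntxent(anchor, positive, negatives, chem_sim, temperature=0.1):
    # anchor, positive: (dim,); negatives: (n, dim)
    # chem_sim: (n,) float, cheminformatic similarity (e.g., Tanimoto)
    # between the anchor molecule and each negative, in [0, 1].
    anchor = F.normalize(anchor, dim=-1)
    pos_logit = anchor @ F.normalize(positive, dim=-1) / temperature
    neg_logits = F.normalize(negatives, dim=-1) @ anchor / temperature
    # Down-weight faulty negatives: adding log(1 - sim) scales each
    # negative's exp-term by (1 - sim), so similar negatives count less.
    neg_logits = neg_logits + torch.log1p(-chem_sim + 1e-6)
    logits = torch.cat([pos_logit.unsqueeze(0), neg_logits])
    return F.cross_entropy(logits.unsqueeze(0), torch.zeros(1, dtype=torch.long))
```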
arXiv Detail & Related papers (2022-02-18T18:33:27Z)
- Improving Molecular Representation Learning with Metric Learning-enhanced Optimal Transport [49.237577649802034]
We develop a novel optimal transport-based algorithm, termed MROT, that enhances the generalization capability of molecular representations in regression problems.
MROT significantly outperforms state-of-the-art models, showing promising potential in accelerating the discovery of new substances.
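The optimal transport primitive underlying such methods is usually computed with Sinkhorn iterations. Below is a self-contained sketch of entropic OT between two molecular feature sets; MROT's cost design and metric-learning components are not reproduced here.
```python
# Minimal entropic optimal transport (Sinkhorn) between two feature sets,
# the basic primitive behind OT-based alignment such as MROT.
import torch

def sinkhorn(x, y, eps: float = 0.1, n_iters: int = 100):
    # x: (n, d), y: (m, d) feature sets; returns the (n, m) transport plan.
    cost = torch.cdist(x, y) ** 2                  # squared Euclidean cost
    K = torch.exp(-cost / eps)                     # Gibbs kernel
    a = torch.full((x.size(0),), 1.0 / x.size(0))  # uniform source marginal
    b = torch.full((y.size(0),), 1.0 / y.size(0))  # uniform target marginal
    u = torch.ones_like(a)
    for _ in range(n_iters):                       # alternating scaling updates
        u = a / (K @ (b / (K.T @ u)))
    v = b / (K.T @ u)
    return u.unsqueeze(1) * K * v.unsqueeze(0)     # plan P = diag(u) K diag(v)
```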
arXiv Detail & Related papers (2022-02-13T04:56:18Z)
- Chemical-Reaction-Aware Molecule Representation Learning [88.79052749877334]
We propose using chemical reactions to assist learning molecule representation.
Our approach is proven effective in 1) keeping the embedding space well-organized and 2) improving the generalization ability of molecule embeddings.
Experimental results demonstrate that our method achieves state-of-the-art performance in a variety of downstream tasks.
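The "well-organized" structure comes from treating each reaction as an equivalence constraint: the summed embeddings of a reaction's reactants should land near the summed embeddings of its products. A minimal sketch of such a loss, assuming in-batch negatives and an illustrative margin:
```python
# Sketch of a reaction-aware embedding loss: the reactant-embedding sum
# should be closer to its own product-embedding sum than to mismatched ones.
import torch
import torch.nn.functional as F

def reaction_consistency_loss(reactant_sums, product_sums, margin: float = 4.0):
    # reactant_sums, product_sums: (batch, dim); row i is the summed molecule
    # embeddings of reaction i's reactants / products.
    dists = torch.cdist(reactant_sums, product_sums)  # (batch, batch)
    pos = dists.diagonal()                            # matched reactions
    # Mismatched reactions in the batch act as negatives (hinge on the margin).
    neg = dists + torch.eye(dists.size(0)) * 1e9      # mask out the diagonal
    return F.relu(margin + pos.unsqueeze(1) - neg).mean()
```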
arXiv Detail & Related papers (2021-09-21T00:08:43Z)
- Modern Hopfield Networks for Few- and Zero-Shot Reaction Prediction [3.885603826656419]
Computer-assisted synthesis planning (CASP) to realize physical molecules is still in its infancy and lacks a performance level that would enable large-scale molecule discovery.
We propose a novel reaction prediction approach that uses a deep learning architecture with modern Hopfield networks (MHNs) that is optimized by contrastive learning.
We show that our MHN contrastive learning approach enables few- and zero-shot learning for reaction prediction which, in contrast to previous methods, can deal with rare, single, or even no training example(s) for a reaction.
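A modern Hopfield layer is, in essence, softmax attention over a set of stored patterns, which is what gives the retrieval-like few- and zero-shot behaviour. A minimal sketch of the update rule, with beta and the number of steps as illustrative choices rather than the paper's configuration:
```python
# Minimal modern Hopfield retrieval step: a query state is updated by
# attending over stored patterns (e.g., embeddings of known reaction templates).
import torch

def hopfield_retrieve(query, stored, beta: float = 8.0, n_steps: int = 3):
    # query: (dim,), stored: (num_patterns, dim)
    xi = query
    for _ in range(n_steps):
        # One update: xi <- softmax(beta * stored @ xi)-weighted sum of patterns.
        attn = torch.softmax(beta * stored @ xi, dim=0)
        xi = attn @ stored
    return xi  # converges toward the stored pattern(s) most similar to the query
```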
arXiv Detail & Related papers (2021-04-07T17:35:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.