Chemical-Reaction-Aware Molecule Representation Learning
- URL: http://arxiv.org/abs/2109.09888v2
- Date: Wed, 22 Sep 2021 05:10:04 GMT
- Title: Chemical-Reaction-Aware Molecule Representation Learning
- Authors: Hongwei Wang, Weijiang Li, Xiaomeng Jin, Kyunghyun Cho, Heng Ji,
Jiawei Han, Martin D. Burke
- Abstract summary: We propose using chemical reactions to assist learning molecule representation.
Our approach is proven effective at 1) keeping the embedding space well-organized and 2) improving the generalization ability of molecule embeddings.
Experimental results demonstrate that our method achieves state-of-the-art performance in a variety of downstream tasks.
- Score: 88.79052749877334
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Molecule representation learning (MRL) methods aim to embed molecules into a
real vector space. However, existing SMILES-based (Simplified Molecular-Input
Line-Entry System) or GNN-based (Graph Neural Network) MRL methods either take
SMILES strings as input, which makes it difficult to encode molecular structure
information, or over-emphasize the importance of GNN architectures while
neglecting their generalization ability. Here we propose using chemical reactions to
assist learning molecule representation. The key idea of our approach is to
preserve the equivalence of molecules with respect to chemical reactions in the
embedding space, i.e., forcing the sum of reactant embeddings and the sum of
product embeddings to be equal for each chemical equation. This constraint is
proven effective at 1) keeping the embedding space well-organized and 2)
improving the generalization ability of molecule embeddings. Moreover, our model
can use any GNN as the molecule encoder and is thus agnostic to the choice of GNN architecture.
Experimental results demonstrate that our method achieves state-of-the-art
performance in a variety of downstream tasks, e.g., 17.4% absolute Hit@1 gain
in chemical reaction prediction, 2.3% absolute AUC gain in molecule property
prediction, and 18.5% relative RMSE gain in graph-edit-distance prediction,
over the best baseline method. The code is available at
https://github.com/hwwang55/MolR.
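The core constraint above can be sketched in a few lines. This is a minimal illustration, not code from the MolR repository: the function name and toy embeddings are hypothetical, and the paper's actual training objective may combine this equivalence term with contrastive negatives rather than using it in isolation.

```python
import numpy as np

def reaction_equivalence_loss(reactant_embs, product_embs):
    """Squared L2 distance between the summed reactant embeddings and the
    summed product embeddings. It is zero exactly when the chemical
    equation is 'balanced' in embedding space, as the constraint requires."""
    diff = np.sum(reactant_embs, axis=0) - np.sum(product_embs, axis=0)
    return float(np.dot(diff, diff))

# Toy 4-d embeddings for a hypothetical reaction A + B -> C.
a = np.array([1.0, 0.0, 2.0, 0.5])
b = np.array([0.0, 1.0, -1.0, 0.5])
c = a + b  # a product embedding that satisfies the constraint exactly

print(reaction_equivalence_loss([a, b], [c]))  # 0.0 when the constraint holds
```

Minimizing this loss over many reactions pushes the encoder toward embeddings in which reactions behave like vector equations, which is what keeps the space well-organized.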
Related papers
- Bi-level Contrastive Learning for Knowledge-Enhanced Molecule
Representations [55.42602325017405]
We propose a novel method called GODE, which takes into account the two-level structure of individual molecules.
By pre-training two graph neural networks (GNNs) on different graph structures, combined with contrastive learning, GODE fuses molecular structures with their corresponding knowledge graph substructures.
When fine-tuned on 11 chemical property tasks, our model outperforms existing benchmarks, improving average ROC-AUC by 13.8% on classification tasks and average RMSE/MAE by 35.1% on regression tasks.
arXiv Detail & Related papers (2023-06-02T15:49:45Z) - Atomic and Subgraph-aware Bilateral Aggregation for Molecular
Representation Learning [57.670845619155195]
We introduce a new model for molecular representation learning called the Atomic and Subgraph-aware Bilateral Aggregation (ASBA)
ASBA addresses the limitations of previous atom-wise and subgraph-wise models by incorporating both types of information.
Our method offers a more comprehensive way to learn representations for molecular property prediction and has broad potential in drug and material discovery applications.
arXiv Detail & Related papers (2023-05-22T00:56:00Z) - Efficient Chemical Space Exploration Using Active Learning Based on
Marginalized Graph Kernel: an Application for Predicting the Thermodynamic
Properties of Alkanes with Molecular Simulation [10.339394156446982]
We use molecular dynamics simulation to generate data and a graph neural network (GNN) to make predictions.
Specifically, we target 251,728 alkane molecules consisting of 4 to 19 carbon atoms and their liquid physical properties.
Validation shows that only 313 molecules were sufficient to train an accurate GNN model with $\mathrm{R}^2 > 0.99$ for computational test sets and $\mathrm{R}^2 > 0.94$ for experimental test sets.
arXiv Detail & Related papers (2022-09-01T14:59:13Z) - Graph-based Molecular Representation Learning [59.06193431883431]
Molecular representation learning (MRL) is a key step to build the connection between machine learning and chemical science.
Recently, MRL has achieved considerable progress, especially in methods based on deep molecular graph learning.
arXiv Detail & Related papers (2022-07-08T17:43:20Z) - ChemRL-GEM: Geometry Enhanced Molecular Representation Learning for
Property Prediction [25.49976851499949]
We propose a novel Geometry Enhanced Molecular representation learning method (GEM) for Chemical Representation Learning (ChemRL)
At first, we design a geometry-based GNN architecture that simultaneously models atoms, bonds, and bond angles in a molecule.
On top of the devised GNN architecture, we propose several novel geometry-level self-supervised learning strategies to learn spatial knowledge.
arXiv Detail & Related papers (2021-06-11T02:35:53Z) - MolCLR: Molecular Contrastive Learning of Representations via Graph
Neural Networks [11.994553575596228]
MolCLR is a self-supervised learning framework for large unlabeled molecule datasets.
We propose three novel molecule graph augmentations: atom masking, bond deletion, and subgraph removal.
Our method achieves state-of-the-art performance on many challenging datasets.
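One of the three MolCLR augmentations, bond deletion, can be sketched on a plain adjacency matrix. This is an illustrative toy, not the MolCLR implementation (which operates on featurized molecular graphs); the function name and example graph are hypothetical.

```python
import numpy as np

def delete_random_bond(adj, rng):
    """Remove one randomly chosen bond (edge) from a symmetric adjacency
    matrix -- a sketch of MolCLR-style 'bond deletion' augmentation."""
    edges = np.argwhere(np.triu(adj) > 0)  # each bond listed once
    if len(edges) == 0:
        return adj
    i, j = edges[rng.integers(len(edges))]
    out = adj.copy()
    out[i, j] = out[j, i] = 0  # keep the matrix symmetric
    return out

# A triangle-shaped toy molecule graph: 3 atoms, 3 bonds.
adj = np.array([[0, 1, 1],
                [1, 0, 1],
                [1, 1, 0]])
aug = delete_random_bond(adj, np.random.default_rng(0))
print(int(np.triu(aug).sum()))  # 2 bonds remain after one deletion
```

Atom masking and subgraph removal follow the same pattern: each produces a perturbed view of the molecule, and the contrastive objective pulls views of the same molecule together in embedding space.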
arXiv Detail & Related papers (2021-02-19T17:35:18Z) - ASGN: An Active Semi-supervised Graph Neural Network for Molecular
Property Prediction [61.33144688400446]
We propose a novel framework called Active Semi-supervised Graph Neural Network (ASGN) by incorporating both labeled and unlabeled molecules.
In the teacher model, we propose a novel semi-supervised learning method to learn a general representation that jointly exploits information from molecular structure and molecular distribution.
Finally, we propose a novel active learning strategy based on molecular diversity to select informative data throughout framework learning.
arXiv Detail & Related papers (2020-07-07T04:22:39Z) - Self-Supervised Graph Transformer on Large-Scale Molecular Data [73.3448373618865]
We propose a novel framework, GROVER, for molecular representation learning.
GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data.
We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning.
arXiv Detail & Related papers (2020-06-18T08:37:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.