Graph-based Molecular Representation Learning
- URL: http://arxiv.org/abs/2207.04869v3
- Date: Wed, 29 Nov 2023 03:16:59 GMT
- Title: Graph-based Molecular Representation Learning
- Authors: Zhichun Guo, Kehan Guo, Bozhao Nan, Yijun Tian, Roshni G. Iyer, Yihong
Ma, Olaf Wiest, Xiangliang Zhang, Wei Wang, Chuxu Zhang, Nitesh V. Chawla
- Abstract summary: Molecular representation learning (MRL) is a key step to build the connection between machine learning and chemical science.
Recently, MRL has achieved considerable progress, especially in methods based on deep molecular graph learning.
- Score: 59.06193431883431
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Molecular representation learning (MRL) is a key step to build the connection
between machine learning and chemical science. In particular, it encodes
molecules as numerical vectors preserving the molecular structures and
features, on top of which the downstream tasks (e.g., property prediction) can
be performed. Recently, MRL has achieved considerable progress, especially in
methods based on deep molecular graph learning. In this survey, we
systematically review these graph-based molecular representation techniques,
especially the methods incorporating chemical domain knowledge. Specifically,
we first introduce the features of 2D and 3D molecular graphs. Then we
summarize and categorize MRL methods into three groups based on their input.
Furthermore, we discuss some typical chemical applications supported by MRL. To
facilitate studies in this fast-developing area, we also list the benchmarks
and commonly used datasets in the paper. Finally, we share our thoughts on
future research directions.
Related papers
- Data-Efficient Molecular Generation with Hierarchical Textual Inversion [48.816943690420224]
We introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method.
HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution.
Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution.
arXiv Detail & Related papers (2024-05-05T08:35:23Z) - Molecular Property Prediction Based on Graph Structure Learning [29.516479802217205]
We propose a graph structure learning (GSL) based MPP approach, called GSL-MPP.
Specifically, we first apply graph neural network (GNN) over molecular graphs to extract molecular representations.
With molecular fingerprints, we construct a molecular similarity graph (MSG)
arXiv Detail & Related papers (2023-12-28T06:45:13Z) - Learning Over Molecular Conformer Ensembles: Datasets and Benchmarks [44.934084652800976]
We introduce the first MoleculAR Conformer Ensemble Learning benchmark to thoroughly evaluate the potential of learning on conformer ensembles.
Our findings reveal that direct learning from an conformer space can improve performance on a variety of tasks and models.
arXiv Detail & Related papers (2023-09-29T20:06:46Z) - MolGrapher: Graph-based Visual Recognition of Chemical Structures [50.13749978547401]
We introduce MolGrapher to recognize chemical structures visually.
We treat all candidate atoms and bonds as nodes and put them in a graph.
We classify atom and bond nodes in the graph with a Graph Neural Network.
arXiv Detail & Related papers (2023-08-23T16:16:11Z) - Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular
Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z) - Improving Molecular Pretraining with Complementary Featurizations [20.86159731100242]
Molecular pretraining is a paradigm to solve a variety of tasks in computational chemistry and drug discovery.
We show that different featurization techniques convey chemical information differently.
We propose a simple and effective MOlecular pretraining framework with COmplementary featurizations (MOCO)
arXiv Detail & Related papers (2022-09-29T21:11:09Z) - Chemical-Reaction-Aware Molecule Representation Learning [88.79052749877334]
We propose using chemical reactions to assist learning molecule representation.
Our approach is proven effective to 1) keep the embedding space well-organized and 2) improve the generalization ability of molecule embeddings.
Experimental results demonstrate that our method achieves state-of-the-art performance in a variety of downstream tasks.
arXiv Detail & Related papers (2021-09-21T00:08:43Z) - ChemRL-GEM: Geometry Enhanced Molecular Representation Learning for
Property Prediction [25.49976851499949]
We propose a novel Geometry Enhanced Molecular representation learning method (GEM) for Chemical Representation Learning (ChemRL)
At first, we design a geometry-based GNN architecture that simultaneously models atoms, bonds, and bond angles in a molecule.
On top of the devised GNN architecture, we propose several novel geometry-level self-supervised learning strategies to learn spatial knowledge.
arXiv Detail & Related papers (2021-06-11T02:35:53Z) - Advanced Graph and Sequence Neural Networks for Molecular Property
Prediction and Drug Discovery [53.00288162642151]
We develop MoleculeKit, a suite of comprehensive machine learning tools spanning different computational models and molecular representations.
Built on these representations, MoleculeKit includes both deep learning and traditional machine learning methods for graph and sequence data.
Results on both online and offline antibiotics discovery and molecular property prediction tasks show that MoleculeKit achieves consistent improvements over prior methods.
arXiv Detail & Related papers (2020-12-02T02:09:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.