Motif-based Graph Self-Supervised Learning for Molecular Property
Prediction
- URL: http://arxiv.org/abs/2110.00987v1
- Date: Sun, 3 Oct 2021 11:45:51 GMT
- Title: Motif-based Graph Self-Supervised Learning for Molecular Property
Prediction
- Authors: Zaixi Zhang, Qi Liu, Hao Wang, Chengqiang Lu, Chee-Kong Lee
- Abstract summary: Graph Neural Networks (GNNs) have demonstrated remarkable success in various molecular generation and prediction tasks.
Most existing self-supervised pre-training frameworks for GNNs only focus on node-level or graph-level tasks.
We propose a novel self-supervised motif generation framework for GNNs.
- Score: 12.789013658551454
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Predicting molecular properties with data-driven methods has drawn much
attention in recent years. Particularly, Graph Neural Networks (GNNs) have
demonstrated remarkable success in various molecular generation and prediction
tasks. In cases where labeled data is scarce, GNNs can be pre-trained on
unlabeled molecular data to first learn the general semantic and structural
information before being fine-tuned for specific tasks. However, most existing
self-supervised pre-training frameworks for GNNs only focus on node-level or
graph-level tasks. These approaches cannot capture the rich information in
subgraphs or graph motifs. For example, functional groups (frequently occurring
subgraphs in molecular graphs) often carry indicative information about
molecular properties. To bridge this gap, we propose Motif-based Graph
Self-supervised Learning (MGSSL) by introducing a novel self-supervised motif
generation framework for GNNs. First, for motif extraction from molecular
graphs, we design a molecule fragmentation method that leverages the
retrosynthesis-based algorithm BRICS and additional rules that control the
size of the motif vocabulary. Second, we design a general motif-based generative
pre-training framework in which GNNs are asked to make topological and label
predictions. This generative framework can be implemented in two different
ways, i.e., breadth-first or depth-first. Finally, to take the multi-scale
information in molecular graphs into consideration, we introduce a multi-level
self-supervised pre-training scheme. Extensive experiments on various downstream
benchmark tasks show that our methods outperform all state-of-the-art
baselines.
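The abstract notes that the generative pre-training framework can produce motifs in either breadth-first or depth-first order. The difference between the two traversal orders can be sketched in plain Python over a hypothetical motif tree; the motif names and tree structure below are illustrative only, not taken from the paper:

```python
from collections import deque

# Hypothetical motif tree for one molecule: each key is a motif label and
# its value lists the motifs attached to it. Names are illustrative.
motif_tree = {
    "benzene": ["carboxyl", "amine"],
    "carboxyl": [],
    "amine": ["methyl"],
    "methyl": [],
}

def generation_order(tree, root, mode="bfs"):
    """Return the order in which motifs would be generated from the root.

    mode="bfs" expands all neighbors of a motif before moving deeper;
    mode="dfs" follows one branch to its end before backtracking.
    """
    order, frontier, seen = [], deque([root]), {root}
    while frontier:
        node = frontier.popleft() if mode == "bfs" else frontier.pop()
        order.append(node)
        for child in tree[node]:
            if child not in seen:
                seen.add(child)
                frontier.append(child)
    return order

print(generation_order(motif_tree, "benzene", "bfs"))
# ['benzene', 'carboxyl', 'amine', 'methyl']
print(generation_order(motif_tree, "benzene", "dfs"))
# ['benzene', 'amine', 'methyl', 'carboxyl']
```

In the paper's setting the GNN would make a topology prediction (is there a next motif?) and a label prediction (which motif?) at each step of such an ordering; the sketch above only shows the two visit orders themselves.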
Related papers
- MAGE: Model-Level Graph Neural Networks Explanations via Motif-based Graph Generation [16.129359492539095]
Graph Neural Networks (GNNs) have shown remarkable success in molecular tasks, yet their interpretability remains challenging.
Traditional model-level explanation methods like XGNN and GNNInterpreter often fail to identify valid substructures like rings, leading to questionable interpretability.
We introduce an innovative Motif-bAsed GNN Explainer (MAGE) that uses motifs as fundamental units for generating explanations.
arXiv Detail & Related papers (2024-05-21T06:12:24Z)
- Fragment-based Pretraining and Finetuning on Molecular Graphs [0.0]
Unlabeled molecular data has become abundant, which facilitates the rapid development of self-supervised learning for GNNs in the chemical domain.
We propose pretraining GNNs at the fragment level, a promising middle ground to overcome the limitations of node-level and graph-level pretraining.
Our graph fragment-based pretraining (GraphFP) improves performance on 5 out of 8 common molecular benchmarks and on long-range biological benchmarks by at least 11.5%.
arXiv Detail & Related papers (2023-10-05T03:01:09Z)
- Will More Expressive Graph Neural Networks do Better on Generative Tasks? [27.412913421460388]
Graph Neural Network (GNN) architectures are often underexplored.
We replace the underlying GNNs of graph generative models with more expressive GNNs.
We find that advanced GNNs can achieve state-of-the-art results against 17 non-GNN-based graph generative approaches.
arXiv Detail & Related papers (2023-08-23T07:57:45Z)
- GIMLET: A Unified Graph-Text Model for Instruction-Based Molecule Zero-Shot Learning [71.89623260998934]
This study investigates the feasibility of employing natural language instructions to accomplish molecule-related tasks in a zero-shot setting.
Existing molecule-text models perform poorly in this setting due to inadequate treatment of instructions and limited capacity for graphs.
We propose GIMLET, which unifies language models for both graph and text data.
arXiv Detail & Related papers (2023-05-28T18:27:59Z)
- MolCPT: Molecule Continuous Prompt Tuning to Generalize Molecular Representation Learning [77.31492888819935]
We propose a novel paradigm of "pre-train, prompt, fine-tune" for molecular representation learning, named molecule continuous prompt tuning (MolCPT).
MolCPT defines a motif prompting function that uses the pre-trained model to project the standalone input into an expressive prompt.
Experiments on several benchmark datasets show that MolCPT efficiently generalizes pre-trained GNNs for molecular property prediction.
arXiv Detail & Related papers (2022-12-20T19:32:30Z)
- HiGNN: Hierarchical Informative Graph Neural Networks for Molecular Property Prediction Equipped with Feature-Wise Attention [5.735627221409312]
We propose a well-designed hierarchical informative graph neural network framework (termed HiGNN) for molecular property prediction.
Experiments demonstrate that HiGNN achieves state-of-the-art predictive performance on many challenging drug discovery-associated benchmark datasets.
arXiv Detail & Related papers (2022-08-30T05:16:15Z)
- MolGraph: a Python package for the implementation of molecular graphs and graph neural networks with TensorFlow and Keras [51.92255321684027]
MolGraph is a graph neural network (GNN) package for molecular machine learning (ML).
MolGraph implements a chemistry module to accommodate the generation of small molecular graphs, which can be passed to a GNN algorithm to solve a molecular ML problem.
GNNs proved useful for molecular identification and improved interpretability of chromatographic retention time data.
arXiv Detail & Related papers (2022-08-21T18:37:41Z)
- Graph neural networks for the prediction of molecular structure-property relationships [59.11160990637615]
Graph neural networks (GNNs) are a machine learning method that works directly on the molecular graph.
GNNs allow properties to be learned in an end-to-end fashion, thereby avoiding the need for hand-crafted descriptors.
We describe the fundamentals of GNNs and demonstrate the application of GNNs via two examples for molecular property prediction.
arXiv Detail & Related papers (2022-07-25T11:30:44Z)
- Neural Graph Matching for Pre-training Graph Neural Networks [72.32801428070749]
Graph neural networks (GNNs) have shown powerful capacity for modeling structured data.
We present a novel Graph Matching based GNN Pre-Training framework, called GMPT.
The proposed method can be applied to fully self-supervised pre-training and coarse-grained supervised pre-training.
arXiv Detail & Related papers (2022-03-03T09:53:53Z)
- Molecular Graph Generation via Geometric Scattering [7.796917261490019]
Graph neural networks (GNNs) have been used extensively for addressing problems in drug design and discovery.
We propose a representation-first approach to molecular graph generation.
We show that our architecture learns meaningful representations of drug datasets and provides a platform for goal-directed drug synthesis.
arXiv Detail & Related papers (2021-10-12T18:00:23Z)
- Self-Supervised Graph Transformer on Large-Scale Molecular Data [73.3448373618865]
We propose a novel framework, GROVER, for molecular representation learning.
GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data.
We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning.
arXiv Detail & Related papers (2020-06-18T08:37:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.