Fragment-based Pretraining and Finetuning on Molecular Graphs
- URL: http://arxiv.org/abs/2310.03274v2
- Date: Sat, 28 Oct 2023 03:22:06 GMT
- Title: Fragment-based Pretraining and Finetuning on Molecular Graphs
- Authors: Kha-Dinh Luong, Ambuj Singh
- Abstract summary: Unlabeled molecular data has become abundant, which facilitates the rapid development of self-supervised learning for GNNs in the chemical domain.
We propose pretraining GNNs at the fragment level, a promising middle ground to overcome the limitations of node-level and graph-level pretraining.
Our graph fragment-based pretraining (GraphFP) improves performance on 5 out of 8 common molecular benchmarks and on long-range biological benchmarks by at least 11.5%.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Property prediction on molecular graphs is an important application of Graph
Neural Networks. Recently, unlabeled molecular data has become abundant, which
facilitates the rapid development of self-supervised learning for GNNs in the
chemical domain. In this work, we propose pretraining GNNs at the fragment
level, a promising middle ground to overcome the limitations of node-level and
graph-level pretraining. Borrowing techniques from recent work on principal
subgraph mining, we obtain a compact vocabulary of prevalent fragments from a
large pretraining dataset. From the extracted vocabulary, we introduce several
fragment-based contrastive and predictive pretraining tasks. The contrastive
learning task jointly pretrains two different GNNs: one on molecular graphs and
the other on fragment graphs, which represent higher-order connectivity within
molecules. By enforcing consistency between the fragment embedding and the
aggregated embedding of the corresponding atoms from the molecular graphs, we
ensure that the embeddings capture structural information at multiple
resolutions. The structural information of fragment graphs is further exploited
to extract auxiliary labels for graph-level predictive pretraining. We employ
both the pretrained molecular-based and fragment-based GNNs for downstream
prediction, thus utilizing the fragment information during finetuning. Our
graph fragment-based pretraining (GraphFP) improves performance on 5 out of
8 common molecular benchmarks and improves performance on long-range
biological benchmarks by at least 11.5%. Code is available at:
https://github.com/lvkd84/GraphFP.
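The core of the contrastive task is enforcing agreement between each fragment's embedding (from the fragment-graph GNN) and the pooled embeddings of its atoms (from the molecular-graph GNN). Below is a minimal PyTorch sketch of that idea, assuming mean-pooling within fragments and an InfoNCE-style objective; the names (fragment_consistency_loss, atom2frag) and the exact pooling and negative-sampling choices are illustrative, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def fragment_consistency_loss(atom_emb, frag_emb, atom2frag, temperature=0.1):
    """Align fragment-GNN embeddings with aggregated atom embeddings.

    atom_emb:  [num_atoms, d] atom embeddings from the molecular-graph GNN
    frag_emb:  [num_frags, d] fragment embeddings from the fragment-graph GNN
    atom2frag: [num_atoms]    long tensor mapping each atom to its fragment
    """
    num_frags, dim = frag_emb.shape
    # Mean-pool atom embeddings within each fragment.
    pooled = torch.zeros(num_frags, dim).index_add_(0, atom2frag, atom_emb)
    counts = torch.zeros(num_frags).index_add_(0, atom2frag, torch.ones(len(atom_emb)))
    pooled = pooled / counts.clamp(min=1).unsqueeze(-1)
    # InfoNCE: a fragment's own pooled atoms are its positive; the other
    # fragments in the batch serve as negatives.
    logits = F.normalize(pooled, dim=-1) @ F.normalize(frag_emb, dim=-1).T / temperature
    return F.cross_entropy(logits, torch.arange(num_frags))
```

In GraphFP this term would be trained jointly with the fragment-based predictive tasks; the abstract does not pin down the pooling function or loss form, so treat the above as one plausible instantiation.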
Related papers
- Investigating Graph Neural Networks and Classical Feature-Extraction Techniques in Activity-Cliff and Molecular Property Prediction [0.6906005491572401]
Molecular featurisation refers to the transformation of molecular data into numerical feature vectors; a classical fingerprint example is sketched below.
Message-passing graph neural networks (GNNs) have emerged as a novel method to learn differentiable features directly from molecular graphs.
arXiv Detail & Related papers (2024-11-20T20:07:48Z)
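As a concrete instance of classical featurisation, a molecule can be mapped to a fixed-length bit vector with RDKit; this is a generic illustration using standard tooling, not code from the paper above.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Classical featurisation: SMILES string -> fixed-length bit vector.
mol = Chem.MolFromSmiles("CCO")  # ethanol
fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=2048)
features = list(fp)  # 2048-dimensional 0/1 feature vector
```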
- Molecular Property Prediction Based on Graph Structure Learning [29.516479802217205]
We propose a graph structure learning (GSL) based MPP approach, called GSL-MPP.
Specifically, we first apply graph neural network (GNN) over molecular graphs to extract molecular representations.
With molecular fingerprints, we construct a molecular similarity graph (MSG), as sketched below.
arXiv Detail & Related papers (2023-12-28T06:45:13Z)
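A minimal sketch of the fingerprint-based similarity graph: connect two molecules when the Tanimoto similarity of their fingerprints exceeds a threshold. The function name and the 0.6 cutoff are assumptions for illustration, not the GSL-MPP implementation.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def similarity_graph(smiles_list, threshold=0.6):
    """Edge list connecting molecules with high fingerprint similarity."""
    fps = [AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s), 2, 2048)
           for s in smiles_list]
    return [(i, j)
            for i in range(len(fps))
            for j in range(i + 1, len(fps))
            if DataStructs.TanimotoSimilarity(fps[i], fps[j]) >= threshold]
```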
- Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [55.42602325017405]
We propose a novel method called GODE, which takes into account the two-level structure of individual molecules.
By pre-training two graph neural networks (GNNs) on different graph structures, combined with contrastive learning, GODE fuses molecular structures with their corresponding knowledge graph substructures.
When fine-tuned across 11 chemical property tasks, our model outperforms existing benchmarks, with an average ROC-AUC improvement of 13.8% for classification tasks and an average RMSE/MAE improvement of 35.1% for regression tasks.
arXiv Detail & Related papers (2023-06-02T15:49:45Z)
- HiGNN: Hierarchical Informative Graph Neural Networks for Molecular Property Prediction Equipped with Feature-Wise Attention [5.735627221409312]
We propose a well-designed hierarchical informative graph neural network framework (termed HiGNN) for molecular property prediction.
Experiments demonstrate that HiGNN achieves state-of-the-art predictive performance on many challenging drug discovery-associated benchmark datasets.
arXiv Detail & Related papers (2022-08-30T05:16:15Z)
- MentorGNN: Deriving Curriculum for Pre-Training GNNs [61.97574489259085]
We propose an end-to-end model named MentorGNN that aims to supervise the pre-training process of GNNs across graphs.
We shed new light on the problem of domain adaptation on relational data (i.e., graphs) by deriving a natural and interpretable upper bound on the generalization error of the pre-trained GNNs.
arXiv Detail & Related papers (2022-08-21T15:12:08Z)
- Graph neural networks for the prediction of molecular structure-property relationships [59.11160990637615]
Graph neural networks (GNNs) are a novel machine learning method that works directly on the molecular graph.
GNNs allow properties to be learned in an end-to-end fashion, thereby avoiding the need for informative descriptors.
We describe the fundamentals of GNNs and demonstrate their application via two examples of molecular property prediction; a minimal message-passing step is sketched below.
arXiv Detail & Related papers (2022-07-25T11:30:44Z)
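For reference, one round of sum-aggregation message passing can be written in a few lines of PyTorch; this is a textbook sketch, not the specific architectures discussed in the paper above.

```python
import torch

def message_passing_step(h, edge_index, weight):
    """One message-passing round with sum aggregation.

    h:          [num_nodes, d] node (atom) features
    edge_index: [2, num_edges] long tensor of directed edges (src, dst)
    weight:     [d, d]         learnable message transform
    """
    src, dst = edge_index
    messages = h[src] @ weight                               # transform neighbor features
    agg = torch.zeros_like(h).index_add_(0, dst, messages)   # sum messages per node
    return torch.relu(h + agg)                               # residual update
```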
- FunQG: Molecular Representation Learning Via Quotient Graphs [0.0]
We propose a novel molecular graph coarsening framework named FunQG.
FunQG uses functional groups as influential building blocks of a molecule to determine its properties.
We show that the resulting informative graphs are much smaller than the molecular graphs and thus are good candidates for training GNNs; a toy quotient-graph construction is sketched below.
arXiv Detail & Related papers (2022-07-18T13:36:20Z)
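The coarsening idea can be illustrated with NetworkX's quotient_graph, which collapses a node partition into one node per block; the toy graph and hand-picked partition below stand in for FunQG's functional-group-based partitioning, which this snippet does not implement.

```python
import networkx as nx

# Toy molecular graph: heavy atoms of ethanol, C(0)-C(1)-O(2).
G = nx.path_graph(3)
# Hypothetical fragment partition: an alkyl block and a hydroxyl block.
partition = [{0, 1}, {2}]
# The quotient graph has one node per fragment; fragments are adjacent
# whenever any bond crosses between them in the original graph.
Q = nx.quotient_graph(G, partition)
print(Q.nodes(), Q.edges())  # 2 fragment nodes, 1 inter-fragment edge
```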
- Discovering the Representation Bottleneck of Graph Neural Networks from Multi-order Interactions [51.597480162777074]
Graph neural networks (GNNs) rely on the message passing paradigm to propagate node features and build interactions.
Recent works point out that different graph learning tasks require different ranges of interactions between nodes.
We study two common graph construction methods in scientific domains, i.e., K-nearest neighbor (KNN) graphs and fully-connected (FC) graphs; a KNN construction is sketched below.
arXiv Detail & Related papers (2022-05-15T11:38:14Z)
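A minimal NumPy sketch of KNN-graph construction from point coordinates; a fully-connected graph would simply keep every pair. The Euclidean metric and function name are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np

def knn_graph(coords, k):
    """Directed edge list [2, n*k] linking each point to its k nearest neighbors."""
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude self-loops
    nbrs = np.argsort(d, axis=1)[:, :k]  # k closest points per node
    src = np.repeat(np.arange(len(coords)), k)
    return np.stack([src, nbrs.ravel()])
```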
- Neural Graph Matching for Pre-training Graph Neural Networks [72.32801428070749]
Graph neural networks (GNNs) have shown powerful capacity for modeling structural data.
We present a novel Graph Matching based GNN Pre-Training framework, called GMPT.
The proposed method can be applied to fully self-supervised pre-training and coarse-grained supervised pre-training.
arXiv Detail & Related papers (2022-03-03T09:53:53Z)
- Motif-based Graph Self-Supervised Learning for Molecular Property Prediction [12.789013658551454]
Graph Neural Networks (GNNs) have demonstrated remarkable success in various molecular generation and prediction tasks.
Most existing self-supervised pre-training frameworks for GNNs only focus on node-level or graph-level tasks.
We propose a novel self-supervised motif generation framework for GNNs.
arXiv Detail & Related papers (2021-10-03T11:45:51Z)
- Self-Supervised Graph Transformer on Large-Scale Molecular Data [73.3448373618865]
We propose a novel framework, GROVER, for molecular representation learning.
GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data.
We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning.
arXiv Detail & Related papers (2020-06-18T08:37:04Z)