Fragment-based Pretraining and Finetuning on Molecular Graphs
- URL: http://arxiv.org/abs/2310.03274v2
- Date: Sat, 28 Oct 2023 03:22:06 GMT
- Title: Fragment-based Pretraining and Finetuning on Molecular Graphs
- Authors: Kha-Dinh Luong, Ambuj Singh
- Abstract summary: Unlabeled molecular data has become abundant, which facilitates the rapid development of self-supervised learning for GNNs in the chemical domain.
We propose pretraining GNNs at the fragment level, a promising middle ground to overcome the limitations of node-level and graph-level pretraining.
Our graph fragment-based pretraining (GraphFP) improves performance on 5 out of 8 common molecular benchmarks and on long-range biological benchmarks by at least 11.5%.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Property prediction on molecular graphs is an important application of Graph
Neural Networks. Recently, unlabeled molecular data has become abundant, which
facilitates the rapid development of self-supervised learning for GNNs in the
chemical domain. In this work, we propose pretraining GNNs at the fragment
level, a promising middle ground to overcome the limitations of node-level and
graph-level pretraining. Borrowing techniques from recent work on principal
subgraph mining, we obtain a compact vocabulary of prevalent fragments from a
large pretraining dataset. From the extracted vocabulary, we introduce several
fragment-based contrastive and predictive pretraining tasks. The contrastive
learning task jointly pretrains two different GNNs: one on molecular graphs and
the other on fragment graphs, which represent higher-order connectivity within
molecules. By enforcing consistency between the fragment embedding and the
aggregated embedding of the corresponding atoms from the molecular graphs, we
ensure that the embeddings capture structural information at multiple
resolutions. The structural information of fragment graphs is further exploited
to extract auxiliary labels for graph-level predictive pretraining. We employ
both the pretrained molecular-based and fragment-based GNNs for downstream
prediction, thus utilizing the fragment information during finetuning. Our
graph fragment-based pretraining (GraphFP) improves performance on 5 out of
8 common molecular benchmarks and improves performance on long-range
biological benchmarks by at least 11.5%. Code is available at:
https://github.com/lvkd84/GraphFP.
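The core of the contrastive task is enforcing agreement between each fragment's embedding (from the fragment-graph GNN) and the pooled embeddings of its atoms (from the molecular-graph GNN). Below is a minimal PyTorch sketch of that idea, assuming mean-pooling within fragments and an InfoNCE-style objective; the names (fragment_consistency_loss, atom2frag) and the exact pooling and negative-sampling choices are illustrative, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def fragment_consistency_loss(atom_emb, frag_emb, atom2frag, temperature=0.1):
    """Align fragment-GNN embeddings with aggregated atom embeddings.

    atom_emb:  [num_atoms, d] atom embeddings from the molecular-graph GNN
    frag_emb:  [num_frags, d] fragment embeddings from the fragment-graph GNN
    atom2frag: [num_atoms]    long tensor mapping each atom to its fragment
    """
    num_frags, dim = frag_emb.shape
    # Mean-pool atom embeddings within each fragment.
    pooled = torch.zeros(num_frags, dim).index_add_(0, atom2frag, atom_emb)
    counts = torch.zeros(num_frags).index_add_(0, atom2frag, torch.ones(len(atom_emb)))
    pooled = pooled / counts.clamp(min=1).unsqueeze(-1)
    # InfoNCE: a fragment's own pooled atoms are its positive; the other
    # fragments in the batch serve as negatives.
    logits = F.normalize(pooled, dim=-1) @ F.normalize(frag_emb, dim=-1).T / temperature
    return F.cross_entropy(logits, torch.arange(num_frags))
```

In GraphFP this term would be trained jointly with the fragment-based predictive tasks; the abstract does not pin down the pooling function or loss form, so treat the above as one plausible instantiation.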
Related papers
- Investigating Graph Neural Networks and Classical Feature-Extraction Techniques in Activity-Cliff and Molecular Property Prediction [0.6906005491572401]
Molecular featurisation refers to the transformation of molecular data into numerical feature vectors; a classical fingerprint example is sketched below.
Message-passing graph neural networks (GNNs) have emerged as a novel method to learn differentiable features directly from molecular graphs.
arXiv Detail & Related papers (2024-11-20T20:07:48Z)
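As a concrete instance of classical featurisation, a molecule can be mapped to a fixed-length bit vector with RDKit; this is a generic illustration using standard tooling, not code from the paper above.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Classical featurisation: SMILES string -> fixed-length bit vector.
mol = Chem.MolFromSmiles("CCO")  # ethanol
fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=2048)
features = list(fp)  # 2048-dimensional 0/1 feature vector
```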
- Molecular Property Prediction Based on Graph Structure Learning [29.516479802217205]
We propose a graph structure learning (GSL) based MPP approach, called GSL-MPP.
Specifically, we first apply graph neural network (GNN) over molecular graphs to extract molecular representations.
With molecular fingerprints, we construct a molecular similarity graph (MSG), as sketched below.
arXiv Detail & Related papers (2023-12-28T06:45:13Z)
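A minimal sketch of the fingerprint-based similarity graph: connect two molecules when the Tanimoto similarity of their fingerprints exceeds a threshold. The function name and the 0.6 cutoff are assumptions for illustration, not the GSL-MPP implementation.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def similarity_graph(smiles_list, threshold=0.6):
    """Edge list connecting molecules with high fingerprint similarity."""
    fps = [AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s), 2, 2048)
           for s in smiles_list]
    return [(i, j)
            for i in range(len(fps))
            for j in range(i + 1, len(fps))
            if DataStructs.TanimotoSimilarity(fps[i], fps[j]) >= threshold]
```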
- Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [55.42602325017405]
We propose a novel method called GODE, which takes into account the two-level structure of individual molecules.
By pre-training two graph neural networks (GNNs) on different graph structures, combined with contrastive learning, GODE fuses molecular structures with their corresponding knowledge graph substructures.
When fine-tuned across 11 chemical property tasks, our model outperforms existing benchmarks, with an average ROC-AUC improvement of 13.8% for classification tasks and an average RMSE/MAE improvement of 35.1% for regression tasks.
arXiv Detail & Related papers (2023-06-02T15:49:45Z)
- HiGNN: Hierarchical Informative Graph Neural Networks for Molecular Property Prediction Equipped with Feature-Wise Attention [5.735627221409312]
We propose a well-designed hierarchical informative graph neural network framework (termed HiGNN) for molecular property prediction.
Experiments demonstrate that HiGNN achieves state-of-the-art predictive performance on many challenging drug discovery-associated benchmark datasets.
arXiv Detail & Related papers (2022-08-30T05:16:15Z)
- MentorGNN: Deriving Curriculum for Pre-Training GNNs [61.97574489259085]
We propose an end-to-end model named MentorGNN that aims to supervise the pre-training process of GNNs across graphs.
We shed new light on the problem of domain adaptation on relational data (i.e., graphs) by deriving a natural and interpretable upper bound on the generalization error of the pre-trained GNNs.
arXiv Detail & Related papers (2022-08-21T15:12:08Z)
- Graph neural networks for the prediction of molecular structure-property relationships [59.11160990637615]
Graph neural networks (GNNs) are a novel machine learning method that works directly on the molecular graph.
GNNs allow properties to be learned in an end-to-end fashion, thereby avoiding the need for informative descriptors.
We describe the fundamentals of GNNs and demonstrate their application via two examples of molecular property prediction; a minimal message-passing step is sketched below.
arXiv Detail & Related papers (2022-07-25T11:30:44Z)
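For reference, one round of sum-aggregation message passing can be written in a few lines of PyTorch; this is a textbook sketch, not the specific architectures discussed in the paper above.

```python
import torch

def message_passing_step(h, edge_index, weight):
    """One message-passing round with sum aggregation.

    h:          [num_nodes, d] node (atom) features
    edge_index: [2, num_edges] long tensor of directed edges (src, dst)
    weight:     [d, d]         learnable message transform
    """
    src, dst = edge_index
    messages = h[src] @ weight                               # transform neighbor features
    agg = torch.zeros_like(h).index_add_(0, dst, messages)   # sum messages per node
    return torch.relu(h + agg)                               # residual update
```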
- FunQG: Molecular Representation Learning Via Quotient Graphs [0.0]
We propose a novel molecular graph coarsening framework named FunQG.
FunQG uses functional groups as influential building blocks of a molecule to determine its properties.
We show that the resulting informative graphs are much smaller than the molecular graphs and thus are good candidates for training GNNs; a toy quotient-graph construction is sketched below.
arXiv Detail & Related papers (2022-07-18T13:36:20Z)
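The coarsening idea can be illustrated with NetworkX's quotient_graph, which collapses a node partition into one node per block; the toy graph and hand-picked partition below stand in for FunQG's functional-group-based partitioning, which this snippet does not implement.

```python
import networkx as nx

# Toy molecular graph: heavy atoms of ethanol, C(0)-C(1)-O(2).
G = nx.path_graph(3)
# Hypothetical fragment partition: an alkyl block and a hydroxyl block.
partition = [{0, 1}, {2}]
# The quotient graph has one node per fragment; fragments are adjacent
# whenever any bond crosses between them in the original graph.
Q = nx.quotient_graph(G, partition)
print(Q.nodes(), Q.edges())  # 2 fragment nodes, 1 inter-fragment edge
```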
- Discovering the Representation Bottleneck of Graph Neural Networks from Multi-order Interactions [51.597480162777074]
Graph neural networks (GNNs) rely on the message passing paradigm to propagate node features and build interactions.
Recent works point out that different graph learning tasks require different ranges of interactions between nodes.
We study two common graph construction methods in scientific domains, i.e., K-nearest neighbor (KNN) graphs and fully-connected (FC) graphs; a KNN construction is sketched below.
arXiv Detail & Related papers (2022-05-15T11:38:14Z)
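A minimal NumPy sketch of KNN-graph construction from point coordinates; a fully-connected graph would simply keep every pair. The Euclidean metric and function name are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np

def knn_graph(coords, k):
    """Directed edge list [2, n*k] linking each point to its k nearest neighbors."""
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude self-loops
    nbrs = np.argsort(d, axis=1)[:, :k]  # k closest points per node
    src = np.repeat(np.arange(len(coords)), k)
    return np.stack([src, nbrs.ravel()])
```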
- Neural Graph Matching for Pre-training Graph Neural Networks [72.32801428070749]
Graph neural networks (GNNs) have shown powerful capacity for modeling structural data.
We present a novel Graph Matching based GNN Pre-Training framework, called GMPT.
The proposed method can be applied to fully self-supervised pre-training and coarse-grained supervised pre-training.
arXiv Detail & Related papers (2022-03-03T09:53:53Z)
- Motif-based Graph Self-Supervised Learning for Molecular Property Prediction [12.789013658551454]
Graph Neural Networks (GNNs) have demonstrated remarkable success in various molecular generation and prediction tasks.
Most existing self-supervised pre-training frameworks for GNNs only focus on node-level or graph-level tasks.
We propose a novel self-supervised motif generation framework for GNNs.
arXiv Detail & Related papers (2021-10-03T11:45:51Z)
- Self-Supervised Graph Transformer on Large-Scale Molecular Data [73.3448373618865]
We propose a novel framework, GROVER, for molecular representation learning.
GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data.
We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning.
arXiv Detail & Related papers (2020-06-18T08:37:04Z)