Does GNN Pretraining Help Molecular Representation?
- URL: http://arxiv.org/abs/2207.06010v1
- Date: Wed, 13 Jul 2022 07:34:16 GMT
- Title: Does GNN Pretraining Help Molecular Representation?
- Authors: Ruoxi Sun
- Abstract summary: Self-supervised graph pretraining does not have statistically significant advantages over non-pretraining methods in many settings.
Although improvement can be observed with additional supervised pretraining, the improvement may diminish with richer features or more balanced data splits.
We hypothesize that the complexity of pretraining on molecules is insufficient, leading to less transferable knowledge for downstream tasks.
- Score: 5.5459878275267736
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Extracting informative representations of molecules using graph neural
networks (GNNs) is crucial in AI-driven drug discovery. Recently, the graph
research community has been trying to replicate the success of self-supervised
pretraining in natural language processing, with several successes claimed.
However, we find the benefit brought by self-supervised pretraining on
molecular data can be negligible in many cases. We conduct thorough ablation
studies on the key components of GNN pretraining, including pretraining
objectives, data splitting methods, input features, pretraining dataset scales,
and GNN architectures, to examine how they affect the accuracy of downstream
tasks. Our first important finding is that self-supervised graph pretraining
does not have statistically significant advantages over non-pretraining methods
in many settings. Second, although improvement can be observed with additional
supervised pretraining, the improvement may diminish with richer features or
more balanced data splits. Third, experimental hyper-parameters have a larger
impact on the accuracy of downstream tasks than the choice of pretraining
tasks. We hypothesize that the complexity of pretraining on molecules is
insufficient, leading to less transferable knowledge for downstream tasks.
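The comparison at the heart of the paper, self-supervised GNN pretraining followed by fine-tuning versus training the same GNN from scratch, can be illustrated with a minimal sketch. This is not the authors' code: the tiny dense-adjacency GNN, the attribute-masking objective, and all names and hyper-parameters below are assumptions chosen for brevity.

```python
# Minimal sketch (assumption, not the paper's implementation): compare
# "pretrain then fine-tune" against "train from scratch" for a toy GNN.
import torch
import torch.nn as nn


class TinyGNN(nn.Module):
    """One round of mean-aggregation message passing over a dense adjacency matrix."""

    def __init__(self, in_dim: int, hidden_dim: int):
        super().__init__()
        self.embed = nn.Linear(in_dim, hidden_dim)
        self.update = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.embed(x))                        # [N, hidden]
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)   # avoid divide-by-zero
        msg = (adj @ h) / deg                                 # mean over neighbors
        return torch.relu(self.update(msg))                   # updated node states


def pretrain_attribute_masking(gnn: TinyGNN, x, adj, steps=200, mask_rate=0.15):
    """Self-supervised objective: zero out some node features and reconstruct them."""
    decoder = nn.Linear(gnn.update.out_features, x.size(-1))
    opt = torch.optim.Adam(list(gnn.parameters()) + list(decoder.parameters()), lr=1e-3)
    for _ in range(steps):
        mask = torch.rand(x.size(0)) < mask_rate
        if not mask.any():
            continue
        x_masked = x.clone()
        x_masked[mask] = 0.0
        recon = decoder(gnn(x_masked, adj))
        loss = ((recon[mask] - x[mask]) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return gnn                                                # transferable weights


def finetune(gnn: TinyGNN, x, adj, y, steps=200):
    """Downstream graph-level property prediction from mean-pooled node states."""
    head = nn.Linear(gnn.update.out_features, 1)
    opt = torch.optim.Adam(list(gnn.parameters()) + list(head.parameters()), lr=1e-3)
    for _ in range(steps):
        logit = head(gnn(x, adj).mean(dim=0))                 # [1]
        loss = nn.functional.binary_cross_entropy_with_logits(logit, y)
        opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()


# Toy "molecule": random node features and a random symmetric adjacency.
x = torch.randn(12, 8)
adj = (torch.rand(12, 12) < 0.3).float()
adj = ((adj + adj.T) > 0).float()
y = torch.tensor([1.0])

pretrained = pretrain_attribute_masking(TinyGNN(8, 32), x, adj)
print("fine-tuned after pretraining:", finetune(pretrained, x, adj, y))
print("trained from scratch:       ", finetune(TinyGNN(8, 32), x, adj, y))
```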
Related papers
- Delayed Bottlenecking: Alleviating Forgetting in Pre-trained Graph Neural Networks [19.941727879841142]
We propose a novel Delayed Bottlenecking Pre-training framework.
It maintains as much mutual information as possible between latent representations and training data during the pre-training phase.
arXiv Detail & Related papers (2024-04-23T11:35:35Z)
- Transfer Learning for Molecular Property Predictions from Small Data Sets [0.0]
We benchmark common machine learning models for the prediction of molecular properties on two small data sets.
We present a transfer learning strategy that uses large data sets to pre-train the respective models and makes it possible to obtain more accurate models after fine-tuning on the original data sets.
arXiv Detail & Related papers (2024-04-20T14:25:34Z)
- Better with Less: A Data-Active Perspective on Pre-Training Graph Neural Networks [39.71761440499148]
Pre-training on graph neural networks (GNNs) aims to learn transferable knowledge for downstream tasks with unlabeled data.
We propose a better-with-less framework for graph pre-training: fewer, but carefully chosen data are fed into a GNN model.
Experiment results show that the proposed APT is able to obtain an efficient pre-training model with fewer training data and better downstream performance.
arXiv Detail & Related papers (2023-11-02T07:09:59Z)
- Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a light-weight black-box tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z)
- Examining the Effect of Pre-training on Time Series Classification [21.38211396933795]
This study investigates the impact of pre-training on the subsequent fine-tuning process.
We conducted a thorough examination of 150 classification datasets.
We find that pre-training can only help improve the optimization process for models that fit the data poorly.
Adding more pre-training data does not improve generalization, but it can strengthen the advantage of pre-training on the original data volume.
arXiv Detail & Related papers (2023-09-11T06:26:57Z)
- DCLP: Neural Architecture Predictor with Curriculum Contrastive Learning [5.2319020651074215]
We propose a Curriculum-guided Contrastive Learning framework for neural Predictor (DCLP).
Our method simplifies the contrastive task by designing a novel curriculum to enhance the stability of unlabeled training data distribution.
We experimentally demonstrate that DCLP has high accuracy and efficiency compared with existing predictors.
arXiv Detail & Related papers (2023-02-25T08:16:21Z)
- Pre-training via Denoising for Molecular Property Prediction [53.409242538744444]
We describe a pre-training technique that utilizes large datasets of 3D molecular structures at equilibrium.
Inspired by recent advances in noise regularization, our pre-training objective is based on denoising (a minimal sketch of this idea follows the list below).
arXiv Detail & Related papers (2022-05-31T22:28:34Z)
- Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability [53.27240222619834]
Knowledge Distillation as Efficient Pre-training aims to efficiently transfer the learned feature representation from pre-trained models to new student models for future downstream tasks.
Our method performs comparably with supervised pre-training counterparts on 3 downstream tasks and 9 downstream datasets, while requiring 10x less data and 5x less pre-training time.
arXiv Detail & Related papers (2022-03-10T06:23:41Z)
- Improved Fine-tuning by Leveraging Pre-training Data: Theory and Practice [52.11183787786718]
Fine-tuning a pre-trained model on the target data is widely used in many deep learning applications.
Recent studies have empirically shown that training from scratch can achieve final performance that is no worse than this pre-training strategy.
We propose a novel selection strategy to select a subset from pre-training data to help improve the generalization on the target task.
arXiv Detail & Related papers (2021-11-24T06:18:32Z)
- Self-Supervised Pretraining Improves Self-Supervised Pretraining [83.1423204498361]
Self-supervised pretraining requires expensive and lengthy computation, large amounts of data, and is sensitive to data augmentation.
This paper explores Hierarchical PreTraining (HPT), which decreases convergence time and improves accuracy by initializing the pretraining process with an existing pretrained model.
We show HPT converges up to 80x faster, improves accuracy across tasks, and improves the robustness of the self-supervised pretraining process to changes in the image augmentation policy or amount of pretraining data.
arXiv Detail & Related papers (2021-03-23T17:37:51Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
To keep training on the resulting dataset tractable, we propose to apply a dataset distillation strategy to compress it into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
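For the denoising entry above (Pre-training via Denoising for Molecular Property Prediction), the core objective can be sketched in a few lines: perturb equilibrium 3D coordinates with Gaussian noise and train the network to predict the added noise. The per-atom MLP, noise scale, and tensor shapes below are illustrative assumptions, not the paper's architecture.

```python
# Illustrative sketch (assumption): coordinate-denoising pretraining.
import torch
import torch.nn as nn

def denoising_step(model: nn.Module, coords: torch.Tensor,
                   opt: torch.optim.Optimizer, sigma: float = 0.1) -> float:
    """Corrupt equilibrium positions with Gaussian noise and regress the noise."""
    noise = sigma * torch.randn_like(coords)   # [num_atoms, 3]
    pred = model(coords + noise)               # network only sees the noisy geometry
    loss = ((pred - noise) ** 2).mean()        # denoising (noise-prediction) loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Toy stand-in for a 3D molecular network: a per-atom MLP on raw coordinates.
model = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
coords = torch.randn(20, 3)                    # fake equilibrium conformer
for _ in range(100):
    denoising_step(model, coords, opt)
# The pretrained weights would then be fine-tuned on property-prediction targets.
```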
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.