Better with Less: A Data-Active Perspective on Pre-Training Graph Neural
Networks
- URL: http://arxiv.org/abs/2311.01038v2
- Date: Tue, 21 Nov 2023 05:48:06 GMT
- Title: Better with Less: A Data-Active Perspective on Pre-Training Graph Neural
Networks
- Authors: Jiarong Xu, Renhong Huang, Xin Jiang, Yuxuan Cao, Carl Yang, Chunping
Wang, Yang Yang
- Abstract summary: Pre-training of graph neural networks (GNNs) aims to learn transferable knowledge for downstream tasks from unlabeled data.
We propose a better-with-less framework for graph pre-training: fewer, but carefully chosen, data are fed into a GNN model.
Experimental results show that the proposed APT obtains an efficient pre-training model with fewer training data and better downstream performance.
- Score: 39.71761440499148
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-training of graph neural networks (GNNs) aims to learn transferable
knowledge for downstream tasks from unlabeled data, and it has recently become
an active research area. The success of graph pre-training models is often
attributed to the massive amount of input data. In this paper, however, we
identify the curse of big data phenomenon in graph pre-training: more training
data do not necessarily lead to better downstream performance. Motivated by
this observation, we propose a better-with-less framework for graph
pre-training: fewer, but carefully chosen data are fed into a GNN model to
enhance pre-training. The proposed pre-training pipeline is called the
data-active graph pre-training (APT) framework, and is composed of a graph
selector and a pre-training model. The graph selector chooses the most
representative and instructive data points based on the inherent properties of
graphs as well as predictive uncertainty. The proposed predictive uncertainty,
as feedback from the pre-training model, measures the confidence level of the
model in the data. The pre-training model, in turn, when fed with the chosen
data, gains an initial understanding of the new, unseen data while retaining
the knowledge learned from previous data.
Therefore, the integration and interaction between these two components form a
unified framework (APT), in which graph pre-training is performed in a
progressive and iterative way. Experimental results show that the proposed APT
obtains an efficient pre-training model with fewer training data and better
downstream performance.
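The abstract describes an iterative loop in which a graph selector scores candidate graphs by their inherent properties and by the pre-training model's predictive uncertainty, and the chosen graphs are then used to continue pre-training. The following is a minimal, framework-agnostic Python sketch of such a data-active selection loop, written under stated assumptions: the scoring callables, the `pretrain_step` hook, and all parameter names are illustrative placeholders, not the authors' implementation.

```python
# Hedged sketch of a data-active pre-training loop in the spirit of APT.
# The scoring functions and the pre-training step are placeholders (assumptions),
# not the authors' actual graph selector or GNN training code.
import random
from typing import Callable, List, Sequence


def select_graphs(
    candidates: Sequence[object],
    uncertainty: Callable[[object], float],   # model feedback: predictive uncertainty
    graph_score: Callable[[object], float],   # inherent graph properties (e.g., structural statistics)
    k: int,
    alpha: float = 0.5,
) -> List[object]:
    """Pick the k most representative and instructive graphs by a weighted score."""
    ranked = sorted(
        candidates,
        key=lambda g: alpha * uncertainty(g) + (1 - alpha) * graph_score(g),
        reverse=True,
    )
    return list(ranked[:k])


def data_active_pretrain(
    pool: List[object],
    pretrain_step: Callable[[List[object]], None],  # updates the GNN on the chosen graphs
    uncertainty: Callable[[object], float],
    graph_score: Callable[[object], float],
    rounds: int = 5,
    k: int = 10,
) -> None:
    """Progressive, iterative pre-training on fewer but carefully chosen graphs."""
    for _ in range(rounds):
        if not pool:
            break
        chosen = select_graphs(pool, uncertainty, graph_score, k)
        pretrain_step(chosen)      # model learns the newly selected data ...
        for g in chosen:           # ... and the selected graphs leave the candidate pool
            pool.remove(g)


if __name__ == "__main__":
    # Toy usage: "graphs" are plain ids, with random numbers standing in for real signals.
    pool = list(range(100))
    data_active_pretrain(
        pool,
        pretrain_step=lambda batch: None,          # placeholder training step
        uncertainty=lambda g: random.random(),     # placeholder predictive uncertainty
        graph_score=lambda g: random.random(),     # placeholder structural score
        rounds=3,
        k=5,
    )
```

In the actual APT framework the graph scores come from inherent graph properties, the uncertainty comes as feedback from the pre-training model, and the training step also tries to retain knowledge learned from previously selected data; this sketch only mirrors the select-then-train structure of the loop.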
Related papers
- When to Pre-Train Graph Neural Networks? From Data Generation
Perspective! [19.239863500722983]
Graph pre-training aims to acquire transferable knowledge from unlabeled graph data to improve downstream performance.
This paper introduces a generic framework W2PGNN to answer the question of when to pre-train.
W2PGNN offers three broad applications: identifying the application scope of graph pre-trained models, assessing the feasibility of pre-training, and assisting in the selection of pre-training data to enhance downstream performance.
arXiv Detail & Related papers (2023-03-29T05:05:02Z)
- SEPT: Towards Scalable and Efficient Visual Pre-Training [11.345844145289524]
Self-supervised pre-training has shown great potential in leveraging large-scale unlabeled data to improve downstream task performance.
We build a task-specific self-supervised pre-training framework based on the simple hypothesis that pre-training on unlabeled samples whose distribution is similar to that of the target task can bring substantial performance gains.
arXiv Detail & Related papers (2022-12-11T11:02:11Z)
- Diving into Unified Data-Model Sparsity for Class-Imbalanced Graph Representation Learning [30.23894624193583]
Training Graph Neural Networks (GNNs) on non-Euclidean graph data often incurs relatively high time costs.
We develop a unified data-model dynamic sparsity framework named Graph Decantation (GraphDec) to address the challenges of training on massive, class-imbalanced graph data.
arXiv Detail & Related papers (2022-10-01T01:47:00Z)
- Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z)
- Neural Graph Matching for Pre-training Graph Neural Networks [72.32801428070749]
Graph neural networks (GNNs) have shown a powerful capacity for modeling structural data.
We present a novel Graph Matching based GNN Pre-Training framework, called GMPT.
The proposed method can be applied to fully self-supervised pre-training and coarse-grained supervised pre-training.
arXiv Detail & Related papers (2022-03-03T09:53:53Z)
- Improved Fine-tuning by Leveraging Pre-training Data: Theory and Practice [52.11183787786718]
Fine-tuning a pre-trained model on the target data is widely used in many deep learning applications.
Recent studies have empirically shown that training from scratch can achieve final performance no worse than this pre-training strategy.
We propose a novel selection strategy to select a subset from pre-training data to help improve the generalization on the target task.
arXiv Detail & Related papers (2021-11-24T06:18:32Z)
- Self-Supervised Pretraining Improves Self-Supervised Pretraining [83.1423204498361]
Self-supervised pretraining requires expensive and lengthy computation and large amounts of data, and it is sensitive to data augmentation.
This paper explores Hierarchical PreTraining (HPT), which decreases convergence time and improves accuracy by initializing the pretraining process with an existing pretrained model.
We show HPT converges up to 80x faster, improves accuracy across tasks, and improves the robustness of the self-supervised pretraining process to changes in the image augmentation policy or amount of pretraining data.
arXiv Detail & Related papers (2021-03-23T17:37:51Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the newly constructed dataset can significantly improve the ability of the learned FER model.
To reduce the cost of training on such an enlarged dataset, we further apply a dataset distillation strategy to compress it into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)