The Split Matters: Flat Minima Methods for Improving the Performance of
GNNs
- URL: http://arxiv.org/abs/2306.09121v1
- Date: Thu, 15 Jun 2023 13:29:09 GMT
- Title: The Split Matters: Flat Minima Methods for Improving the Performance of
GNNs
- Authors: Nicolas Lell and Ansgar Scherp
- Abstract summary: We investigate flat minima methods and combinations of those methods for training graph neural networks (GNNs)
We conduct experiments on small and large citation, co-purchase, and protein datasets with different train-test splits.
Results show that flat minima methods can improve the performance of GNN models by over 2 points when the train-test split is randomized.
- Score: 2.9443230571766854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When training a Neural Network, it is optimized using the available training
data with the hope that it generalizes well to new or unseen testing data. At
the same loss value, a flat minimum in the loss landscape is presumed to
generalize better than a sharp minimum. Methods for determining flat minima
have mostly been researched for independent and identically distributed
(i.i.d.) data such as images. Graphs are inherently non-i.i.d. since their vertices
are edge-connected. We investigate flat minima methods and combinations of
those methods for training graph neural networks (GNNs). We use GCN and GAT as
well as extend Graph-MLP to work with more layers and larger graphs. We conduct
experiments on small and large citation, co-purchase, and protein datasets with
different train-test splits in both the transductive and inductive training
procedures. Results show that flat minima methods can improve the performance of
GNN models by over 2 points when the train-test split is randomized. Following
Shchur et al., randomized splits are essential for a fair evaluation of GNNs,
as other (fixed) splits like 'Planetoid' are biased. Overall, we provide
important insights for improving and fairly evaluating flat minima methods on
GNNs. We recommend that practitioners always use weight averaging techniques, in
particular EWA when using early stopping. While weight averaging techniques are
only sometimes the best performing method, they are less sensitive to
hyperparameters, need no additional training, and keep the original model
unchanged. All source code is available at
https://github.com/Foisunt/FMMs-in-GNNs.
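The abstract recommends weight averaging, in particular EWA with early stopping, but does not spell out the update rule. The following is a minimal sketch of exponential weight averaging in PyTorch, assuming EWA keeps an exponential moving average of the model parameters and that the averaged copy is the one evaluated for early stopping; the class name, decay value, and training-loop helpers are illustrative and not taken from the paper's repository.

```python
# Hypothetical sketch of exponential weight averaging (EWA) for a GNN.
# The decay value and all names are illustrative assumptions, not the
# paper's actual implementation.
import copy
import torch


class EWA:
    """Maintains an exponential moving average of a model's parameters."""

    def __init__(self, model: torch.nn.Module, decay: float = 0.99):
        self.decay = decay
        # Keep the averaged weights in a separate copy so the original
        # model stays unchanged.
        self.avg_model = copy.deepcopy(model)
        for p in self.avg_model.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model: torch.nn.Module):
        # avg <- decay * avg + (1 - decay) * current
        # Buffers (e.g., normalization statistics) are not averaged in this sketch.
        for p_avg, p in zip(self.avg_model.parameters(), model.parameters()):
            p_avg.mul_(self.decay).add_(p, alpha=1.0 - self.decay)


# Usage inside a standard training loop (model, optimizer, helpers omitted):
# ewa = EWA(model, decay=0.99)
# for epoch in range(num_epochs):
#     train_one_epoch(model, optimizer)
#     ewa.update(model)
#     val_acc = evaluate(ewa.avg_model)  # early stopping on the averaged weights
```

Because the averaged weights live in a separate copy, the original model stays unchanged and no extra training passes are needed, which matches the properties highlighted in the abstract.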
Related papers
- Classifying Nodes in Graphs without GNNs [50.311528896010785]
We propose a fully GNN-free approach for node classification that does not require GNNs at train or test time.
Our method consists of three key components: smoothness constraints, pseudo-labeling iterations and neighborhood-label histograms.
arXiv Detail & Related papers (2024-02-08T18:59:30Z)
- Efficient Heterogeneous Graph Learning via Random Projection [58.4138636866903]
Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs.
Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors.
We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN)
arXiv Detail & Related papers (2023-10-23T01:25:44Z)
- GRAPES: Learning to Sample Graphs for Scalable Graph Neural Networks [2.4175455407547015]
Graph neural networks learn to represent nodes by aggregating information from their neighbors.
Several existing methods address the resulting scalability problem by sampling a small subset of nodes, scaling GNNs to much larger graphs.
We introduce GRAPES, an adaptive sampling method that learns to identify the set of nodes crucial for training a GNN.
arXiv Detail & Related papers (2023-10-05T09:08:47Z)
- Sharpness-Aware Graph Collaborative Filtering [31.133543641102914]
Graph Neural Networks (GNNs) have achieved impressive performance in collaborative filtering.
GNNs tend to yield inferior performance when the distributions of training and test data are not aligned well.
We propose an effective training schema, called gSAM, under the principle that flatter minima have a better filtering ability than sharper ones (a generic SAM-style update is sketched after this list).
arXiv Detail & Related papers (2023-07-18T01:02:20Z)
- Graph Ladling: Shockingly Simple Parallel GNN Training without Intermediate Communication [100.51884192970499]
GNNs are a powerful family of neural networks for learning over graphs.
Scaling GNNs either by deepening or widening suffers from prevalent issues of unhealthy gradients, over-smoothing, and information squashing.
We propose not to deepen or widen current GNNs, but instead present a data-centric perspective of model soups tailored for GNNs.
arXiv Detail & Related papers (2023-06-18T03:33:46Z)
- Towards Sparsification of Graph Neural Networks [9.568566305616656]
We use two state-of-the-art model compression methods, train-and-prune and sparse training, for the sparsification of weight layers in GNNs.
We evaluate and compare the efficiency of both methods in terms of accuracy, training sparsity, and training FLOPs on real-world graphs.
arXiv Detail & Related papers (2022-09-11T01:39:29Z)
- Neural Graph Matching for Pre-training Graph Neural Networks [72.32801428070749]
Graph neural networks (GNNs) have shown powerful capacity at modeling structural data.
We present a novel Graph Matching based GNN Pre-Training framework, called GMPT.
The proposed method can be applied to fully self-supervised pre-training and coarse-grained supervised pre-training.
arXiv Detail & Related papers (2022-03-03T09:53:53Z)
- Scalable Consistency Training for Graph Neural Networks via Self-Ensemble Self-Distillation [13.815063206114713]
We introduce a novel consistency training method to improve the accuracy of graph neural networks (GNNs).
For a target node, we generate different neighborhood expansions and distill the knowledge of the averaged predictions into the GNN.
Our method approximates the expected prediction of the possible neighborhood samples and practically only requires a few samples.
arXiv Detail & Related papers (2021-10-12T19:24:42Z)
- Shift-Robust GNNs: Overcoming the Limitations of Localized Graph Training Data [52.771780951404565]
Shift-Robust GNN (SR-GNN) is designed to account for distributional differences between biased training data and the graph's true inference distribution.
We show that SR-GNN outperforms other GNN baselines in accuracy, eliminating at least 40% of the negative effects introduced by biased training data.
arXiv Detail & Related papers (2021-08-02T18:00:38Z)
- Combining Label Propagation and Simple Models Out-performs Graph Neural Networks [52.121819834353865]
We show that for many standard transductive node classification benchmarks, we can exceed or match the performance of state-of-the-art GNNs.
We call this overall procedure Correct and Smooth (C&S)
Our approach exceeds or nearly matches the performance of state-of-the-art GNNs on a wide variety of benchmarks.
arXiv Detail & Related papers (2020-10-27T02:10:52Z)
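For reference, the sharpness-aware entry above ("Sharpness-Aware Graph Collaborative Filtering") builds on SAM, one of the best-known flat minima methods. Below is a minimal sketch of the generic two-step SAM update in PyTorch, not the gSAM schema itself, which the excerpt does not describe; loss_fn is assumed to be a closure that recomputes the training loss for the current mini-batch, and rho is the neighborhood radius, both illustrative.

```python
# Generic SAM-style update: ascend to a worst-case point within a rho-ball
# around the current weights, then descend using the gradient taken there.
# All names and the rho value are illustrative assumptions.
import torch


@torch.no_grad()
def sam_step(model, loss_fn, base_optimizer, rho=0.05):
    # 1) Gradient of the loss at the current weights w.
    with torch.enable_grad():
        loss = loss_fn(model)
        loss.backward()

    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2)
    scale = rho / (grad_norm + 1e-12)

    # 2) Perturb the weights towards the worst case inside the rho-ball: w + eps.
    perturbations = []
    for p in model.parameters():
        if p.grad is None:
            continue
        eps = p.grad * scale
        p.add_(eps)
        perturbations.append((p, eps))
    model.zero_grad()

    # 3) Gradient at the perturbed weights w + eps.
    with torch.enable_grad():
        loss_fn(model).backward()

    # 4) Undo the perturbation and update w with the sharpness-aware gradient.
    for p, eps in perturbations:
        p.sub_(eps)
    base_optimizer.step()
    base_optimizer.zero_grad()
    return float(loss)
```

The extra forward and backward pass at the perturbed weights is what distinguishes SAM-style methods from weight averaging, which adds no training cost.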