Pitfalls in Link Prediction with Graph Neural Networks: Understanding
the Impact of Target-link Inclusion & Better Practices
- URL: http://arxiv.org/abs/2306.00899v2
- Date: Mon, 18 Dec 2023 01:10:59 GMT
- Title: Pitfalls in Link Prediction with Graph Neural Networks: Understanding
the Impact of Target-link Inclusion & Better Practices
- Authors: Jing Zhu, Yuhang Zhou, Vassilis N. Ioannidis, Shengyi Qian, Wei Ai,
Xiang Song, Danai Koutra
- Abstract summary: Graph Neural Networks (GNNs) are remarkably successful in a variety of high-impact applications.
In link prediction, the common practices of including the edges being predicted in the graph at training and/or test time have an outsized impact on the performance of low-degree nodes.
We introduce an effective and efficient GNN training framework, SpotTarget, which leverages our insight on low-degree nodes.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: While Graph Neural Networks (GNNs) are remarkably successful in a variety of
high-impact applications, we demonstrate that, in link prediction, the common
practices of including the edges being predicted in the graph at training
and/or test time have an outsized impact on the performance of low-degree nodes. We
theoretically and empirically investigate how these practices impact node-level
performance across different degrees. Specifically, we explore three issues
that arise: (I1) overfitting; (I2) distribution shift; and (I3) implicit test
leakage. The former two issues lead to poor generalizability to the test data,
while the latter leads to overestimation of the model's performance and
directly impacts the deployment of GNNs. To address these issues in a
systematic way, we introduce an effective and efficient GNN training framework,
SpotTarget, which leverages our insight on low-degree nodes: (1) at training
time, it excludes a (training) edge to be predicted if it is incident to at
least one low-degree node; and (2) at test time, it excludes all test edges to
be predicted (thus, mimicking real scenarios of using GNNs, where the test data
is not included in the graph). SpotTarget helps researchers and practitioners
adhere to best practices for learning from graph data, which are frequently
overlooked even by the most widely-used frameworks. Our experiments on various
real-world datasets show that SpotTarget makes GNNs up to 15x more accurate in
sparse graphs, and significantly improves their performance for low-degree
nodes in dense graphs.
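The two exclusion rules above are simple enough to illustrate directly. The following is a minimal sketch in plain PyTorch, assuming an undirected graph stored as an edge_index tensor of shape [2, E] and target edges of shape [2, T]; the function names, tensor layout, and default degree threshold delta are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of the two SpotTarget exclusion rules described in the
# abstract. The edge layout ([2, E] edge_index, [2, T] target_edges) and the
# threshold `delta` are assumptions for illustration, not the authors' API.
import torch

def node_degrees(edge_index: torch.Tensor, num_nodes: int) -> torch.Tensor:
    """Degree of each node, counting both endpoints of every edge."""
    return torch.bincount(edge_index.reshape(-1), minlength=num_nodes)

def remove_edges(edge_index: torch.Tensor, edges_to_drop: torch.Tensor) -> torch.Tensor:
    """Drop the given edges (in either direction) from edge_index."""
    drop = {(int(u), int(v)) for u, v in edges_to_drop.t()}
    drop |= {(v, u) for u, v in drop}  # treat the graph as undirected
    keep = [i for i, (u, v) in enumerate(edge_index.t().tolist())
            if (u, v) not in drop]
    return edge_index[:, keep]

def spottarget_train_graph(edge_index, target_edges, num_nodes, delta=2):
    """Training-time rule: exclude a target edge from message passing if it
    is incident to at least one low-degree node (degree <= delta)."""
    deg = node_degrees(edge_index, num_nodes)
    low = (deg[target_edges[0]] <= delta) | (deg[target_edges[1]] <= delta)
    return remove_edges(edge_index, target_edges[:, low])

def spottarget_test_graph(edge_index, test_edges):
    """Test-time rule: exclude *all* test edges, mimicking deployment where
    the links being predicted are not yet observed."""
    return remove_edges(edge_index, test_edges)
```

In a full pipeline, the training-time rule would be applied per mini-batch, so each batch's message-passing graph excludes only its own low-degree target edges, while the test-time rule is applied once and unconditionally.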
Related papers
- Online GNN Evaluation Under Test-time Graph Distribution Shifts
A new research problem, online GNN evaluation, aims to provide valuable insights into a well-trained GNN's ability to generalize to real-world unlabeled graphs.
We develop an effective learning behavior discrepancy score, dubbed LeBeD, to estimate the test-time generalization errors of well-trained GNN models.
arXiv Detail & Related papers (2024-03-15T01:28:08Z)
- Breaking the Entanglement of Homophily and Heterophily in Semi-supervised Node Classification
We introduce AMUD, which quantifies the relationship between node profiles and topology from a statistical perspective.
We also propose ADPA as a new directed graph learning paradigm for AMUD.
arXiv Detail & Related papers (2023-12-07T07:54:11Z)
- GNNEvaluator: Evaluating GNN Performance On Unseen Graphs Without Labels
We study a new problem, GNN model evaluation, that aims to assess the performance of a specific GNN model trained on labeled and observed graphs.
We propose a two-stage GNN model evaluation framework, including (1) DiscGraph set construction and (2) GNNEvaluator training and inference.
Under the effective training supervision from the DiscGraph set, GNNEvaluator learns to precisely estimate node classification accuracy of the to-be-evaluated GNN model.
arXiv Detail & Related papers (2023-10-23T05:51:59Z)
- Towards Temporal Edge Regression: A Case Study on Agriculture Trade Between Nations
Graph Neural Networks (GNNs) have shown promising performance in tasks on dynamic graphs.
In this paper, we explore the application of GNNs to edge regression tasks in both static and dynamic settings.
arXiv Detail & Related papers (2023-08-15T17:13:16Z)
- Addressing the Impact of Localized Training Data in Graph Neural Networks
Graph Neural Networks (GNNs) have achieved notable success in learning from graph-structured data.
This article aims to assess the impact of training GNNs on localized subsets of the graph.
We propose a regularization method to minimize distributional discrepancies between localized training data and the graph used at inference.
arXiv Detail & Related papers (2023-07-24T11:04:22Z)
- Stable Prediction on Graphs with Agnostic Distribution Shift
Graph neural networks (GNNs) have been shown to be effective on various graph tasks with randomly separated training and testing data.
In real applications, however, the distribution of the training graph may differ from that of the test graph.
We propose a novel stable prediction framework for GNNs, which permits both locally and globally stable learning and prediction on graphs.
arXiv Detail & Related papers (2021-10-08T02:45:47Z)
- Shift-Robust GNNs: Overcoming the Limitations of Localized Graph Training Data
Shift-Robust GNN (SR-GNN) is designed to account for distributional differences between biased training data and the graph's true inference distribution.
We show that SR-GNN outperforms other GNN baselines in accuracy, eliminating at least 40% of the negative effects introduced by biased training data.
arXiv Detail & Related papers (2021-08-02T18:00:38Z)
- Combining Label Propagation and Simple Models Out-performs Graph Neural Networks
We show that, for many standard transductive node classification benchmarks, combining label propagation with simple models exceeds or nearly matches the performance of state-of-the-art GNNs.
We call this overall procedure Correct and Smooth (C&S); a minimal sketch of the idea appears after this list.
arXiv Detail & Related papers (2020-10-27T02:10:52Z)
- Distance Encoding: Design Provably More Powerful Neural Networks for Graph Representation Learning
Graph Neural Networks (GNNs) have achieved great success in graph representation learning.
However, GNNs can generate identical representations for graph substructures that are in fact very different.
More powerful GNNs, proposed recently by mimicking higher-order tests, are inefficient because they cannot exploit the sparsity of the underlying graph structure.
We propose Distance Encoding (DE) as a new class of techniques for graph representation learning.
arXiv Detail & Related papers (2020-08-31T23:15:40Z)
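For the Correct and Smooth entry above, the post-processing procedure is concrete enough to sketch. The version below is a simplified reading under stated assumptions: a normalized propagation matrix S, soft predictions Z from a simple base model, one-hot labels Y, and a boolean train mask; the function names, hyperparameters, and defaults are illustrative, not the paper's exact configuration.

```python
# Simplified Correct & Smooth (C&S) post-processing: train a simple model,
# then (1) propagate its residual errors over the graph to "correct" the
# predictions, and (2) propagate the corrected predictions to "smooth" them.
import numpy as np

def propagate(S: np.ndarray, X: np.ndarray, alpha: float, iters: int) -> np.ndarray:
    """Iterate X <- (1 - alpha) * X_initial + alpha * S @ X, as in label propagation."""
    x0, out = X.copy(), X.copy()
    for _ in range(iters):
        out = (1 - alpha) * x0 + alpha * (S @ out)
    return out

def correct_and_smooth(S, Z, Y, train_mask, alpha1=0.8, alpha2=0.8, iters=50):
    # "Correct" step: propagate the base model's errors on the training nodes.
    E = np.zeros_like(Z)
    E[train_mask] = Y[train_mask] - Z[train_mask]
    Z_corrected = Z + propagate(S, E, alpha1, iters)
    # "Smooth" step: reset training nodes to their true labels, then propagate.
    Z_corrected[train_mask] = Y[train_mask]
    return propagate(S, Z_corrected, alpha2, iters)
```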