Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls
and New Benchmarking
- URL: http://arxiv.org/abs/2306.10453v3
- Date: Sat, 18 Nov 2023 19:03:50 GMT
- Title: Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls
and New Benchmarking
- Authors: Juanhui Li, Harry Shomer, Haitao Mao, Shenglai Zeng, Yao Ma, Neil
Shah, Jiliang Tang, Dawei Yin
- Abstract summary: Link prediction attempts to predict whether an unseen edge exists based on only a portion of edges of a graph.
A flurry of methods have been introduced in recent years that attempt to make use of graph neural networks (GNNs) for this task.
New and diverse datasets have also been created to better evaluate the effectiveness of these new models.
- Score: 66.83273589348758
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Link prediction attempts to predict whether an unseen edge exists based on
only a portion of edges of a graph. A flurry of methods have been introduced in
recent years that attempt to make use of graph neural networks (GNNs) for this
task. Furthermore, new and diverse datasets have also been created to better
evaluate the effectiveness of these new models. However, multiple pitfalls
currently exist that hinder our ability to properly evaluate these new methods.
These pitfalls mainly include: (1) lower-than-actual performance reported for multiple
baselines, (2) the lack of a unified data split and evaluation metric on some
datasets, and (3) an unrealistic evaluation setting that uses easy negative
samples. To overcome these challenges, we first conduct a fair comparison
across prominent methods and datasets, utilizing the same dataset and
hyperparameter search settings. We then create a more practical evaluation
setting based on a Heuristic Related Sampling Technique (HeaRT), which samples
hard negative samples via multiple heuristics. The new evaluation setting helps
promote new challenges and opportunities in link prediction by aligning the
evaluation with real-world situations. Our implementation and data are
available at https://github.com/Juanhui28/HeaRT
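Pitfall (3) is the one HeaRT targets most directly, and the gap between the two settings is easy to picture in code. The following is a minimal sketch of the general idea, not the authors' implementation (see the repository above for that): random negatives are drawn as arbitrary node pairs, while HeaRT-style hard negatives corrupt one endpoint of each positive edge and keep the candidates that a heuristic such as common-neighbor count ranks highest. The paper combines multiple heuristics; this sketch uses a single one for brevity.

```python
import random
import networkx as nx

def easy_negatives(G: nx.Graph, k: int):
    """Standard setting: k uniformly random non-edges (mostly trivial to reject)."""
    nodes = list(G.nodes)
    negatives = set()
    while len(negatives) < k:
        u, v = random.sample(nodes, 2)
        if not G.has_edge(u, v):
            negatives.add((u, v))
    return list(negatives)

def hard_negatives(G: nx.Graph, u, v, k: int):
    """HeaRT-style idea (our sketch): corrupt one endpoint of the positive
    edge (u, v) and keep the k candidates a heuristic scores highest."""
    def common_neighbors(a, b):
        return len(set(G[a]) & set(G[b]))
    candidates = [(u, w) for w in G.nodes
                  if w not in (u, v) and not G.has_edge(u, w)]
    candidates.sort(key=lambda pair: common_neighbors(*pair), reverse=True)
    return candidates[:k]
```

Because such negatives share an endpoint, and often much of their neighborhood, with the true edge, a model can no longer separate them by trivial cues such as node degree alone, which is what brings the evaluation closer to real-world ranking.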
Related papers
- New Perspectives on the Evaluation of Link Prediction Algorithms for
Dynamic Graphs [12.987894327817159]
We introduce novel visualization methods that can yield insight into prediction performance and the dynamics of temporal networks.
We validate empirically, on datasets extracted from recent benchmarks, that the error is typically not evenly distributed across different data segments.
arXiv Detail & Related papers (2023-11-30T11:57:07Z)
- Towards Mitigating more Challenging Spurious Correlations: A Benchmark & New Datasets [43.64631697043496]
Deep neural networks often exploit non-predictive features that are spuriously correlated with class labels.
Despite the growing body of recent works on remedying spurious correlations, the lack of a standardized benchmark hinders reproducible evaluation.
We present SpuCo, a Python package with modular implementations of state-of-the-art solutions, enabling easy and reproducible evaluation.
arXiv Detail & Related papers (2023-06-21T00:59:06Z)
- From Spectral Graph Convolutions to Large Scale Graph Convolutional Networks [0.0]
Graph Convolutional Networks (GCNs) have proven to be a powerful concept, successfully applied to a wide variety of tasks.
We study the theory that paved the way to the definition of GCNs, including the relevant parts of classical graph theory.
arXiv Detail & Related papers (2022-07-12T16:57:08Z)
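The layer the entry above builds on is the widely used propagation rule of Kipf & Welling, H' = sigma(D^-1/2 (A + I) D^-1/2 H W). A self-contained NumPy sketch of one such layer:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    A: (n, n) adjacency matrix, H: (n, d_in) node features,
    W: (d_in, d_out) weight matrix.
    """
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))  # degrees include the self-loop
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)         # ReLU activation
```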
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric, "dR@n,IoU@m", that discounts the basic recall scores to alleviate the inflated evaluation results caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- SCE: Scalable Network Embedding from Sparsest Cut [20.08464038805681]
Large-scale network embedding learns a latent representation for each node in an unsupervised manner.
A key to the success of such contrastive learning methods is how positive and negative samples are drawn.
In this paper, we propose SCE for unsupervised network embedding, using only negative samples for training.
arXiv Detail & Related papers (2020-06-30T03:18:15Z)
- Evaluating Models' Local Decision Boundaries via Contrast Sets [119.38387782979474]
We propose a new annotation paradigm for NLP that helps to close systematic gaps in the test data.
We demonstrate the efficacy of contrast sets by creating them for 10 diverse NLP datasets.
Although our contrast sets are not explicitly adversarial, model performance is significantly lower on them than on the original test sets.
arXiv Detail & Related papers (2020-04-06T14:47:18Z)
- Frustratingly Simple Few-Shot Object Detection [98.42824677627581]
We find that fine-tuning only the last layer of existing detectors on rare classes is crucial to the few-shot object detection task.
Such a simple approach outperforms the meta-learning methods by roughly 2-20 points on current benchmarks.
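Concretely, the recipe freezes the pretrained detector and optimizes only its final prediction heads on the novel classes. A hypothetical PyTorch sketch of that freezing step (the module names `box_classifier` and `box_regressor` are illustrative, not taken from the paper's code):

```python
import torch

def freeze_all_but_heads(detector: torch.nn.Module,
                         head_keys=("box_classifier", "box_regressor")):
    """Freeze every parameter except the final box classification/regression heads."""
    for name, param in detector.named_parameters():
        param.requires_grad = any(key in name for key in head_keys)
    trainable = [p for p in detector.parameters() if p.requires_grad]
    return torch.optim.SGD(trainable, lr=1e-3, momentum=0.9)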
arXiv Detail & Related papers (2020-03-16T00:29:14Z)
- PushNet: Efficient and Adaptive Neural Message Passing [1.9121961872220468]
Message passing neural networks have recently evolved into a state-of-the-art approach to representation learning on graphs.
Existing methods perform synchronous message passing along all edges in multiple subsequent rounds.
We consider a novel asynchronous message passing approach where information is pushed only along the most relevant edges until convergence.
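This push-until-convergence scheme is reminiscent of the local push updates used for approximate personalized PageRank; the sketch below illustrates that generic pattern (our interpretation, not PushNet's actual update rule): mass is pushed from a node to its neighbors only while the remaining residual is large enough to matter.

```python
from collections import deque

def push_propagate(adj, source, alpha=0.15, eps=1e-4):
    """Asynchronous push-style propagation from `source`.

    adj: dict mapping each node to a list of neighbors.
    Only nodes holding enough residual mass are (re)processed,
    instead of synchronously updating every edge each round.
    """
    estimate, residual = {}, {source: 1.0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        r = residual.pop(u, 0.0)
        estimate[u] = estimate.get(u, 0.0) + alpha * r
        if not adj[u]:
            continue
        share = (1.0 - alpha) * r / len(adj[u])
        for v in adj[u]:
            residual[v] = residual.get(v, 0.0) + share
            # push further only along edges where significant mass remains
            if residual[v] >= eps and v not in queue:
                queue.append(v)
    return estimate
```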
arXiv Detail & Related papers (2020-03-04T18:15:30Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence of each query sample in order to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
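Stripped of the meta-learning component, the transductive update in the entry above can be sketched as: score each unlabeled query against the current prototypes, convert distances into soft confidences, and recompute prototypes as confidence-weighted means. A NumPy illustration of that generic scheme (the temperature here is fixed, whereas the paper meta-learns the confidence):

```python
import numpy as np

def refine_prototypes(support_means, queries, temperature=1.0, steps=3):
    """support_means: (C, d) per-class means of labeled support examples.
    queries: (Q, d) embeddings of unlabeled query examples."""
    prototypes = support_means.copy()
    for _ in range(steps):
        # squared distance from every query to every prototype: (Q, C)
        d2 = ((queries[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
        conf = np.exp(-d2 / temperature)
        conf /= conf.sum(axis=1, keepdims=True)   # soft class assignments
        # blend support means (weight 1) with confidence-weighted queries
        prototypes = (support_means + conf.T @ queries) / (1.0 + conf.sum(axis=0))[:, None]
    return prototypes
```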
- Benchmarking Network Embedding Models for Link Prediction: Are We Making Progress? [84.43405961569256]
We shed light on the state-of-the-art of network embedding methods for link prediction.
We show, using a consistent evaluation pipeline, that only thin progress has been made in recent years.
We argue that standardized evaluation tools can repair this situation and boost future progress in this field.
arXiv Detail & Related papers (2020-02-25T16:59:09Z)