Related papers: Rethinking the Effectiveness of Graph Classification Datasets in Benchmarks for Assessing GNNs

URL: http://arxiv.org/abs/2407.04999v1
Date: Sat, 6 Jul 2024 08:33:23 GMT
Title: Rethinking the Effectiveness of Graph Classification Datasets in Benchmarks for Assessing GNNs
Authors: Zhengdao Li, Yong Cao, Kefan Shuai, Yiming Miao, Kai Hwang
Abstract summary: We propose an empirical protocol based on a fair benchmarking framework to investigate the performance discrepancy between simple methods and GNNs. We also propose a novel metric to quantify the dataset effectiveness by considering both dataset complexity and model performance. Our findings shed light on the current understanding of benchmark datasets, and our new platform could fuel the future evolution of graph classification benchmarks.
Score: 7.407592553310068
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Graph classification benchmarks, vital for assessing and developing graph neural networks (GNNs), have recently been scrutinized, as simple methods like MLPs have demonstrated comparable performance. This leads to an important question: Do these benchmarks effectively distinguish the advancements of GNNs over other methodologies? If so, how do we quantitatively measure this effectiveness? In response, we first propose an empirical protocol based on a fair benchmarking framework to investigate the performance discrepancy between simple methods and GNNs. We further propose a novel metric to quantify the dataset effectiveness by considering both dataset complexity and model performance. To the best of our knowledge, our work is the first to thoroughly study and provide an explicit definition for dataset effectiveness in the graph learning area. Through testing across 16 real-world datasets, we found our metric to align with existing studies and intuitive assumptions. Finally, we explore the causes behind the low effectiveness of certain datasets by investigating the correlation between intrinsic graph properties and class labels, and we developed a novel technique supporting the correlation-controllable synthetic dataset generation. Our findings shed light on the current understanding of benchmark datasets, and our new platform could fuel the future evolution of graph classification benchmarks.

Related papers

No Metric to Rule Them All: Toward Principled Evaluations of Graph-Learning Datasets [16.040478092904632]
We introduce RINGS, a flexible mode-perturbation framework to assess the quality of graph-learning datasets. Within this framework, we propose two measures -- performance separability and mode complementarity -- as evaluation tools. We derive actionable recommendations for improving the evaluation of graph-learning methods.
arXiv Detail & Related papers (2025-02-04T14:59:03Z)
ACTGNN: Assessment of Clustering Tendency with Synthetically-Trained Graph Neural Networks [4.668678950572517]
ACTGNN is a graph-based framework designed to assess clustering tendency by leveraging graph representations of data. A Graph Neural Network (GNN) is trained exclusively on synthetic datasets, enabling robust learning of clustering structures. Our results highlight the generalizability and effectiveness of the proposed approach, making it a promising tool for robust clustering tendency assessment.
arXiv Detail & Related papers (2025-01-30T03:31:26Z)
Novel Representation Learning Technique using Graphs for Performance Analytics [0.0]
We propose a novel idea of transforming performance data into graphs to leverage the advancement of Graph Neural Network-based (GNN) techniques. In contrast to other Machine Learning application domains, such as social networks, the graph is not given; instead, we need to build it. We evaluate the effectiveness of the generated embeddings from GNNs based on how well they make even a simple feed-forward neural network perform for regression tasks.
arXiv Detail & Related papers (2024-01-19T16:34:37Z)
GOODAT: Towards Test-time Graph Out-of-Distribution Detection [103.40396427724667]
Graph neural networks (GNNs) have found widespread application in modeling graph data across diverse domains. Recent studies have explored graph OOD detection, often focusing on training a specific model or modifying the data on top of a well-trained GNN. This paper introduces a data-centric, unsupervised, and plug-and-play solution that operates independently of training data and modifications of GNN architecture.
arXiv Detail & Related papers (2024-01-10T08:37:39Z)
A Metadata-Driven Approach to Understand Graph Neural Networks [17.240017543449735]
We propose a metadata-driven approach to analyze the sensitivity of GNNs to graph data properties. Our theoretical findings reveal that datasets with a more balanced degree distribution exhibit better linear separability of node representations.
arXiv Detail & Related papers (2023-10-30T04:25:02Z)
Challenging the Myth of Graph Collaborative Filtering: a Reasoned and Reproducibility-driven Analysis [50.972595036856035]
We present code that successfully replicates results from six popular and recent graph recommendation models. We compare these graph models with traditional collaborative filtering models that historically performed well in offline evaluations. By investigating the information flow from users' neighborhoods, we aim to identify which models are influenced by intrinsic features in the dataset structure.
arXiv Detail & Related papers (2023-08-01T09:31:44Z)
Addressing the Impact of Localized Training Data in Graph Neural Networks [0.0]
Graph Neural Networks (GNNs) have achieved notable success in learning from graph-structured data. This article aims to assess the impact of training GNNs on localized subsets of the graph. We propose a regularization method to minimize distributional discrepancies between localized training data and graph inference.
arXiv Detail & Related papers (2023-07-24T11:04:22Z)
Bures-Wasserstein Means of Graphs [60.42414991820453]
We propose a novel framework for defining a graph mean via embeddings in the space of smooth graph signal distributions. By finding a mean in this embedding space, we can recover a mean graph that preserves structural information. We establish the existence and uniqueness of the novel graph mean, and provide an iterative algorithm for computing it.
arXiv Detail & Related papers (2023-05-31T11:04:53Z)
Strengthening structural baselines for graph classification using Local Topological Profile [0.0]
We present an analysis of the topological graph descriptor Local Degree Profile (LDP), which forms a widely used structural baseline for graph classification. We propose a new baseline algorithm called Local Topological Profile (LTP), which extends LDP by using additional centrality measures and local descriptors.
arXiv Detail & Related papers (2023-05-01T08:59:58Z)
Energy-based Out-of-Distribution Detection for Graph Neural Networks [76.0242218180483]
We propose a simple, powerful and efficient OOD detection model for GNN-based learning on graphs, which we call GNNSafe. GNNSafe achieves up to 17.0% AUROC improvement over state-of-the-art methods, and it could serve as a simple yet strong baseline in such an under-developed area.
arXiv Detail & Related papers (2023-02-06T16:38:43Z)
Benchmarking Node Outlier Detection on Graphs [90.29966986023403]
Graph outlier detection is an emerging but crucial machine learning task with numerous applications. We present the first comprehensive unsupervised node outlier detection benchmark for graphs called UNOD.
arXiv Detail & Related papers (2022-06-21T01:46:38Z)
Optimal Propagation for Graph Neural Networks [51.08426265813481]
We propose a bi-level optimization approach for learning the optimal graph structure. We also explore a low-rank approximation model for further reducing the time complexity.
arXiv Detail & Related papers (2022-05-06T03:37:00Z)
Tackling Oversmoothing of GNNs with Contrastive Learning [35.88575306925201]
Graph neural networks (GNNs) integrate the comprehensive relation of graph data and representation learning capability. Oversmoothing makes the final representations of nodes indiscriminative, thus deteriorating the node classification and link prediction performance. We propose the Topology-guided Graph Contrastive Layer, named TGCL, which is the first de-oversmoothing method maintaining all three mentioned metrics.
arXiv Detail & Related papers (2021-10-26T15:56:16Z)

This list is automatically generated from the titles and abstracts of the papers on this site.