The Impact of Data Characteristics on GNN Evaluation for Detecting Fake News
- URL: http://arxiv.org/abs/2512.06638v1
- Date: Sun, 07 Dec 2025 03:00:38 GMT
- Title: The Impact of Data Characteristics on GNN Evaluation for Detecting Fake News
- Authors: Isha Karn, David Jensen
- Abstract summary: Graph neural networks (GNNs) are widely used for the detection of fake news by modeling the content and propagation structure of news articles on social media. We show that two of the most commonly used benchmark data sets - GossipCop and PolitiFact - are poorly suited to evaluating the utility of models that use propagation structure.
- Score: 0.04774522315161165
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Graph neural networks (GNNs) are widely used for the detection of fake news by modeling the content and propagation structure of news articles on social media. We show that two of the most commonly used benchmark data sets - GossipCop and PolitiFact - are poorly suited to evaluating the utility of models that use propagation structure. Specifically, these data sets exhibit shallow, ego-like graph topologies that provide little or no ability to differentiate among modeling methods. We systematically benchmark five GNN architectures against a structure-agnostic multilayer perceptron (MLP) that uses the same node features. We show that MLPs match or closely trail the performance of GNNs, with performance gaps often within 1-2% and overlapping confidence intervals. To isolate the contribution of structure in these datasets, we conduct controlled experiments where node features are shuffled or edge structures randomized. We find that performance collapses under feature shuffling but remains stable under edge randomization. This suggests that structure plays a negligible role in these benchmarks. Structural analysis further reveals that over 75% of nodes are only one hop from the root, exhibiting minimal structural diversity. In contrast, on synthetic datasets where node features are noisy and structure is informative, GNNs significantly outperform MLPs. These findings provide strong evidence that widely used benchmarks do not meaningfully test the utility of modeling structural features, and they motivate the development of datasets with richer, more diverse graph topologies.
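The two ablations and the hop-depth statistic described in the abstract can be sketched as follows. This is a minimal illustration using NumPy only, not the authors' code: the function names, the edge-list representation, the choice of node 0 as the root, and the decision to exclude the root from the one-hop fraction are all assumptions made for the example.

```python
from collections import deque

import numpy as np


def shuffle_features(x, rng):
    """Feature ablation: permute feature rows across nodes, edges untouched."""
    return x[rng.permutation(x.shape[0])]


def randomize_edges(edges, num_nodes, rng):
    """Structure ablation: redraw every edge as a random pair of distinct
    nodes, keeping the number of edges fixed (hypothetical scheme)."""
    src = rng.integers(0, num_nodes, size=len(edges))
    offset = rng.integers(1, num_nodes, size=len(edges))
    dst = (src + offset) % num_nodes  # offset in [1, n-1] so dst != src
    return np.stack([src, dst], axis=1)


def hop_depths(edges, num_nodes, root=0):
    """Breadth-first distance from the root over undirected edges.
    Unreachable nodes get depth -1."""
    adjacency = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adjacency[int(u)].append(int(v))
        adjacency[int(v)].append(int(u))
    depth = [-1] * num_nodes
    depth[root] = 0
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if depth[v] == -1:
                depth[v] = depth[u] + 1
                queue.append(v)
    return np.array(depth)


def one_hop_fraction(edges, num_nodes, root=0):
    """Share of non-root nodes that sit exactly one hop from the root
    (assumed definition; the root itself is excluded)."""
    depth = hop_depths(edges, num_nodes, root)
    non_root = np.arange(num_nodes) != root
    return float(np.mean(depth[non_root] == 1))


# Toy propagation graph: root 0 with three direct children and one grandchild.
toy_edges = np.array([[0, 1], [0, 2], [0, 3], [3, 4]])
print(one_hop_fraction(toy_edges, num_nodes=5))  # 0.75
```

In an experiment of the kind the abstract describes, a model would be retrained or re-evaluated on the output of `shuffle_features` and `randomize_edges` separately; a large drop under the first and a small drop under the second is the pattern reported for GossipCop and PolitiFact.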
Related papers
- Adapting to Heterophilic Graph Data with Structure-Guided Neighbor Discovery [31.368672838207022]
Graph Neural Networks (GNNs) often struggle with heterophilic data, where connected nodes may have dissimilar labels. We propose creating alternative graph structures by linking nodes with similar structural attributes. We introduce Structure-Guided GNN (SG-GNN), an architecture that processes the original graph alongside the newly created structural graphs.
arXiv Detail & Related papers (2025-06-10T15:03:23Z)
- Graph Neural Networks Are More Than Filters: Revisiting and Benchmarking from A Spectral Perspective [49.613774305350084]
Graph Neural Networks (GNNs) have achieved remarkable success in various graph-based learning tasks. Recent studies suggest that other components such as non-linear layers may also significantly affect how GNNs process the input graph data in the spectral domain. This paper introduces a comprehensive benchmark to measure and evaluate GNNs' capability in capturing and leveraging the information encoded in different frequency components of the input graph data.
arXiv Detail & Related papers (2024-12-10T04:53:53Z)
- Structure-Guided Input Graph for GNNs facing Heterophily [31.368672838207022]
We create a new graph in which nodes are connected if they share structural characteristics, meaning a higher chance of sharing their labels. Experiments show that the labels are smoother in this newly defined graph and that the performance of GNN architectures improves when using this alternative structure.
arXiv Detail & Related papers (2024-12-02T17:52:33Z)
- Noise-Resilient Unsupervised Graph Representation Learning via Multi-Hop Feature Quality Estimation [53.91958614666386]
Unsupervised graph representation learning (UGRL) based on graph neural networks (GNNs)
We propose a novel UGRL method based on Multi-hop feature Quality Estimation (MQE)
arXiv Detail & Related papers (2024-07-29T12:24:28Z)
- Hyperbolic Benchmarking Unveils Network Topology-Feature Relationship in GNN Performance [0.5416466085090772]
We introduce a comprehensive benchmarking framework for graph machine learning.
We generate synthetic networks with realistic topological properties and node feature vectors.
Results highlight the dependency of model performance on the interplay between network structure and node features.
arXiv Detail & Related papers (2024-06-04T20:40:06Z)
- Global Minima, Recoverability Thresholds, and Higher-Order Structure in GNNS [0.0]
We analyze the performance of graph neural network (GNN) architectures from the perspective of random graph theory.
We show how both specific higher-order structures in synthetic data and the mix of empirical structures in real data have dramatic effects on GNN performance.
arXiv Detail & Related papers (2023-10-11T17:16:33Z)
- Challenging the Myth of Graph Collaborative Filtering: a Reasoned and Reproducibility-driven Analysis [50.972595036856035]
We present code that successfully replicates results from six popular and recent graph recommendation models.
We compare these graph models with traditional collaborative filtering models that historically performed well in offline evaluations.
By investigating the information flow from users' neighborhoods, we aim to identify which models are influenced by intrinsic features in the dataset structure.
arXiv Detail & Related papers (2023-08-01T09:31:44Z)
- Deep Graph Neural Networks via Posteriori-Sampling-based Node-Adaptive Residual Module [65.81781176362848]
Graph Neural Networks (GNNs) can learn from graph-structured data through neighborhood information aggregation.
As the number of layers increases, node representations become indistinguishable, which is known as over-smoothing.
We propose a Posterior-Sampling-based, Node-distinguish Residual module (PSNR).
arXiv Detail & Related papers (2023-05-09T12:03:42Z)
- Exploiting Neighbor Effect: Conv-Agnostic GNNs Framework for Graphs with Heterophily [58.76759997223951]
We propose a new metric based on von Neumann entropy to re-examine the heterophily problem of GNNs.
We also propose a Conv-Agnostic GNN framework (CAGNNs) to enhance the performance of most GNNs on heterophily datasets.
arXiv Detail & Related papers (2022-03-19T14:26:43Z)
- Simplifying approach to Node Classification in Graph Neural Networks [7.057970273958933]
We decouple the node feature aggregation step and depth of graph neural network, and empirically analyze how different aggregated features play a role in prediction performance.
We show that not all features generated via aggregation steps are useful, and often using these less informative features can be detrimental to the performance of the GNN model.
We present a simple and shallow model, Feature Selection Graph Neural Network (FSGNN), and show empirically that the proposed model achieves comparable or even higher accuracy than state-of-the-art GNN models.
arXiv Detail & Related papers (2021-11-12T14:53:22Z)
- Local Augmentation for Graph Neural Networks [78.48812244668017]
We introduce local augmentation, which enhances each node's features using its local subgraph structure.
Based on the local augmentation, we further design a novel framework: LA-GNN, which can apply to any GNN models in a plug-and-play manner.
arXiv Detail & Related papers (2021-09-08T18:10:08Z)
- Node Similarity Preserving Graph Convolutional Networks [51.520749924844054]
Graph Neural Networks (GNNs) explore the graph structure and node features by aggregating and transforming information within node neighborhoods.
We propose SimP-GCN that can effectively and efficiently preserve node similarity while exploiting graph structure.
We validate the effectiveness of SimP-GCN on seven benchmark datasets including three assortative and four disassortative graphs.
arXiv Detail & Related papers (2020-11-19T04:18:01Z)