Delving into Instance-Dependent Label Noise in Graph Data: A Comprehensive Study and Benchmark
- URL: http://arxiv.org/abs/2506.12468v2
- Date: Tue, 17 Jun 2025 03:17:11 GMT
- Title: Delving into Instance-Dependent Label Noise in Graph Data: A Comprehensive Study and Benchmark
- Authors: Suyeon Kim, SeongKu Kang, Dongwoo Kim, Jungseul Ok, Hwanjo Yu
- Abstract summary: Graph Neural Networks (GNNs) have achieved state-of-the-art performance in node classification tasks but struggle with label noise in real-world data. We introduce BeGIN, a new benchmark that provides realistic graph datasets with various noise types. By comprehensively evaluating noise-handling strategies, BeGIN provides insights into their effectiveness, efficiency, and key performance factors.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Graph Neural Networks (GNNs) have achieved state-of-the-art performance in node classification tasks but struggle with label noise in real-world data. Existing studies on graph learning with label noise commonly rely on class-dependent label noise, overlooking the complexities of instance-dependent noise and falling short of capturing real-world corruption patterns. We introduce BeGIN (Benchmarking for Graphs with Instance-dependent Noise), a new benchmark that provides realistic graph datasets with various noise types and comprehensively evaluates noise-handling strategies across GNN architectures, noisy label detection, and noise-robust learning. To simulate instance-dependent corruptions, BeGIN introduces algorithmic methods and LLM-based simulations. Our experiments reveal the challenges of instance-dependent noise, particularly LLM-based corruption, and underscore the importance of node-specific parameterization to enhance GNN robustness. By comprehensively evaluating noise-handling strategies, BeGIN provides insights into their effectiveness, efficiency, and key performance factors. We expect that BeGIN will serve as a valuable resource for advancing research on label noise in graphs and fostering the development of robust GNN training methods. The code is available at https://github.com/kimsu55/BeGIN.
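To make "instance-dependent corruption" concrete, here is a minimal, hypothetical sketch of algorithmic instance-dependent label flipping, where each node's flip probability depends on its own features rather than only on its class. This is not BeGIN's actual procedure; the function name and the feature-norm heuristic are illustrative assumptions.

```python
import numpy as np

def instance_dependent_flip(features, labels, num_classes, noise_rate=0.2, seed=0):
    """Illustrative sketch: flip each node's label with a probability that
    depends on its feature vector (here, its feature norm), so corruption
    varies per instance instead of being uniform within a class."""
    rng = np.random.default_rng(seed)
    # Per-node flip probability, scaled so the average rate equals noise_rate.
    scores = np.linalg.norm(features, axis=1)
    probs = np.clip(noise_rate * scores / scores.mean(), 0.0, 1.0)
    noisy = labels.copy()
    for i, (y, p) in enumerate(zip(labels, probs)):
        if rng.random() < p:
            # Replace with a different class chosen uniformly at random.
            choices = [c for c in range(num_classes) if c != y]
            noisy[i] = rng.choice(choices)
    return noisy
```

Class-dependent noise, by contrast, would use a single confusion matrix shared by all nodes of a class; the per-node `probs` vector is what makes this corruption instance-dependent. BeGIN additionally uses LLM-based simulations, which this stand-in does not attempt to model.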
Related papers
- Learn Beneficial Noise as Graph Augmentation [54.44813218411879]
We propose PiNGDA, where positive-incentive noise (pi-noise) analyzes the beneficial effect of noise from an information-theoretic perspective. We prove that standard GCL with pre-defined augmentations is equivalent to estimating the beneficial noise via point estimation. Since the generator learns how to produce beneficial perturbations on graph topology and node attributes, PiNGDA is more reliable than existing methods.
arXiv Detail & Related papers (2025-05-25T08:20:34Z)
- Training Robust Graph Neural Networks by Modeling Noise Dependencies [28.1151026795484]
In real-world applications, node features in graphs often contain noise from various sources, leading to significant performance degradation. We introduce a more realistic noise scenario, dependency-aware noise on graphs (DANG), where noise in node features creates a chain of noise dependencies that propagates to the graph structure and node labels. We propose a novel robust GNN, DA-GNN, which captures the causal relationships among variables in the data generating process (DGP) of DANG using variational inference.
arXiv Detail & Related papers (2025-02-27T01:30:13Z)
- NoisyGL: A Comprehensive Benchmark for Graph Neural Networks under Label Noise [21.65452861777135]
Graph Neural Networks (GNNs) exhibit strong potential in node classification tasks through a message-passing mechanism.
Label noise is common in real-world graph data, negatively impacting GNNs by propagating incorrect information during training.
We introduce NoisyGL, the first comprehensive benchmark for graph neural networks under label noise.
arXiv Detail & Related papers (2024-06-06T17:45:00Z)
- ROG$_{PL}$: Robust Open-Set Graph Learning via Region-Based Prototype Learning [52.60434474638983]
We propose a unified framework named ROG$_PL$ to achieve robust open-set learning on complex noisy graph data.
The framework consists of two modules, i.e., denoising via label propagation and open-set prototype learning via regions.
To the best of our knowledge, the proposed ROG$_PL$ is the first robust open-set node classification method for graph data with complex noise.
arXiv Detail & Related papers (2024-02-28T17:25:06Z)
- ERASE: Error-Resilient Representation Learning on Graphs for Label Noise Tolerance [53.73316938815873]
We propose a method called ERASE (Error-Resilient representation learning on graphs for lAbel noiSe tolerancE) to learn representations with error tolerance.
ERASE combines prototype pseudo-labels with propagated denoised labels and updates representations with error resilience.
Our method can outperform multiple baselines with clear margins in broad noise levels and enjoy great scalability.
arXiv Detail & Related papers (2023-12-13T17:59:07Z)
- Combating Bilateral Edge Noise for Robust Link Prediction [56.43882298843564]
We propose an information-theory-guided principle, Robust Graph Information Bottleneck (RGIB), to extract reliable supervision signals and avoid representation collapse.
Two instantiations, RGIB-SSL and RGIB-REP, are explored to leverage the merits of different methodologies.
Experiments on six datasets and three GNNs with diverse noisy scenarios verify the effectiveness of our RGIB instantiations.
arXiv Detail & Related papers (2023-11-02T12:47:49Z)
- Feature Noise Boosts DNN Generalization under Label Noise [65.36889005555669]
The presence of label noise in the training data has a profound impact on the generalization of deep neural networks (DNNs).
In this study, we introduce and theoretically demonstrate a simple feature noise method, which directly adds noise to the features of training data.
arXiv Detail & Related papers (2023-08-03T08:31:31Z)
- Robust Training of Graph Neural Networks via Noise Governance [27.767913371777247]
Graph Neural Networks (GNNs) have become widely-used models for semi-supervised learning.
In this paper, we consider an important yet challenging scenario where labels on nodes of graphs are not only noisy but also scarce.
We propose a novel RTGNN framework that achieves better robustness by learning to explicitly govern label noise.
arXiv Detail & Related papers (2022-11-12T09:25:32Z)
- NRGNN: Learning a Label Noise-Resistant Graph Neural Network on Sparsely and Noisily Labeled Graphs [20.470934944907608]
Graph Neural Networks (GNNs) have achieved promising results for semi-supervised learning tasks on graphs such as node classification.
Many real-world graphs are often sparsely and noisily labeled, which could significantly degrade the performance of GNNs.
We propose to develop a label noise-resistant GNN for semi-supervised node classification.
arXiv Detail & Related papers (2021-06-08T22:12:44Z)
- Unified Robust Training for Graph Neural Networks against Label Noise [12.014301020294154]
We propose a new framework, UnionNET, for learning with noisy labels on graphs under a semi-supervised setting.
Our approach provides a unified solution for robustly training GNNs and performing label correction simultaneously.
arXiv Detail & Related papers (2021-03-05T01:17:04Z) - Tackling Instance-Dependent Label Noise via a Universal Probabilistic
Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.