When Noisy Labels Meet Class Imbalance on Graphs: A Graph Augmentation Method with LLM and Pseudo Label
- URL: http://arxiv.org/abs/2507.18153v2
- Date: Fri, 25 Jul 2025 04:04:58 GMT
- Title: When Noisy Labels Meet Class Imbalance on Graphs: A Graph Augmentation Method with LLM and Pseudo Label
- Authors: Riting Xia, Rucong Wang, Yulin Liu, Anchen Li, Xueyan Liu, Yan Zhang
- Abstract summary: This paper systematically investigates robust node classification for class-imbalanced graphs with noisy labels. We propose GraphALP, a novel Graph Augmentation framework based on Large language models (LLMs) and Pseudo-labeling techniques. Experimental results show that GraphALP achieves superior performance over state-of-the-art methods on class-imbalanced graphs with noisy labels.
- Score: 3.667121386226796
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Class-imbalanced graph node classification is a practical yet underexplored research problem. Although recent studies have attempted to address this issue, they typically assume clean and reliable labels when processing class-imbalanced graphs. This assumption often violates the nature of real-world graphs, where labels frequently contain noise. Given this gap, this paper systematically investigates robust node classification for class-imbalanced graphs with noisy labels. We propose GraphALP, a novel Graph Augmentation framework based on Large language models (LLMs) and Pseudo-labeling techniques. Specifically, we design an LLM-based oversampling method to generate synthetic minority nodes, producing label-accurate minority nodes to alleviate class imbalance. Based on the class-balanced graphs, we develop a dynamically weighted pseudo-labeling method to obtain high-confidence pseudo labels to reduce label noise ratio. Additionally, we implement a secondary LLM-guided oversampling mechanism to mitigate potential class distribution skew caused by pseudo labels. Experimental results show that GraphALP achieves superior performance over state-of-the-art methods on class-imbalanced graphs with noisy labels.
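The abstract mentions weighted, high-confidence pseudo-labeling but does not give the rule. The following is only a minimal sketch of the general idea, not GraphALP's actual procedure. The function name `select_pseudo_labels`, the fixed threshold of 0.9, and the linear epoch ramp used as the "dynamic" weight are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def select_pseudo_labels(
    probs: Dict[int, List[float]],
    epoch: int,
    total_epochs: int,
    threshold: float = 0.9,
) -> Dict[int, Tuple[int, float]]:
    """Return {node_id: (pseudo_label, weight)} for confident unlabeled nodes.

    Assumptions (not taken from the paper):
      * a node is kept only if its top class probability reaches `threshold`;
      * its loss weight is that probability scaled by a linear ramp that
        grows from 1/total_epochs to 1 as training progresses.
    """
    ramp = min(1.0, (epoch + 1) / total_epochs)
    selected: Dict[int, Tuple[int, float]] = {}
    for node, p in probs.items():
        confidence = max(p)
        if confidence >= threshold:
            # The index of the largest probability is the pseudo label.
            selected[node] = (p.index(confidence), confidence * ramp)
    return selected


if __name__ == "__main__":
    # Hypothetical softmax outputs of a GNN for three unlabeled nodes.
    demo = {0: [0.95, 0.05], 1: [0.6, 0.4], 2: [0.02, 0.98]}
    print(select_pseudo_labels(demo, epoch=4, total_epochs=10))
```

Down-weighting early pseudo labels is one common way to limit the damage from a model that is still poorly calibrated. Whether GraphALP weights by epoch, by class, or by something else is not stated in the abstract.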
Related papers
- Robust Graph-Based Semi-Supervised Learning via $p$-Conductances [49.0776396776252]
We study the problem of semi-supervised learning on graphs in the regime where data labels are scarce or possibly corrupted. We propose an approach called $p$-conductance learning that generalizes the $p$-Laplace and Poisson learning methods. Empirical results on computer vision and citation datasets demonstrate that our approach achieves state-of-the-art accuracy in low label-rate, corrupted-label, and partial-label regimes.
arXiv Detail & Related papers (2025-02-13T01:11:25Z)
- Inaccurate Label Distribution Learning with Dependency Noise [52.08553913094809]
We introduce the Dependent Noise-based Inaccurate Label Distribution Learning (DN-ILDL) framework to tackle the challenges posed by noise in label distribution learning.
We show that DN-ILDL effectively addresses the ILDL problem and outperforms existing LDL methods.
arXiv Detail & Related papers (2024-05-26T07:58:07Z)
- Open-World Semi-Supervised Learning for Node Classification [53.07866559269709]
Open-world semi-supervised learning (Open-world SSL) for node classification is a practical but under-explored problem in the graph community.
We propose an IMbalance-Aware method named OpenIMA for Open-world semi-supervised node classification.
arXiv Detail & Related papers (2024-03-18T05:12:54Z)
- ERASE: Error-Resilient Representation Learning on Graphs for Label Noise Tolerance [53.73316938815873]
We propose a method called ERASE (Error-Resilient representation learning on graphs for lAbel noiSe tolerancE) to learn representations with error tolerance.
ERASE combines prototype pseudo-labels with propagated denoised labels and updates representations with error resilience.
Our method can outperform multiple baselines with clear margins in broad noise levels and enjoy great scalability.
arXiv Detail & Related papers (2023-12-13T17:59:07Z)
- Resurrecting Label Propagation for Graphs with Heterophily and Label Noise [40.11022005996222]
Label noise is a common challenge in large datasets, as it can significantly degrade the generalization ability of deep neural networks.
We study graph label noise in the context of arbitrary heterophily, with the aim of rectifying noisy labels and assigning labels to previously unlabeled nodes.
$R^{2}LP$ is an iterative algorithm with three steps: (1) reconstruct the graph to recover the homophily property, (2) utilize label propagation to rectify the noisy labels, and (3) select high-confidence labels to retain for the next iteration.
arXiv Detail & Related papers (2023-10-25T11:28:26Z)
- Learning on Graphs under Label Noise [5.909452203428086]
We develop a novel approach dubbed Consistent Graph Neural Network (CGNN) to solve the problem of learning on graphs with label noise.
Specifically, we employ graph contrastive learning as a regularization term, which promotes two views of augmented nodes to have consistent representations.
To detect noisy labels on the graph, we present a sample selection technique based on the homophily assumption.
arXiv Detail & Related papers (2023-06-14T01:38:01Z)
- Informative Pseudo-Labeling for Graph Neural Networks with Few Labels [12.83841767562179]
Graph Neural Networks (GNNs) have achieved state-of-the-art results for semi-supervised node classification on graphs.
The challenge of how to effectively learn GNNs with very few labels is still under-explored.
We propose a novel informative pseudo-labeling framework, called InfoGNN, to facilitate learning of GNNs with extremely few labels.
arXiv Detail & Related papers (2022-01-20T01:49:30Z)
- Instance-dependent Label-noise Learning under a Structural Causal Model [92.76400590283448]
Label noise degrades the performance of deep learning algorithms.
By leveraging a structural causal model, we propose a novel generative approach for instance-dependent label-noise learning.
arXiv Detail & Related papers (2021-09-07T10:42:54Z)
- Weakly-supervised Graph Meta-learning for Few-shot Node Classification [53.36828125138149]
We propose a new graph meta-learning framework, Graph Hallucination Networks (Meta-GHN).
Based on a new robustness-enhanced episodic training, Meta-GHN is meta-learned to hallucinate clean node representations from weakly-labeled data.
Extensive experiments demonstrate the superiority of Meta-GHN over existing graph meta-learning studies.
arXiv Detail & Related papers (2021-06-12T22:22:10Z)
- Unified Robust Training for Graph NeuralNetworks against Label Noise [12.014301020294154]
We propose a new framework, UnionNET, for learning with noisy labels on graphs under a semi-supervised setting.
Our approach provides a unified solution for robustly training GNNs and performing label correction simultaneously.
arXiv Detail & Related papers (2021-03-05T01:17:04Z)
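Several entries above rely on label propagation followed by confidence-based selection to clean noisy labels, most explicitly the three-step scheme in the "Resurrecting Label Propagation" entry. The sketch below illustrates one round of steps (2) and (3) only, using standard row-normalized label propagation. It is not that paper's implementation: the graph reconstruction step is omitted, the graph is assumed to be homophilous already, and the names `propagate_labels` and `retain_confident` as well as the values of `alpha` and `tau` are hypothetical.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]  # adjacency list: node -> neighbors


def propagate_labels(
    adj: Graph,
    seeds: Dict[int, int],
    num_classes: int,
    alpha: float = 0.8,
    iters: int = 50,
) -> Dict[int, List[float]]:
    """Iterate F <- alpha * mean(F over neighbors) + (1 - alpha) * Y.

    `seeds` maps labeled nodes to their (possibly noisy) class labels.
    """
    one_hot = {v: [0.0] * num_classes for v in adj}
    for v, c in seeds.items():
        one_hot[v][c] = 1.0
    scores = {v: row[:] for v, row in one_hot.items()}
    for _ in range(iters):
        updated = {}
        for v, nbrs in adj.items():
            agg = [0.0] * num_classes
            for u in nbrs:
                for c in range(num_classes):
                    agg[c] += scores[u][c] / len(nbrs)
            updated[v] = [
                alpha * agg[c] + (1 - alpha) * one_hot[v][c]
                for c in range(num_classes)
            ]
        scores = updated
    return scores


def retain_confident(scores: Dict[int, List[float]], tau: float = 0.7) -> Dict[int, int]:
    """Keep the argmax label of nodes whose normalized top score reaches tau."""
    kept: Dict[int, int] = {}
    for v, s in scores.items():
        total = sum(s)
        if total == 0.0:
            continue  # no label mass reached this node
        best = max(range(len(s)), key=s.__getitem__)
        if s[best] / total >= tau:
            kept[v] = best
    return kept
```

In an iterative scheme, the labels returned by `retain_confident` would serve as the seeds for the next round. A node whose given label disagrees with its neighborhood ends up with a lower normalized score, so it is either relabeled or dropped from the retained set.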