IceBerg: Debiased Self-Training for Class-Imbalanced Node Classification
- URL: http://arxiv.org/abs/2502.06280v1
- Date: Mon, 10 Feb 2025 09:23:22 GMT
- Title: IceBerg: Debiased Self-Training for Class-Imbalanced Node Classification
- Authors: Zhixun Li, Dingshuo Chen, Tong Zhao, Daixin Wang, Hongrui Liu, Zhiqiang Zhang, Jun Zhou, Jeffrey Xu Yu,
- Abstract summary: We propose IceBerg, a debiased self-training framework to address the class-imbalanced and few-shot challenges for GNNs.
Specifically, to figure out the Matthew effect and label distribution shift in self-training, we propose Double Balancing.
We find that leveraging unlabeled nodes can significantly enhance the performance of GNNs in class-imbalanced and few-shot scenarios.
- Score: 21.321928330397277
- License:
- Abstract: Graph Neural Networks (GNNs) have achieved great success in dealing with non-Euclidean graph-structured data and have been widely deployed in many real-world applications. However, their effectiveness is often jeopardized under class-imbalanced training sets. Most existing studies have analyzed class-imbalanced node classification from a supervised learning perspective, but they do not fully utilize the large number of unlabeled nodes in semi-supervised scenarios. We claim that the supervised signal is just the tip of the iceberg and a large number of unlabeled nodes have not yet been effectively utilized. In this work, we propose IceBerg, a debiased self-training framework to address the class-imbalanced and few-shot challenges for GNNs at the same time. Specifically, to figure out the Matthew effect and label distribution shift in self-training, we propose Double Balancing, which can largely improve the performance of existing baselines with just a few lines of code as a simple plug-and-play module. Secondly, to enhance the long-range propagation capability of GNNs, we disentangle the propagation and transformation operations of GNNs. Therefore, the weak supervision signals can propagate more effectively to address the few-shot issue. In summary, we find that leveraging unlabeled nodes can significantly enhance the performance of GNNs in class-imbalanced and few-shot scenarios, and even small, surgical modifications can lead to substantial performance improvements. Systematic experiments on benchmark datasets show that our method can deliver considerable performance gain over existing class-imbalanced node classification baselines. Additionally, due to IceBerg's outstanding ability to leverage unsupervised signals, it also achieves state-of-the-art results in few-shot node classification scenarios. The code of IceBerg is available at: https://github.com/ZhixunLEE/IceBerg.
Related papers
- GNN-MultiFix: Addressing the pitfalls for GNNs for multi-label node classification [1.857645719601748]
Graph neural networks (GNNs) have emerged as powerful models for learning representations of graph data.
We show that even the most expressive GNN may fail to learn in absence of node attributes and without using explicit label information as input.
We propose a straightforward approach, referred to as GNN-MultiFix, that integrates the feature, label, and positional information of a node.
arXiv Detail & Related papers (2024-11-21T12:59:39Z) - Graph Mining under Data scarcity [6.229055041065048]
We propose an Uncertainty Estimator framework that can be applied on top of any generic Graph Neural Networks (GNNs)
We train these models under the classic episodic learning paradigm in the $n$-way, $k$-shot fashion, in an end-to-end setting.
Our method outperforms the baselines, which demonstrates the efficacy of the Uncertainty Estimator for Few-shot node classification on graphs with a GNN.
arXiv Detail & Related papers (2024-06-07T10:50:03Z) - E2GNN: Efficient Graph Neural Network Ensembles for Semi-Supervised Classification [30.55931541782854]
This work studies ensemble learning for graph neural networks (GNNs) under the popular semi-supervised setting.
We propose an efficient ensemble learner--E2GNN to assemble multiple GNNs in a learnable way by leveraging both labeled and unlabeled nodes.
Comprehensive experiments over both transductive and inductive settings, across different GNN backbones and 8 benchmark datasets, demonstrate the superiority of E2GNN.
arXiv Detail & Related papers (2024-05-06T12:11:46Z) - Class-Balanced and Reinforced Active Learning on Graphs [13.239043161351482]
Graph neural networks (GNNs) have demonstrated significant success in various applications, such as node classification, link prediction, and graph classification.
Active learning for GNNs aims to query the valuable samples from the unlabeled data for annotation to maximize the GNNs' performance at a lower cost.
Most existing algorithms for reinforced active learning in GNNs may lead to a highly imbalanced class distribution, especially in highly skewed class scenarios.
We propose a novel class-balanced and reinforced active learning framework for GNNs, namely, GCBR. It learns an optimal policy to acquire class-balanced and informative nodes
arXiv Detail & Related papers (2024-02-15T16:37:14Z) - Classifying Nodes in Graphs without GNNs [50.311528896010785]
We propose a fully GNN-free approach for node classification, not requiring them at train or test time.
Our method consists of three key components: smoothness constraints, pseudo-labeling iterations and neighborhood-label histograms.
arXiv Detail & Related papers (2024-02-08T18:59:30Z) - GNNEvaluator: Evaluating GNN Performance On Unseen Graphs Without Labels [81.93520935479984]
We study a new problem, GNN model evaluation, that aims to assess the performance of a specific GNN model trained on labeled and observed graphs.
We propose a two-stage GNN model evaluation framework, including (1) DiscGraph set construction and (2) GNNEvaluator training and inference.
Under the effective training supervision from the DiscGraph set, GNNEvaluator learns to precisely estimate node classification accuracy of the to-be-evaluated GNN model.
arXiv Detail & Related papers (2023-10-23T05:51:59Z) - Label Deconvolution for Node Representation Learning on Large-scale
Attributed Graphs against Learning Bias [75.44877675117749]
We propose an efficient label regularization technique, namely Label Deconvolution (LD), to alleviate the learning bias by a novel and highly scalable approximation to the inverse mapping of GNNs.
Experiments demonstrate LD significantly outperforms state-of-the-art methods on Open Graph datasets Benchmark.
arXiv Detail & Related papers (2023-09-26T13:09:43Z) - Graph Ladling: Shockingly Simple Parallel GNN Training without
Intermediate Communication [100.51884192970499]
GNNs are a powerful family of neural networks for learning over graphs.
scaling GNNs either by deepening or widening suffers from prevalent issues of unhealthy gradients, over-smoothening, information squashing.
We propose not to deepen or widen current GNNs, but instead present a data-centric perspective of model soups tailored for GNNs.
arXiv Detail & Related papers (2023-06-18T03:33:46Z) - Exploiting Neighbor Effect: Conv-Agnostic GNNs Framework for Graphs with
Heterophily [58.76759997223951]
We propose a new metric based on von Neumann entropy to re-examine the heterophily problem of GNNs.
We also propose a Conv-Agnostic GNN framework (CAGNNs) to enhance the performance of most GNNs on heterophily datasets.
arXiv Detail & Related papers (2022-03-19T14:26:43Z) - Distance-wise Prototypical Graph Neural Network in Node Imbalance
Classification [9.755229198654922]
We propose a novel Distance-wise Prototypical Graph Neural Network (DPGNN) for imbalanced graph data.
The proposed DPGNN almost always significantly outperforms all other baselines, which demonstrates its effectiveness in imbalanced node classification.
arXiv Detail & Related papers (2021-10-22T19:43:15Z) - Towards Deeper Graph Neural Networks with Differentiable Group
Normalization [61.20639338417576]
Graph neural networks (GNNs) learn the representation of a node by aggregating its neighbors.
Over-smoothing is one of the key issues which limit the performance of GNNs as the number of layers increases.
We introduce two over-smoothing metrics and a novel technique, i.e., differentiable group normalization (DGN)
arXiv Detail & Related papers (2020-06-12T07:18:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.