Related papers: UNREAL:Unlabeled Nodes Retrieval and Labeling for Heavily-imbalanced Node Classification

URL: http://arxiv.org/abs/2303.10371v1
Date: Sat, 18 Mar 2023 09:23:13 GMT
Title: UNREAL:Unlabeled Nodes Retrieval and Labeling for Heavily-imbalanced Node Classification
Authors: Liang Yan, Shengzhong Zhang, Bisheng Li, Min Zhou, Zengfeng Huang
Abstract summary: Skewed label distributions are common in real-world node classification tasks. In this paper, we propose UNREAL, an iterative over-sampling method.
Score: 17.23736166919287
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Extremely skewed label distributions are common in real-world node classification tasks. If not dealt with appropriately, they significantly hurt the performance of GNNs on minority classes. Due to its practical importance, a series of recent studies has been devoted to this challenge. Existing over-sampling techniques smooth the label distribution by generating "fake" minority nodes and synthesizing their features and local topology, which largely ignores the rich information of unlabeled nodes on graphs. In this paper, we propose UNREAL, an iterative over-sampling method. The first key difference is that we only add unlabeled nodes instead of synthetic nodes, which eliminates the challenge of feature and neighborhood generation. To select which unlabeled nodes to add, we propose geometric ranking to rank unlabeled nodes. Geometric ranking exploits unsupervised learning in the node embedding space to effectively calibrate pseudo-label assignment. Finally, we identify the issue of geometric imbalance in the embedding space and provide a simple metric to filter out geometrically imbalanced nodes. Extensive experiments on real-world benchmark datasets are conducted, and the empirical results show that our method significantly and consistently outperforms current state-of-the-art methods across different datasets with different imbalance ratios.
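The selection step described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the function name `select_unlabeled_nodes`, the `ratio_max` threshold, and the nearest-to-second-nearest distance ratio used as a filter are all assumptions made for illustration, and class centroids computed from labeled nodes stand in for the unsupervised clustering that the abstract mentions. The sketch covers a single round; an iterative method would retrain the model and repeat.

```python
import numpy as np


def select_unlabeled_nodes(emb, probs, labeled_idx, labels, ratio_max=0.8):
    """One hypothetical selection round: pick unlabeled nodes to add per class.

    emb:         (N, d) node embeddings from a trained model
    probs:       (N, C) class probabilities predicted by the classifier
    labeled_idx: indices of labeled nodes (each class needs at least one)
    labels:      integer class labels of the labeled nodes
    ratio_max:   assumed filter threshold on nearest / second-nearest distance
    """
    n_nodes, n_classes = probs.shape
    labeled_idx = np.asarray(labeled_idx)
    labels = np.asarray(labels)

    # Assumption: class centroids from labeled nodes replace a clustering step.
    centroids = np.stack(
        [emb[labeled_idx[labels == c]].mean(axis=0) for c in range(n_classes)]
    )

    unlabeled = np.setdiff1d(np.arange(n_nodes), labeled_idx)
    # Distance of every unlabeled node to every centroid, shape (U, C).
    dist = np.linalg.norm(emb[unlabeled, None, :] - centroids[None, :, :], axis=2)
    geo_label = dist.argmin(axis=1)              # label suggested by geometry
    clf_label = probs[unlabeled].argmax(axis=1)  # label suggested by classifier

    # Hypothetical ambiguity filter: drop nodes almost equally close to two centroids.
    two_nearest = np.sort(dist, axis=1)[:, :2]
    ratio = two_nearest[:, 0] / (two_nearest[:, 1] + 1e-12)
    keep = (geo_label == clf_label) & (ratio <= ratio_max)

    # Fill each class up to the size of the largest class, closest nodes first.
    counts = np.bincount(labels, minlength=n_classes)
    target = counts.max()
    selected = {}
    for c in range(n_classes):
        cand = np.where(keep & (geo_label == c))[0]
        cand = cand[np.argsort(dist[cand, c])]
        selected[c] = unlabeled[cand[: target - counts[c]]].tolist()
    return selected


# Toy example: class 0 has three labeled nodes, class 1 has one.
emb = np.array([[0.0, 0.0], [0.2, 0.0], [-0.2, 0.0], [10.0, 0.0],
                [10.1, 0.0], [9.5, 0.2], [5.0, 0.0], [0.1, 0.0]])
probs = np.array([[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.1, 0.9],
                  [0.2, 0.8], [0.3, 0.7], [0.4, 0.6], [0.9, 0.1]])
picked = select_unlabeled_nodes(emb, probs, labeled_idx=[0, 1, 2, 3],
                                labels=[0, 0, 0, 1])
print(picked)  # {0: [], 1: [4, 5]}
```

In the toy example, node 6 sits halfway between the two centroids and is filtered out, while nodes 4 and 5 are added to the minority class in order of distance to its centroid.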

Related papers

Open-World Semi-Supervised Learning for Node Classification [53.07866559269709]
Open-world semi-supervised learning (Open-world SSL) for node classification is a practical but under-explored problem in the graph community. We propose an IMbalance-Aware method named OpenIMA for Open-world semi-supervised node classification.
arXiv Detail & Related papers (2024-03-18T05:12:54Z)
All Points Matter: Entropy-Regularized Distribution Alignment for Weakly-supervised 3D Segmentation [67.30502812804271]
Pseudo-labels are widely employed in weakly supervised 3D segmentation tasks where only sparse ground-truth labels are available for learning. We propose a novel learning strategy to regularize the generated pseudo-labels and effectively narrow the gaps between pseudo-labels and model predictions.
arXiv Detail & Related papers (2023-05-25T08:19:31Z)
Towards Label Position Bias in Graph Neural Networks [47.39692033598877]
Graph Neural Networks (GNNs) have emerged as a powerful tool for semi-supervised node classification tasks. Recent studies have revealed various biases in GNNs stemming from both node features and graph topology. In this work, we uncover a new bias, label position bias, which indicates that nodes closer to the labeled nodes tend to perform better.
arXiv Detail & Related papers (2023-05-25T08:06:42Z)
GraphSR: A Data Augmentation Algorithm for Imbalanced Node Classification [10.03027886793368]
Graph neural networks (GNNs) have achieved great success in node classification tasks. Existing GNNs naturally bias towards the majority classes with more labelled data and ignore those minority classes with relatively few labelled ones. We propose GraphSR, a novel self-training strategy to augment the minority classes with significant diversity of unlabelled nodes.
arXiv Detail & Related papers (2023-02-24T18:49:10Z)
Pseudo Contrastive Learning for Graph-based Semi-supervised Learning [67.37572762925836]
Pseudo Labeling is a technique used to improve the performance of Graph Neural Networks (GNNs). We propose a general framework for GNNs, termed Pseudo Contrastive Learning (PCL).
arXiv Detail & Related papers (2023-02-19T10:34:08Z)
TAM: Topology-Aware Margin Loss for Class-Imbalanced Node Classification [33.028354930416754]
We propose Topology-Aware Margin (TAM) to reflect local topology on the learning objective. Our method consistently exhibits superiority over the baselines on various node classification benchmark datasets.
arXiv Detail & Related papers (2022-06-26T16:29:36Z)
Synthetic Over-sampling for Imbalanced Node Classification with Graph Neural Networks [34.81248024048974]
Graph neural networks (GNNs) have achieved state-of-the-art performance for node classification. In many real-world scenarios, node classes are imbalanced, with some majority classes making up most parts of the graph. In this work, we seek to address this problem by generating pseudo instances of minority classes to balance the training data.
arXiv Detail & Related papers (2022-06-10T19:47:05Z)
Geometer: Graph Few-Shot Class-Incremental Learning via Prototype Representation [50.772432242082914]
Existing graph neural network based methods mainly focus on classifying unlabeled nodes within fixed classes with abundant labeling. In this paper, we focus on this challenging but practical graph few-shot class-incremental learning (GFSCIL) problem and propose a novel method called Geometer. Instead of replacing and retraining the fully connected neural network classifier, Geometer predicts the label of a node by finding the nearest class prototype.
arXiv Detail & Related papers (2022-05-27T13:02:07Z)
GraphSMOTE: Imbalanced Node Classification on Graphs with Graph Neural Networks [28.92347073786722]
Graph neural networks (GNNs) have achieved state-of-the-art performance for node classification. We propose a novel framework, GraphSMOTE, in which an embedding space is constructed to encode the similarity among the nodes. New samples are synthesized in this space to assure genuineness.
arXiv Detail & Related papers (2021-03-16T03:23:55Z)
Sequential Graph Convolutional Network for Active Learning [53.99104862192055]
We propose a novel pool-based Active Learning framework constructed on a sequential Graph Convolution Network (GCN). With a small number of randomly sampled images as seed labelled examples, we learn the parameters of the graph to distinguish labelled vs unlabelled nodes. We exploit these characteristics of GCN to select the unlabelled examples which are sufficiently different from labelled ones.
arXiv Detail & Related papers (2020-06-18T00:55:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.