UNREAL:Unlabeled Nodes Retrieval and Labeling for Heavily-imbalanced
Node Classification
- URL: http://arxiv.org/abs/2303.10371v1
- Date: Sat, 18 Mar 2023 09:23:13 GMT
- Title: UNREAL:Unlabeled Nodes Retrieval and Labeling for Heavily-imbalanced
Node Classification
- Authors: Liang Yan, Shengzhong Zhang, Bisheng Li, Min Zhou, Zengfeng Huang
- Abstract summary: Skewed label distributions are common in real-world node classification tasks.
In this paper, we propose UNREAL, an iterative over-sampling method.
- Score: 17.23736166919287
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extremely skewed label distributions are common in real-world node
classification tasks. If not dealt with appropriately, such skew significantly
hurts the performance of GNNs on minority classes. Due to its practical
importance, a series of recent studies has been devoted to this challenge.
Existing over-sampling techniques smooth the label distribution by generating "fake"
minority nodes and synthesizing their features and local topology, which
largely ignore the rich information of unlabeled nodes on graphs. In this
paper, we propose UNREAL, an iterative over-sampling method. The first key
difference is that we only add unlabeled nodes instead of synthetic nodes,
which eliminates the challenge of feature and neighborhood generation. To
select which unlabeled nodes to add, we propose geometric ranking to rank
unlabeled nodes. Geometric ranking exploits unsupervised learning in the node
embedding space to effectively calibrate pseudo-label assignment. Finally, we
identify the issue of geometric imbalance in the embedding space and provide a
simple metric to filter out geometrically imbalanced nodes. Extensive
experiments on real-world benchmark datasets are conducted, and the empirical
results show that our method significantly and consistently outperforms current
state-of-the-art methods on different datasets with different imbalance ratios.
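The selection step described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the function name `select_unlabeled_nodes`, the `ambiguity_threshold` parameter, and the nearest/second-nearest distance ratio used as the filtering metric are all hypothetical, and class centers computed from labeled nodes stand in for the unsupervised learning step the abstract mentions. The sketch keeps only unlabeled nodes whose classifier pseudo-label agrees with their nearest class center, drops nodes that sit ambiguously between two centers, and ranks the rest by closeness to the center.

```python
import numpy as np


def select_unlabeled_nodes(emb, labeled_idx, labels, unlabeled_idx, probs,
                           ambiguity_threshold=0.8):
    """Pick unlabeled nodes to add to under-represented classes.

    emb: (n_nodes, d) node embeddings.
    labeled_idx, labels: aligned arrays of labeled node ids and their classes.
    probs: (n_nodes, n_classes) classifier outputs.
    Returns {class_id: [node ids to add with that pseudo-label]}.
    Assumes at least two classes, each with at least one labeled node.
    """
    n_classes = probs.shape[1]
    # Class centers from labeled nodes (assumption: a simple stand-in for
    # the unsupervised clustering mentioned in the abstract).
    centers = np.stack([emb[labeled_idx[labels == c]].mean(axis=0)
                        for c in range(n_classes)])
    counts = np.bincount(labels, minlength=n_classes)
    target = counts.max()  # grow every class up to the largest class size

    # Distance from every unlabeled node to every class center.
    diffs = emb[unlabeled_idx][:, None, :] - centers[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    nearest = dists.argmin(axis=1)
    pseudo = probs[unlabeled_idx].argmax(axis=1)

    # Hypothetical ambiguity score: nearest / second-nearest distance.
    # Values near 1 mean the node lies between two class centers.
    ordered = np.sort(dists, axis=1)
    ambiguity = ordered[:, 0] / (ordered[:, 1] + 1e-12)
    keep = (nearest == pseudo) & (ambiguity < ambiguity_threshold)

    selected = {}
    for c in range(n_classes):
        budget = int(target - counts[c])
        cand = np.flatnonzero(keep & (nearest == c))
        # Rank candidates by closeness to the class center.
        ranked = cand[np.argsort(dists[cand, c])]
        selected[c] = unlabeled_idx[ranked[:budget]].tolist()
    return selected


# Toy example: class 0 has three labeled nodes, class 1 has one.
emb = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0],
                [5.1, 5.0], [4.9, 5.2], [2.5, 2.5], [0.2, 0.1]])
labeled_idx = np.array([0, 1, 2, 3])
labels = np.array([0, 0, 0, 1])
unlabeled_idx = np.array([4, 5, 6, 7])
probs = np.array([[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.1, 0.9],
                  [0.1, 0.9], [0.2, 0.8], [0.4, 0.6], [0.9, 0.1]])

selected = select_unlabeled_nodes(emb, labeled_idx, labels, unlabeled_idx, probs)
print(selected)  # {0: [], 1: [4, 5]}
```

In the toy example, node 6 lies midway between the two centers and its pseudo-label disagrees with its nearest center, so it is filtered out. Node 7 is a clean class-0 candidate but is not added because class 0 is already the largest class. In an iterative setting, the selected nodes would be added to the training set and the model retrained before the next round.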
Related papers
- Open-World Semi-Supervised Learning for Node Classification [53.07866559269709]
Open-world semi-supervised learning (Open-world SSL) for node classification is a practical but under-explored problem in the graph community.
We propose an IMbalance-Aware method named OpenIMA for Open-world semi-supervised node classification.
arXiv Detail & Related papers (2024-03-18T05:12:54Z)
- All Points Matter: Entropy-Regularized Distribution Alignment for
Weakly-supervised 3D Segmentation [67.30502812804271]
Pseudo-labels are widely employed in weakly supervised 3D segmentation tasks where only sparse ground-truth labels are available for learning.
We propose a novel learning strategy to regularize the generated pseudo-labels and effectively narrow the gaps between pseudo-labels and model predictions.
arXiv Detail & Related papers (2023-05-25T08:19:31Z)
- Towards Label Position Bias in Graph Neural Networks [47.39692033598877]
Graph Neural Networks (GNNs) have emerged as a powerful tool for semi-supervised node classification tasks.
Recent studies have revealed various biases in GNNs stemming from both node features and graph topology.
In this work, we uncover a new bias, label position bias, which indicates that nodes closer to the labeled nodes tend to perform better.
arXiv Detail & Related papers (2023-05-25T08:06:42Z)
- GraphSR: A Data Augmentation Algorithm for Imbalanced Node
Classification [10.03027886793368]
Graph neural networks (GNNs) have achieved great success in node classification tasks.
Existing GNNs naturally bias towards the majority classes with more labelled data and ignore those minority classes with relatively few labelled ones.
We propose GraphSR, a novel self-training strategy to augment the minority classes with significant diversity of unlabelled nodes.
arXiv Detail & Related papers (2023-02-24T18:49:10Z)
- Pseudo Contrastive Learning for Graph-based Semi-supervised Learning [67.37572762925836]
Pseudo Labeling is a technique used to improve the performance of Graph Neural Networks (GNNs).
We propose a general framework for GNNs, termed Pseudo Contrastive Learning (PCL).
arXiv Detail & Related papers (2023-02-19T10:34:08Z)
- TAM: Topology-Aware Margin Loss for Class-Imbalanced Node Classification [33.028354930416754]
We propose Topology-Aware Margin (TAM) to reflect local topology on the learning objective.
Our method consistently exhibits superiority over the baselines on various node classification benchmark datasets.
arXiv Detail & Related papers (2022-06-26T16:29:36Z)
- Synthetic Over-sampling for Imbalanced Node Classification with Graph
Neural Networks [34.81248024048974]
Graph neural networks (GNNs) have achieved state-of-the-art performance for node classification.
In many real-world scenarios, node classes are imbalanced, with some majority classes making up most parts of the graph.
In this work, we seek to address this problem by generating pseudo instances of minority classes to balance the training data.
arXiv Detail & Related papers (2022-06-10T19:47:05Z)
- Geometer: Graph Few-Shot Class-Incremental Learning via Prototype
Representation [50.772432242082914]
Existing graph neural network based methods mainly focus on classifying unlabeled nodes within fixed classes with abundant labeling.
In this paper, we focus on this challenging but practical graph few-shot class-incremental learning (GFSCIL) problem and propose a novel method called Geometer.
Instead of replacing and retraining the fully connected neural network classifier, Geometer predicts the label of a node by finding the nearest class prototype.
arXiv Detail & Related papers (2022-05-27T13:02:07Z)
- GraphSMOTE: Imbalanced Node Classification on Graphs with Graph Neural
Networks [28.92347073786722]
Graph neural networks (GNNs) have achieved state-of-the-art performance in node classification.
We propose a novel framework, GraphSMOTE, in which an embedding space is constructed to encode the similarity among the nodes.
New samples are synthesized in this space to assure genuineness.
arXiv Detail & Related papers (2021-03-16T03:23:55Z)
- Sequential Graph Convolutional Network for Active Learning [53.99104862192055]
We propose a novel pool-based Active Learning framework constructed on a sequential Graph Convolution Network (GCN).
With a small number of randomly sampled images as seed labelled examples, we learn the parameters of the graph to distinguish labelled vs unlabelled nodes.
We exploit these characteristics of GCN to select the unlabelled examples which are sufficiently different from labelled ones.
arXiv Detail & Related papers (2020-06-18T00:55:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.