Related papers: Synthetic Over-sampling for Imbalanced Node Classification with Graph Neural Networks

URL: http://arxiv.org/abs/2206.05335v1
Date: Fri, 10 Jun 2022 19:47:05 GMT
Title: Synthetic Over-sampling for Imbalanced Node Classification with Graph Neural Networks
Authors: Tianxiang Zhao and Xiang Zhang and Suhang Wang
Abstract summary: Graph neural networks (GNNs) have achieved state-of-the-art performance for node classification. In many real-world scenarios, node classes are imbalanced, with some majority classes making up most parts of the graph. In this work, we seek to address this problem by generating pseudo instances of minority classes to balance the training data.
Score: 34.81248024048974
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In recent years, graph neural networks (GNNs) have achieved state-of-the-art performance for node classification. However, most existing GNNs would suffer from the graph imbalance problem. In many real-world scenarios, node classes are imbalanced, with some majority classes making up most parts of the graph. The message propagation mechanism in GNNs would further amplify the dominance of those majority classes, resulting in sub-optimal classification performance. In this work, we seek to address this problem by generating pseudo instances of minority classes to balance the training data, extending previous over-sampling-based techniques. This task is non-trivial, as those techniques are designed with the assumption that instances are independent. Neglection of relation information would complicate this oversampling process. Furthermore, the node classification task typically takes the semi-supervised setting with only a few labeled nodes, providing insufficient supervision for the generation of minority instances. Generated new nodes of low quality would harm the trained classifier. In this work, we address these difficulties by synthesizing new nodes in a constructed embedding space, which encodes both node attributes and topology information. Furthermore, an edge generator is trained simultaneously to model the graph structure and provide relations for new samples. To further improve the data efficiency, we also explore synthesizing mixed "in-between" nodes to utilize nodes from the majority class in this over-sampling process. Experiments on real-world datasets validate the effectiveness of our proposed framework.
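Illustrative sketch (not the authors' implementation): the idea of generating pseudo instances of a minority class in an embedding space can be pictured with SMOTE-style interpolation, where a synthetic point is placed on the segment between a minority node and its nearest same-class neighbor. The abstract does not specify this exact rule, so the interpolation scheme, the function names (nearest_same_class, oversample_minority), and the toy embeddings are all assumptions made for illustration. The learned embedding space and the edge generator described in the abstract are omitted.

```python
import math
import random


def nearest_same_class(idx, embeddings, labels):
    """Return the index of the closest other node that shares idx's label."""
    best, best_dist = None, math.inf
    for j, (emb, lab) in enumerate(zip(embeddings, labels)):
        if j == idx or lab != labels[idx]:
            continue
        dist = math.dist(embeddings[idx], emb)
        if dist < best_dist:
            best, best_dist = j, dist
    return best


def oversample_minority(embeddings, labels, minority, n_new, seed=0):
    """Create n_new synthetic embeddings by interpolating minority pairs.

    Hypothetical helper: assumes embeddings are already computed and that
    plain linear interpolation is an acceptable stand-in for the generator.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, lab in enumerate(labels) if lab == minority]
    if len(minority_idx) < 2:
        raise ValueError("need at least two minority nodes to interpolate")
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(minority_idx)
        j = nearest_same_class(i, embeddings, labels)
        delta = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            [a + delta * (b - a) for a, b in zip(embeddings[i], embeddings[j])]
        )
    return synthetic


# Toy data (made up): two minority nodes (label 1), four majority nodes (label 0).
embeddings = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0]]
labels = [1, 1, 0, 0, 0, 0]
new_nodes = oversample_minority(embeddings, labels, minority=1, n_new=2)
```

In this toy setup every synthetic point lies on the segment between the two minority embeddings. In a graph setting each new node would additionally need edges, which is the role the abstract assigns to the edge generator.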

Related papers

Degree-based stratification of nodes in Graph Neural Networks [66.17149106033126]
We modify the Graph Neural Network (GNN) architecture so that the weight matrices are learned, separately, for the nodes in each group. This simple-to-implement modification seems to improve performance across datasets and GNN methods.
arXiv Detail & Related papers (2023-12-16T14:09:23Z)
NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification [70.51126383984555]
We introduce a novel all-pair message passing scheme for efficiently propagating node signals between arbitrary nodes. The efficient computation is enabled by a kernelized Gumbel-Softmax operator. Experiments demonstrate the promising efficacy of the method in various tasks including node classification on graphs.
arXiv Detail & Related papers (2023-06-14T09:21:15Z)
UNREAL: Unlabeled Nodes Retrieval and Labeling for Heavily-imbalanced Node Classification [17.23736166919287]
Skewed label distributions are common in real-world node classification tasks. In this paper, we propose UNREAL, an iterative over-sampling method.
arXiv Detail & Related papers (2023-03-18T09:23:13Z)
Semantic-aware Node Synthesis for Imbalanced Heterogeneous Information Networks [51.55932524129814]
We present the first method for the semantic imbalance problem in imbalanced HINs, named Semantic-aware Node Synthesis (SNS). SNS adaptively selects the heterogeneous neighbor nodes and augments the network with synthetic nodes while preserving the minority semantics. We also introduce two regularization approaches for HGNNs that constrain the representation of synthetic nodes from both semantic and class perspectives.
arXiv Detail & Related papers (2023-02-27T00:21:43Z)
GraphSR: A Data Augmentation Algorithm for Imbalanced Node Classification [10.03027886793368]
Graph neural networks (GNNs) have achieved great success in node classification tasks. Existing GNNs are naturally biased towards the majority classes with more labelled data and ignore the minority classes with relatively few labelled ones. We propose GraphSR, a novel self-training strategy to augment the minority classes with significant diversity of unlabelled nodes.
arXiv Detail & Related papers (2023-02-24T18:49:10Z)
ResNorm: Tackling Long-tailed Degree Distribution Issue in Graph Neural Networks via Normalization [80.90206641975375]
This paper focuses on improving the performance of GNNs via normalization. By studying the long-tailed distribution of node degrees in the graph, we propose a novel normalization method for GNNs. The $scale$ operation of ResNorm reshapes the node-wise standard deviation (NStd) distribution so as to improve the accuracy of tail nodes.
arXiv Detail & Related papers (2022-06-16T13:49:09Z)
Mixed Graph Contrastive Network for Semi-Supervised Node Classification [63.924129159538076]
We propose a novel graph contrastive learning method, termed Mixed Graph Contrastive Network (MGCN). In our method, we improve the discriminative capability of the latent embeddings by an unperturbed augmentation strategy and a correlation reduction mechanism. By combining the two settings, we extract rich supervision information from both the abundant nodes and the rare yet valuable labeled nodes for discriminative representation learning.
arXiv Detail & Related papers (2022-06-06T14:26:34Z)
Exploiting Neighbor Effect: Conv-Agnostic GNNs Framework for Graphs with Heterophily [58.76759997223951]
We propose a new metric based on von Neumann entropy to re-examine the heterophily problem of GNNs. We also propose a Conv-Agnostic GNN framework (CAGNNs) to enhance the performance of most GNNs on heterophily datasets.
arXiv Detail & Related papers (2022-03-19T14:26:43Z)
Graph Neural Network with Curriculum Learning for Imbalanced Node Classification [21.085314408929058]
Graph Neural Network (GNN) is an emerging technique for graph-based learning tasks such as node classification. In this work, we reveal the vulnerability of GNN to the imbalance of node labels. We propose a novel graph neural network framework with curriculum learning (GNN-CL) consisting of two modules.
arXiv Detail & Related papers (2022-02-05T10:46:11Z)
GraphMixup: Improving Class-Imbalanced Node Classification on Graphs by Self-supervised Context Prediction [25.679620842010422]
This paper presents GraphMixup, a novel mixup-based framework for improving class-imbalanced node classification on graphs. We develop a Reinforcement Mixup mechanism to adaptively determine how many samples are to be generated by mixup for those minority classes. Experiments on three real-world datasets show that GraphMixup yields truly encouraging results for class-imbalanced node classification tasks.
arXiv Detail & Related papers (2021-06-21T14:12:16Z)
GraphSMOTE: Imbalanced Node Classification on Graphs with Graph Neural Networks [28.92347073786722]
Graph neural networks (GNNs) have achieved state-of-the-art performance in node classification. We propose a novel framework, GraphSMOTE, in which an embedding space is constructed to encode the similarity among the nodes. New samples are synthesized in this space to assure genuineness.
arXiv Detail & Related papers (2021-03-16T03:23:55Z)
Towards Deeper Graph Neural Networks with Differentiable Group Normalization [61.20639338417576]
Graph neural networks (GNNs) learn the representation of a node by aggregating its neighbors. Over-smoothing is one of the key issues that limit the performance of GNNs as the number of layers increases. We introduce two over-smoothing metrics and a novel technique, i.e., differentiable group normalization (DGN).
arXiv Detail & Related papers (2020-06-12T07:18:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.