Pure Node Selection for Imbalanced Graph Node Classification
- URL: http://arxiv.org/abs/2509.23662v1
- Date: Sun, 28 Sep 2025 05:53:33 GMT
- Title: Pure Node Selection for Imbalanced Graph Node Classification
- Authors: Fanlong Zeng, Wensheng Gan, Jiayang Wu, Philip S. Yu,
- Abstract summary: Class imbalance is prevalent in graph-structured data and Graph neural networks (GNNs) are typically based on the assumption of class balance.<n>We propose PNS (Pure Node Sampling) to address the Randomness Anomalous Connectivity Problem (RACP)<n>Unlike existing approaches that design specialized algorithms to handle either quantity imbalance or topological imbalance, PNS is a novel plug-and-play module that operates directly during node synthesis.
- Score: 46.14551978150584
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The problem of class imbalance refers to an uneven distribution of quantity among classes in a dataset, where some classes are significantly underrepresented compared to others. Class imbalance is also prevalent in graph-structured data. Graph neural networks (GNNs) are typically based on the assumption of class balance, often overlooking the issue of class imbalance. In our investigation, we identified a problem, which we term the Randomness Anomalous Connectivity Problem (RACP), where certain off-the-shelf models are affected by random seeds, leading to a significant performance degradation. To eliminate the influence of random factors in algorithms, we proposed PNS (Pure Node Sampling) to address the RACP in the node synthesis stage. Unlike existing approaches that design specialized algorithms to handle either quantity imbalance or topological imbalance, PNS is a novel plug-and-play module that operates directly during node synthesis to mitigate RACP. Moreover, PNS also alleviates performance degradation caused by abnormal distribution of node neighbors. We conduct a series of experiments to identify what factors are influenced by random seeds. Experimental results demonstrate the effectiveness and stability of our method, which not only eliminates the effect of unfavorable random seeds but also outperforms the baseline across various benchmark datasets with different GNN backbones. Data and code are available at https://github.com/flzeng1/PNS.
Related papers
- Alleviating Structural Distribution Shift in Graph Anomaly Detection [70.1022676681496]
Graph anomaly detection (GAD) is a challenging binary classification problem.
Gallon neural networks (GNNs) benefit the classification of normals from aggregating homophilous neighbors.
We propose a framework to mitigate the effect of heterophilous neighbors and make them invariant.
arXiv Detail & Related papers (2024-01-25T13:07:34Z) - Heterophily-Based Graph Neural Network for Imbalanced Classification [19.51668009720269]
We introduce a unique approach that tackles imbalanced classification on graphs by considering graph heterophily.
We propose Fast Im-GBK, which integrates an imbalance classification strategy with heterophily-aware GNNs.
Our experiments on real-world graphs demonstrate our model's superiority in classification performance and efficiency for node classification tasks.
arXiv Detail & Related papers (2023-10-12T21:19:47Z) - When Do Graph Neural Networks Help with Node Classification?
Investigating the Impact of Homophily Principle on Node Distinguishability [92.8279562472538]
Homophily principle has been believed to be the main reason for the performance superiority of Graph Networks (GNNs) over Neural Networks on node classification tasks.
Recent research suggests that, even in the absence of homophily, the advantage of GNNs still exists as long as nodes from the same class share similar neighborhood patterns.
arXiv Detail & Related papers (2023-04-25T09:40:47Z) - Imbalanced Nodes Classification for Graph Neural Networks Based on
Valuable Sample Mining [9.156427521259195]
A new loss function FD-Loss is reconstructed based on the traditional algorithm-level approach to the imbalance problem.
Our loss function can effectively solve the sample node imbalance problem and improve the classification accuracy by 4% compared to existing methods in the node classification task.
arXiv Detail & Related papers (2022-09-18T09:22:32Z) - Synthetic Over-sampling for Imbalanced Node Classification with Graph
Neural Networks [34.81248024048974]
Graph neural networks (GNNs) have achieved state-of-the-art performance for node classification.
In many real-world scenarios, node classes are imbalanced, with some majority classes making up most parts of the graph.
In this work, we seek to address this problem by generating pseudo instances of minority classes to balance the training data.
arXiv Detail & Related papers (2022-06-10T19:47:05Z) - On the Prediction Instability of Graph Neural Networks [2.3605348648054463]
Instability of trained models can affect reliability, reliability, and trust in machine learning systems.
We systematically assess the prediction instability of node classification with state-of-the-art Graph Neural Networks (GNNs)
We find that up to one third of the incorrectly classified nodes differ across algorithm runs.
arXiv Detail & Related papers (2022-05-20T10:32:59Z) - Shift-Robust GNNs: Overcoming the Limitations of Localized Graph
Training data [52.771780951404565]
Shift-Robust GNN (SR-GNN) is designed to account for distributional differences between biased training data and the graph's true inference distribution.
We show that SR-GNN outperforms other GNN baselines by accuracy, eliminating at least (40%) of the negative effects introduced by biased training data.
arXiv Detail & Related papers (2021-08-02T18:00:38Z) - Understanding and Resolving Performance Degradation in Graph
Convolutional Networks [105.14867349802898]
Graph Convolutional Network (GCN) stacks several layers and in each layer performs a PROPagation operation (PROP) and a TRANsformation operation (TRAN) for learning node representations over graph-structured data.
GCNs tend to suffer performance drop when the model gets deep.
We study performance degradation of GCNs by experimentally examining how stacking only TRANs or PROPs works.
arXiv Detail & Related papers (2020-06-12T12:12:12Z) - Stochastic Graph Neural Networks [123.39024384275054]
Graph neural networks (GNNs) model nonlinear representations in graph data with applications in distributed agent coordination, control, and planning.
Current GNN architectures assume ideal scenarios and ignore link fluctuations that occur due to environment, human factors, or external attacks.
In these situations, the GNN fails to address its distributed task if the topological randomness is not considered accordingly.
arXiv Detail & Related papers (2020-06-04T08:00:00Z) - The Effects of Randomness on the Stability of Node Embeddings [5.126380982787905]
We apply five node embeddings algorithms to graphs and evaluate their stability under randomness.
We find significant instabilities in the geometry of embedding spaces independent of the centrality of a node.
This suggests that instability effects need to be taken into account when working with node embeddings.
arXiv Detail & Related papers (2020-05-20T13:36:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.