Mitigating Label Noise on Graph via Topological Sample Selection
- URL: http://arxiv.org/abs/2403.01942v4
- Date: Thu, 29 Aug 2024 05:48:42 GMT
- Title: Mitigating Label Noise on Graph via Topological Sample Selection
- Authors: Yuhao Wu, Jiangchao Yao, Xiaobo Xia, Jun Yu, Ruxin Wang, Bo Han, Tongliang Liu,
- Abstract summary: We propose a $textitTopological Sample Selection$ (TSS) method that boosts the informative sample selection process in a graph by utilising topological information.
We theoretically prove that our procedure minimizes an upper bound of the expected risk under target clean distribution, and experimentally show the superiority of our method compared with state-of-the-art baselines.
- Score: 72.86862597508077
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the success of the carefully-annotated benchmarks, the effectiveness of existing graph neural networks (GNNs) can be considerably impaired in practice when the real-world graph data is noisily labeled. Previous explorations in sample selection have been demonstrated as an effective way for robust learning with noisy labels, however, the conventional studies focus on i.i.d data, and when moving to non-iid graph data and GNNs, two notable challenges remain: (1) nodes located near topological class boundaries are very informative for classification but cannot be successfully distinguished by the heuristic sample selection. (2) there is no available measure that considers the graph topological information to promote sample selection in a graph. To address this dilemma, we propose a $\textit{Topological Sample Selection}$ (TSS) method that boosts the informative sample selection process in a graph by utilising topological information. We theoretically prove that our procedure minimizes an upper bound of the expected risk under target clean distribution, and experimentally show the superiority of our method compared with state-of-the-art baselines.
Related papers
- Active Learning for Graphs with Noisy Structures [29.760935499506804]
Graph Neural Networks (GNNs) have seen significant success in tasks such as node classification, largely contingent upon the availability of sufficient labeled nodes.
Yet, the excessive cost of labeling large-scale graphs led to a focus on active learning on graphs, which aims for effective data selection to maximize downstream model performance.
We propose an active learning framework, GALClean, which has been specifically designed to adopt an iterative approach for conducting both data selection and graph purification simultaneously with best information learned from the prior iteration.
arXiv Detail & Related papers (2024-02-04T02:23:45Z) - Graph Out-of-Distribution Generalization with Controllable Data
Augmentation [51.17476258673232]
Graph Neural Network (GNN) has demonstrated extraordinary performance in classifying graph properties.
Due to the selection bias of training and testing data, distribution deviation is widespread.
We propose OOD calibration to measure the distribution deviation of virtual samples.
arXiv Detail & Related papers (2023-08-16T13:10:27Z) - Learning on Graphs under Label Noise [5.909452203428086]
We develop a novel approach dubbed Consistent Graph Neural Network (CGNN) to solve the problem of learning on graphs with label noise.
Specifically, we employ graph contrastive learning as a regularization term, which promotes two views of augmented nodes to have consistent representations.
To detect noisy labels on the graph, we present a sample selection technique based on the homophily assumption.
arXiv Detail & Related papers (2023-06-14T01:38:01Z) - Similarity-aware Positive Instance Sampling for Graph Contrastive
Pre-training [82.68805025636165]
We propose to select positive graph instances directly from existing graphs in the training set.
Our selection is based on certain domain-specific pair-wise similarity measurements.
Besides, we develop an adaptive node-level pre-training method to dynamically mask nodes to distribute them evenly in the graph.
arXiv Detail & Related papers (2022-06-23T20:12:51Z) - Optimal Propagation for Graph Neural Networks [51.08426265813481]
We propose a bi-level optimization approach for learning the optimal graph structure.
We also explore a low-rank approximation model for further reducing the time complexity.
arXiv Detail & Related papers (2022-05-06T03:37:00Z) - Heterogeneous Graph Neural Networks using Self-supervised Reciprocally
Contrastive Learning [102.9138736545956]
Heterogeneous graph neural network (HGNN) is a very popular technique for the modeling and analysis of heterogeneous graphs.
We develop for the first time a novel and robust heterogeneous graph contrastive learning approach, namely HGCL, which introduces two views on respective guidance of node attributes and graph topologies.
In this new approach, we adopt distinct but most suitable attribute and topology fusion mechanisms in the two views, which are conducive to mining relevant information in attributes and topologies separately.
arXiv Detail & Related papers (2022-04-30T12:57:02Z) - Issues with Propagation Based Models for Graph-Level Outlier Detection [16.980621769406916]
Graph-Level Outlier Detection ( GLOD) is the task of identifying unusual graphs within a graph database.
This paper identifies and delves into a fundamental and intriguing issue with applying propagation based models to GLOD.
We find that ROC-AUC performance of the models change significantly depending on which class is down-sampled.
arXiv Detail & Related papers (2020-12-23T19:38:21Z) - Handling Missing Data with Graph Representation Learning [62.59831675688714]
We propose GRAPE, a graph-based framework for feature imputation as well as label prediction.
Under GRAPE, the feature imputation is formulated as an edge-level prediction task and the label prediction as a node-level prediction task.
Experimental results on nine benchmark datasets show that GRAPE yields 20% lower mean absolute error for imputation tasks and 10% lower for label prediction tasks.
arXiv Detail & Related papers (2020-10-30T17:59:13Z) - Active Learning on Attributed Graphs via Graph Cognizant Logistic
Regression and Preemptive Query Generation [37.742218733235084]
We propose a novel graph-based active learning algorithm for the task of node classification in attributed graphs.
Our algorithm uses graph cognizant logistic regression, equivalent to a linearized graph convolutional neural network (GCN) for the prediction phase and maximizes the expected error reduction in the query phase.
We conduct experiments on five public benchmark datasets, demonstrating a significant improvement over state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-09T18:00:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.