Resurrecting Label Propagation for Graphs with Heterophily and Label Noise
- URL: http://arxiv.org/abs/2310.16560v2
- Date: Wed, 12 Jun 2024 05:05:02 GMT
- Title: Resurrecting Label Propagation for Graphs with Heterophily and Label Noise
- Authors: Yao Cheng, Caihua Shan, Yifei Shen, Xiang Li, Siqiang Luo, Dongsheng Li
- Abstract summary: Label noise is a common challenge in large datasets, as it can significantly degrade the generalization ability of deep neural networks.
We study graph label noise in the context of arbitrary heterophily, with the aim of rectifying noisy labels and assigning labels to previously unlabeled nodes.
$R^{2}LP$ is an iterative algorithm with three steps: (1) reconstruct the graph to recover the homophily property, (2) utilize label propagation to rectify the noisy labels, and (3) select high-confidence labels to retain for the next iteration.
- Score: 40.11022005996222
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Label noise is a common challenge in large datasets, as it can significantly degrade the generalization ability of deep neural networks. Most existing studies focus on noisy labels in computer vision; however, graph models encompass both node features and graph topology as input, and become more susceptible to label noise through message-passing mechanisms. Recently, only a few works have been proposed to tackle label noise on graphs. One significant limitation is that they operate under the assumption that the graph exhibits homophily and that the labels are distributed smoothly. However, real-world graphs can exhibit varying degrees of heterophily, or even be dominated by heterophily, which results in the inadequacy of the current methods. In this paper, we study graph label noise in the context of arbitrary heterophily, with the aim of rectifying noisy labels and assigning labels to previously unlabeled nodes. We begin by conducting two empirical analyses to explore the impact of graph homophily on graph label noise. Following these observations, we propose an efficient algorithm, denoted as $R^{2}LP$. Specifically, $R^{2}LP$ is an iterative algorithm with three steps: (1) reconstruct the graph to recover the homophily property, (2) utilize label propagation to rectify the noisy labels, and (3) select high-confidence labels to retain for the next iteration. By iterating these steps, we obtain a set of correct labels, ultimately achieving high accuracy in the node classification task. A theoretical analysis is also provided to demonstrate its denoising effect. Finally, we perform experiments on ten benchmark datasets with different levels of graph heterophily and various types of noise. In these experiments, we compare the performance of $R^{2}LP$ against ten typical baseline methods. Our results illustrate the superior performance of the proposed $R^{2}LP$.
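The three-step loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine k-nearest-neighbour graph used for step (1), the classic normalized label propagation update used for step (2), the confidence threshold `tau` used for step (3), and all function and parameter names are assumptions chosen for illustration. The graph is also built once from features here, whereas the abstract describes reconstruction as part of every iteration.

```python
import numpy as np

def knn_similarity_graph(X, k=5):
    """Assumed stand-in for step (1): symmetric cosine kNN graph from features."""
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)              # exclude self-matches
    idx = np.argsort(-S, axis=1)[:, :k]       # k most similar nodes per row
    A = np.zeros_like(S)
    rows = np.repeat(np.arange(X.shape[0]), k)
    A[rows, idx.ravel()] = 1.0
    return np.maximum(A, A.T)                 # symmetrize

def label_propagation(A, Y, alpha=0.9, iters=50):
    """Step (2): iterate F <- alpha * S F + (1 - alpha) * Y, S = D^-1/2 A D^-1/2."""
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(A.sum(axis=1), 1e-12))
    S = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    F = Y.copy()
    for _ in range(iters):
        F = alpha * (S @ F) + (1 - alpha) * Y
    return F

def iterative_rectify(X, noisy_labels, labeled_mask, n_classes,
                      rounds=3, k=5, tau=0.6):
    """Repeat propagation, keeping only high-confidence labels between rounds."""
    n = X.shape[0]
    labels = np.asarray(noisy_labels).copy()
    mask = np.asarray(labeled_mask).copy()
    A = knn_similarity_graph(X, k)            # built once in this sketch
    pred = labels.copy()
    for _ in range(rounds):
        Y = np.zeros((n, n_classes))
        idx = np.flatnonzero(mask)
        Y[idx, labels[idx]] = 1.0             # one-hot rows for labeled nodes
        F = label_propagation(A, Y)
        P = F / (F.sum(axis=1, keepdims=True) + 1e-12)
        pred, conf = P.argmax(axis=1), P.max(axis=1)
        keep = conf >= tau                    # step (3): high-confidence nodes
        labels = np.where(keep, pred, labels)
        mask = mask | keep
    return pred
```

Calling `iterative_rectify(X, noisy_labels, labeled_mask, n_classes)` returns one predicted class per node. Unlabeled nodes can carry any placeholder label (for example `-1`) as long as `labeled_mask` is `False` for them.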
Related papers
- Mitigating Label Noise on Graph via Topological Sample Selection [72.86862597508077]
We propose a Topological Sample Selection (TSS) method that boosts the informative sample selection process in a graph by utilising topological information.
We theoretically prove that our procedure minimizes an upper bound of the expected risk under target clean distribution, and experimentally show the superiority of our method compared with state-of-the-art baselines.
arXiv Detail & Related papers (2024-03-04T11:24:51Z) - ERASE: Error-Resilient Representation Learning on Graphs for Label Noise Tolerance [53.73316938815873]
We propose a method called ERASE (Error-Resilient representation learning on graphs for lAbel noiSe tolerancE) to learn representations with error tolerance.
ERASE combines prototype pseudo-labels with propagated denoised labels and updates representations with error resilience.
Our method can outperform multiple baselines with clear margins in broad noise levels and enjoy great scalability.
arXiv Detail & Related papers (2023-12-13T17:59:07Z) - Resist Label Noise with PGM for Graph Neural Networks [4.566850249315913]
We propose LNP, a novel framework based on a probabilistic graphical model (PGM).
Given a noisy label set and a clean label set, our goal is to maximize the likelihood of labels in the clean set.
We show that LNP can lead to inspiring performance in high noise-rate situations.
arXiv Detail & Related papers (2023-11-03T02:47:06Z) - Local Graph Clustering with Noisy Labels [8.142265733890918]
We propose a study of local graph clustering using noisy node labels as a proxy for additional node information.
In this setting, nodes receive initial binary labels based on cluster affiliation: 1 if they belong to the target cluster and 0 otherwise.
We show that reliable node labels can be obtained with just a few samples from an attributed graph.
arXiv Detail & Related papers (2023-10-12T04:37:15Z) - NP$^2$L: Negative Pseudo Partial Labels Extraction for Graph Neural Networks [48.39834063008816]
Pseudo labels are used in graph neural networks (GNNs) to assist learning at the message-passing level.
In this paper, we introduce a new method to use pseudo labels in GNNs.
We show that our method is more accurate when the pseudo labels are selected from non-overlapping partial labels and defined as negative node-pair relations.
arXiv Detail & Related papers (2023-10-02T11:13:59Z) - Learning on Graphs under Label Noise [5.909452203428086]
We develop a novel approach dubbed Consistent Graph Neural Network (CGNN) to solve the problem of learning on graphs with label noise.
Specifically, we employ graph contrastive learning as a regularization term, which promotes two views of augmented nodes to have consistent representations.
To detect noisy labels on the graph, we present a sample selection technique based on the homophily assumption.
arXiv Detail & Related papers (2023-06-14T01:38:01Z) - Informative Pseudo-Labeling for Graph Neural Networks with Few Labels [12.83841767562179]
Graph Neural Networks (GNNs) have achieved state-of-the-art results for semi-supervised node classification on graphs.
The challenge of how to effectively learn GNNs with very few labels is still under-explored.
We propose a novel informative pseudo-labeling framework, called InfoGNN, to facilitate learning of GNNs with extremely few labels.
arXiv Detail & Related papers (2022-01-20T01:49:30Z) - Label-Wise Message Passing Graph Neural Network on Heterophilic Graphs [20.470934944907608]
We investigate a novel framework that performs well on graphs with either homophily or heterophily.
In label-wise message-passing, neighbors with similar pseudo labels will be aggregated together.
We also propose a bi-level optimization method to automatically select the model for graphs with homophily/heterophily.
arXiv Detail & Related papers (2021-10-15T14:49:45Z) - Line Graph Neural Networks for Link Prediction [71.00689542259052]
We consider the graph link prediction task, which is a classic graph analytical problem with many real-world applications.
In this formalism, a link prediction problem is converted to a graph classification task.
We propose to seek a radically different and novel path by making use of the line graphs in graph theory.
In particular, each node in a line graph corresponds to a unique edge in the original graph. Therefore, link prediction problems in the original graph can be equivalently solved as a node classification problem in its corresponding line graph, instead of a graph classification task.
arXiv Detail & Related papers (2020-10-20T05:54:31Z) - Class2Simi: A Noise Reduction Perspective on Learning with Noisy Labels [98.13491369929798]
We propose a framework called Class2Simi, which transforms data points with noisy class labels to data pairs with noisy similarity labels.
Class2Simi is computationally efficient: the transformation is performed on-the-fly in mini-batches, and it only changes the loss on top of the model prediction into a pairwise form.
arXiv Detail & Related papers (2020-06-14T07:55:32Z)
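The class-to-similarity transformation mentioned in the Class2Simi entry above can be sketched for a single mini-batch as follows. This is an illustrative sketch only: the function names are hypothetical, and scoring a pair by the inner product of the two predicted class distributions with a binary cross-entropy loss is an assumption about how a pairwise loss can sit on top of class predictions, not a statement of that paper's exact formulation.

```python
import numpy as np

def class_to_similarity(labels):
    """Pairwise targets: 1.0 where two samples share a (possibly noisy) class label."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(np.float64)

def pairwise_similarity_loss(probs, labels, eps=1e-12):
    """Binary cross-entropy between predicted and target pairwise similarity.

    probs: (batch, n_classes) array of predicted class probabilities.
    The predicted similarity of a pair is assumed to be the inner product
    of the two probability vectors.
    """
    target = class_to_similarity(labels)
    pred = np.clip(probs @ probs.T, eps, 1.0 - eps)
    bce = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    return bce.mean()
```

Because the targets are computed from the labels inside each batch, no pairwise dataset has to be materialized in advance, which is the on-the-fly property the entry refers to.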
This list is automatically generated from the titles and abstracts of the papers in this site.