Local Graph Clustering with Noisy Labels
- URL: http://arxiv.org/abs/2310.08031v2
- Date: Mon, 4 Mar 2024 02:57:11 GMT
- Title: Local Graph Clustering with Noisy Labels
- Authors: Artur Back de Luca, Kimon Fountoulakis, Shenghao Yang
- Abstract summary: We propose a study of local graph clustering using noisy node labels as a proxy for additional node information.
In this setting, nodes receive initial binary labels based on cluster affiliation: 1 if they belong to the target cluster and 0 otherwise.
We show that reliable node labels can be obtained with just a few samples from an attributed graph.
- Score: 8.142265733890918
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The growing interest in machine learning problems over graphs with additional
node information such as texts, images, or labels has popularized methods that
require the costly operation of processing the entire graph. Yet, little effort
has been made toward developing fast local methods (i.e., without accessing
the entire graph) that extract useful information from such data. To that end,
we propose a study of local graph clustering using noisy node labels as a proxy
for additional node information. In this setting, nodes receive initial binary
labels based on cluster affiliation: 1 if they belong to the target cluster and
0 otherwise. Subsequently, a fraction of these labels is flipped. We
investigate the benefits of incorporating noisy labels for local graph
clustering. By constructing a weighted graph with such labels, we study the
performance of a graph diffusion-based local clustering method on both the
original and the weighted graphs. From a theoretical perspective, we consider
recovering an unknown target cluster with a single seed node in a random graph
with independent noisy node labels. We provide sufficient conditions on the
label noise under which, with high probability, using diffusion in the weighted
graph yields a more accurate recovery of the target cluster. This approach
proves more effective than using the given labels alone or using diffusion in
the label-free original graph. Empirically, we show that reliable node labels
can be obtained with just a few samples from an attributed graph. Moreover,
utilizing these labels via diffusion in the weighted graph leads to
significantly better local clustering performance across several real-world
datasets, improving F1 scores by up to 13%.
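The weighted-graph construction and diffusion described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact algorithm: the edge-reweighting rule (`eps`), the teleport parameter (`alpha`), and the choice of personalized PageRank as the diffusion are all assumptions made for the sketch.

```python
import numpy as np

def diffusion_cluster(A, labels, seed, eps=1.0, alpha=0.15, iters=50):
    """Rank nodes by a personalized PageRank diffusion on a label-weighted graph.

    A      : (n, n) symmetric adjacency matrix
    labels : noisy binary labels (1 = node claims target-cluster membership)
    seed   : index of the seed node inside the target cluster
    eps    : assumed extra weight on edges whose endpoints both carry label 1
    alpha  : teleport probability of the personalized PageRank diffusion
    """
    n = A.shape[0]
    # Up-weight edges between nodes that both claim cluster membership,
    # so diffusion mass leaks out of the target cluster more slowly.
    W = A * (1.0 + eps * np.outer(labels, labels))
    d = W.sum(axis=1)
    d[d == 0] = 1.0                        # guard isolated nodes
    P = W / d[:, None]                     # row-stochastic transition matrix
    s = np.zeros(n)
    s[seed] = 1.0
    p = s.copy()
    for _ in range(iters):                 # power iteration for PPR
        p = alpha * s + (1 - alpha) * P.T @ p
    # Sweep order: nodes ranked by degree-normalized diffusion mass.
    return np.argsort(-p / d)
```

On a toy graph of two triangles joined by a bridge edge, seeding in one triangle and labeling its nodes 1 ranks that triangle's nodes first in the sweep order.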
Related papers
- Resurrecting Label Propagation for Graphs with Heterophily and Label Noise [40.11022005996222]
Label noise is a common challenge in large datasets, as it can significantly degrade the generalization ability of deep neural networks.
We study graph label noise in the context of arbitrary heterophily, with the aim of rectifying noisy labels and assigning labels to previously unlabeled nodes.
$R2LP$ is an iterative algorithm with three steps: (1) reconstruct the graph to recover the homophily property, (2) utilize label propagation to rectify the noisy labels, and (3) select high-confidence labels to retain for the next iteration.
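The propagate-and-rectify idea behind these steps can be illustrated with a simple label-spreading loop. This is a schematic sketch, not $R2LP$ itself: the smoothing weight `beta`, the thresholding rule, and the omission of the graph-reconstruction step are all simplifications made for illustration.

```python
import numpy as np

def rectify_labels(A, y_noisy, beta=0.8, iters=30):
    """Propagate-and-rectify sketch: smooth noisy labels over the graph,
    then flip labels that disagree with the smoothed score.

    beta : assumed smoothing weight; each node's score mixes its neighbors'
           average score (weight beta) with its own noisy label (1 - beta).
    """
    d = A.sum(axis=1).astype(float)
    d[d == 0] = 1.0                            # guard isolated nodes
    P = A / d[:, None]                         # row-stochastic transition matrix
    y0 = y_noisy.astype(float)
    f = y0.copy()
    for _ in range(iters):
        f = beta * (P @ f) + (1 - beta) * y0   # propagate over the graph
    return (f >= 0.5).astype(int)              # rectify by thresholding
```

On a graph of two 4-cliques joined by a bridge, flipping one interior node's label is corrected because its neighborhood's smoothed scores outweigh the corrupted initial label.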
arXiv Detail & Related papers (2023-10-25T11:28:26Z)
- Learning on Graphs under Label Noise
We develop a novel approach dubbed Consistent Graph Neural Network (CGNN) to solve the problem of learning on graphs with label noise.
Specifically, we employ graph contrastive learning as a regularization term, which promotes two views of augmented nodes to have consistent representations.
To detect noisy labels on the graph, we present a sample selection technique based on the homophily assumption.
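A homophily-based selection of this kind can be sketched as follows: under homophily, adjacent nodes tend to share labels, so a label that strongly disagrees with its neighborhood's average is a noise candidate. The majority-disagreement rule below is an assumed stand-in for CGNN's actual technique.

```python
import numpy as np

def flag_noisy_labels(A, y):
    """Flag binary labels that strongly disagree with the neighborhood average.

    Under the homophily assumption, adjacent nodes usually share labels, so a
    node whose label is far from its neighbors' mean label is a likely victim
    of label noise. The 0.5 margin is an assumed rule: an evenly split
    neighborhood is not flagged.
    """
    d = A.sum(axis=1)
    neigh_mean = (A @ y) / np.maximum(d, 1)   # fraction of neighbors labeled 1
    return (np.abs(neigh_mean - y) > 0.5) & (d > 0)
```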
arXiv Detail & Related papers (2023-06-14T01:38:01Z)
- LiftPool: Lifting-based Graph Pooling for Hierarchical Graph Representation Learning [53.176603566951016]
We propose an enhanced three-stage method via lifting, named LiftPool, to improve hierarchical graph representation.
For each node to be removed, its local information is obtained by subtracting the global information aggregated from its neighboring preserved nodes.
Evaluations on benchmark graph datasets show that LiftPool substantially outperforms the state-of-the-art graph pooling methods in the task of graph classification.
arXiv Detail & Related papers (2022-04-27T12:38:02Z)
- Graph Transplant: Node Saliency-Guided Graph Mixup with Local Structure Preservation [27.215800308343322]
We present the first Mixup-like graph augmentation method at the graph-level called Graph Transplant.
Our method identifies the sub-structure as a mix unit that can preserve the local information.
We extensively validate our method with diverse GNN architectures on multiple graph classification benchmark datasets.
arXiv Detail & Related papers (2021-11-10T11:10:13Z)
- Label-Wise Message Passing Graph Neural Network on Heterophilic Graphs [20.470934944907608]
We investigate a novel framework that performs well on graphs with either homophily or heterophily.
In label-wise message-passing, neighbors with similar pseudo labels will be aggregated together.
We also propose a bi-level optimization method to automatically select the model for graphs with homophily/heterophily.
arXiv Detail & Related papers (2021-10-15T14:49:45Z)
- Self-supervised Contrastive Attributed Graph Clustering [110.52694943592974]
We propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC).
In SCAGC, a self-supervised contrastive loss that leverages inaccurate clustering labels is designed for node representation learning.
For out-of-sample (OOS) nodes, SCAGC can directly calculate their clustering labels.
arXiv Detail & Related papers (2021-10-15T03:25:28Z)
- GraphHop: An Enhanced Label Propagation Method for Node Classification [34.073791157290614]
A scalable semi-supervised node classification method, called GraphHop, is proposed in this work.
Experimental results show that GraphHop outperforms state-of-the-art graph learning methods on a wide range of tasks.
arXiv Detail & Related papers (2021-01-07T02:10:20Z)
- Exploiting Heterogeneous Graph Neural Networks with Latent Worker/Task Correlation Information for Label Aggregation in Crowdsourcing [72.34616482076572]
Crowdsourcing has attracted much attention for its convenience in collecting labels from non-expert workers instead of experts.
We propose a novel framework based on graph neural networks for aggregating crowd labels.
arXiv Detail & Related papers (2020-10-25T10:12:37Z)
- Learn to Propagate Reliably on Noisy Affinity Graphs [69.97364913330989]
Recent works have shown that exploiting unlabeled data through label propagation can substantially reduce the labeling cost.
How to propagate labels reliably, especially on a dataset with unknown outliers, remains an open question.
We propose a new framework that allows labels to be propagated reliably on large-scale real-world data.
arXiv Detail & Related papers (2020-07-17T07:55:59Z)
- Inverse Graph Identification: Can We Identify Node Labels Given Graph Labels? [89.13567439679709]
Graph Identification (GI) has long been researched in graph learning and is essential in certain applications.
This paper defines a novel problem dubbed Inverse Graph Identification (IGI).
We propose a simple yet effective method that performs node-level message passing with a Graph Attention Network (GAT) under the protocol of GI.
arXiv Detail & Related papers (2020-07-12T12:06:17Z)
- Graph Pooling with Node Proximity for Hierarchical Representation Learning [80.62181998314547]
We propose a novel graph pooling strategy that leverages node proximity to improve the hierarchical representation learning of graph data with their multi-hop topology.
Results show that the proposed graph pooling strategy is able to achieve state-of-the-art performance on a collection of public graph classification benchmark datasets.
arXiv Detail & Related papers (2020-06-19T13:09:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.