Local Graph Clustering with Noisy Labels
- URL: http://arxiv.org/abs/2310.08031v2
- Date: Mon, 4 Mar 2024 02:57:11 GMT
- Title: Local Graph Clustering with Noisy Labels
- Authors: Artur Back de Luca, Kimon Fountoulakis, Shenghao Yang
- Abstract summary: We propose a study of local graph clustering using noisy node labels as a proxy for additional node information.
In this setting, nodes receive initial binary labels based on cluster affiliation: 1 if they belong to the target cluster and 0 otherwise.
We show that reliable node labels can be obtained with just a few samples from an attributed graph.
- Score: 8.142265733890918
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The growing interest in machine learning problems over graphs with additional
node information such as texts, images, or labels has popularized methods that
require the costly operation of processing the entire graph. Yet, little effort
has been made toward developing fast local methods (i.e., without accessing
the entire graph) that extract useful information from such data. To that end,
we propose a study of local graph clustering using noisy node labels as a proxy
for additional node information. In this setting, nodes receive initial binary
labels based on cluster affiliation: 1 if they belong to the target cluster and
0 otherwise. Subsequently, a fraction of these labels is flipped. We
investigate the benefits of incorporating noisy labels for local graph
clustering. By constructing a weighted graph with such labels, we study the
performance of a graph diffusion-based local clustering method on both the
original and the weighted graphs. From a theoretical perspective, we consider
recovering an unknown target cluster with a single seed node in a random graph
with independent noisy node labels. We provide sufficient conditions on the
label noise under which, with high probability, using diffusion in the weighted
graph yields a more accurate recovery of the target cluster. This approach
proves more effective than using the given labels alone or using diffusion in
the label-free original graph. Empirically, we show that reliable node labels
can be obtained with just a few samples from an attributed graph. Moreover,
utilizing these labels via diffusion in the weighted graph leads to
significantly better local clustering performance across several real-world
datasets, improving F1 scores by up to 13%.
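The weighted-graph construction and diffusion described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact algorithm: the edge-reweighting rule (`eps`), the teleport parameter (`alpha`), and the choice of personalized PageRank as the diffusion are all assumptions made for the sketch.

```python
import numpy as np

def diffusion_cluster(A, labels, seed, eps=1.0, alpha=0.15, iters=50):
    """Rank nodes by a personalized PageRank diffusion on a label-weighted graph.

    A      : (n, n) symmetric adjacency matrix
    labels : noisy binary labels (1 = node claims target-cluster membership)
    seed   : index of the seed node inside the target cluster
    eps    : assumed extra weight on edges whose endpoints both carry label 1
    alpha  : teleport probability of the personalized PageRank diffusion
    """
    n = A.shape[0]
    # Up-weight edges between nodes that both claim cluster membership,
    # so diffusion mass leaks out of the target cluster more slowly.
    W = A * (1.0 + eps * np.outer(labels, labels))
    d = W.sum(axis=1)
    d[d == 0] = 1.0                        # guard isolated nodes
    P = W / d[:, None]                     # row-stochastic transition matrix
    s = np.zeros(n)
    s[seed] = 1.0
    p = s.copy()
    for _ in range(iters):                 # power iteration for PPR
        p = alpha * s + (1 - alpha) * P.T @ p
    # Sweep order: nodes ranked by degree-normalized diffusion mass.
    return np.argsort(-p / d)
```

On a toy graph of two triangles joined by a bridge edge, seeding in one triangle and labeling its nodes 1 ranks that triangle's nodes first in the sweep order.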
Related papers
- Resurrecting Label Propagation for Graphs with Heterophily and Label Noise [40.11022005996222]
Label noise is a common challenge in large datasets, as it can significantly degrade the generalization ability of deep neural networks.
We study graph label noise in the context of arbitrary heterophily, with the aim of rectifying noisy labels and assigning labels to previously unlabeled nodes.
$R2LP$ is an iterative algorithm with three steps: (1) reconstruct the graph to recover the homophily property, (2) utilize label propagation to rectify the noisy labels, and (3) select high-confidence labels to retain for the next iteration.
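The propagate-and-rectify idea behind these steps can be illustrated with a simple label-spreading loop. This is a schematic sketch, not $R2LP$ itself: the smoothing weight `beta`, the thresholding rule, and the omission of the graph-reconstruction step are all simplifications made for illustration.

```python
import numpy as np

def rectify_labels(A, y_noisy, beta=0.8, iters=30):
    """Propagate-and-rectify sketch: smooth noisy labels over the graph,
    then flip labels that disagree with the smoothed score.

    beta : assumed smoothing weight; each node's score mixes its neighbors'
           average score (weight beta) with its own noisy label (1 - beta).
    """
    d = A.sum(axis=1).astype(float)
    d[d == 0] = 1.0                            # guard isolated nodes
    P = A / d[:, None]                         # row-stochastic transition matrix
    y0 = y_noisy.astype(float)
    f = y0.copy()
    for _ in range(iters):
        f = beta * (P @ f) + (1 - beta) * y0   # propagate over the graph
    return (f >= 0.5).astype(int)              # rectify by thresholding
```

On a graph of two 4-cliques joined by a bridge, flipping one interior node's label is corrected because its neighborhood's smoothed scores outweigh the corrupted initial label.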
arXiv Detail & Related papers (2023-10-25T11:28:26Z)
- Learning on Graphs under Label Noise
We develop a novel approach dubbed Consistent Graph Neural Network (CGNN) to solve the problem of learning on graphs with label noise.
Specifically, we employ graph contrastive learning as a regularization term, which promotes two views of augmented nodes to have consistent representations.
To detect noisy labels on the graph, we present a sample selection technique based on the homophily assumption.
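A homophily-based selection of this kind can be sketched as follows: under homophily, adjacent nodes tend to share labels, so a label that strongly disagrees with its neighborhood's average is a noise candidate. The majority-disagreement rule below is an assumed stand-in for CGNN's actual technique.

```python
import numpy as np

def flag_noisy_labels(A, y):
    """Flag binary labels that strongly disagree with the neighborhood average.

    Under the homophily assumption, adjacent nodes usually share labels, so a
    node whose label is far from its neighbors' mean label is a likely victim
    of label noise. The 0.5 margin is an assumed rule: an evenly split
    neighborhood is not flagged.
    """
    d = A.sum(axis=1)
    neigh_mean = (A @ y) / np.maximum(d, 1)   # fraction of neighbors labeled 1
    return (np.abs(neigh_mean - y) > 0.5) & (d > 0)
```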
arXiv Detail & Related papers (2023-06-14T01:38:01Z)
- LiftPool: Lifting-based Graph Pooling for Hierarchical Graph Representation Learning [53.176603566951016]
We propose an enhanced three-stage method via lifting, named LiftPool, to improve hierarchical graph representation.
For each node to be removed, its local information is obtained by subtracting the global information aggregated from its neighboring preserved nodes.
Evaluations on benchmark graph datasets show that LiftPool substantially outperforms the state-of-the-art graph pooling methods in the task of graph classification.
arXiv Detail & Related papers (2022-04-27T12:38:02Z)
- Graph Transplant: Node Saliency-Guided Graph Mixup with Local Structure Preservation [27.215800308343322]
We present the first Mixup-like graph augmentation method at the graph-level called Graph Transplant.
Our method identifies the sub-structure as a mix unit that can preserve the local information.
We extensively validate our method with diverse GNN architectures on multiple graph classification benchmark datasets.
arXiv Detail & Related papers (2021-11-10T11:10:13Z)
- Label-Wise Message Passing Graph Neural Network on Heterophilic Graphs [20.470934944907608]
We investigate a novel framework that performs well on graphs with either homophily or heterophily.
In label-wise message-passing, neighbors with similar pseudo labels will be aggregated together.
We also propose a bi-level optimization method to automatically select the model for graphs with homophily/heterophily.
arXiv Detail & Related papers (2021-10-15T14:49:45Z)
- Self-supervised Contrastive Attributed Graph Clustering [110.52694943592974]
We propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC).
In SCAGC, a self-supervised contrastive loss that leverages inaccurate clustering labels is designed for node representation learning.
For out-of-sample (OOS) nodes, SCAGC can directly calculate their clustering labels.
arXiv Detail & Related papers (2021-10-15T03:25:28Z)
- GraphHop: An Enhanced Label Propagation Method for Node Classification [34.073791157290614]
A scalable semi-supervised node classification method, called GraphHop, is proposed in this work.
Experimental results show that GraphHop outperforms state-of-the-art graph learning methods on a wide range of tasks.
arXiv Detail & Related papers (2021-01-07T02:10:20Z)
- Exploiting Heterogeneous Graph Neural Networks with Latent Worker/Task Correlation Information for Label Aggregation in Crowdsourcing [72.34616482076572]
Crowdsourcing has attracted much attention for its convenience in collecting labels from non-expert workers instead of experts.
We propose a novel framework based on graph neural networks for aggregating crowd labels.
arXiv Detail & Related papers (2020-10-25T10:12:37Z)
- Learn to Propagate Reliably on Noisy Affinity Graphs [69.97364913330989]
Recent works have shown that exploiting unlabeled data through label propagation can substantially reduce the labeling cost.
How to propagate labels reliably, especially on a dataset with unknown outliers, remains an open question.
We propose a new framework that allows labels to be propagated reliably on large-scale real-world data.
arXiv Detail & Related papers (2020-07-17T07:55:59Z)
- Inverse Graph Identification: Can We Identify Node Labels Given Graph Labels? [89.13567439679709]
Graph Identification (GI) has long been researched in graph learning and is essential in certain applications.
This paper defines a novel problem dubbed Inverse Graph Identification (IGI).
We propose a simple yet effective method that performs node-level message passing with a Graph Attention Network (GAT) under the protocol of GI.
arXiv Detail & Related papers (2020-07-12T12:06:17Z)
- Graph Pooling with Node Proximity for Hierarchical Representation Learning [80.62181998314547]
We propose a novel graph pooling strategy that leverages node proximity to improve the hierarchical representation learning of graph data with their multi-hop topology.
Results show that the proposed graph pooling strategy is able to achieve state-of-the-art performance on a collection of public graph classification benchmark datasets.
arXiv Detail & Related papers (2020-06-19T13:09:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.