Adaptive Local Clustering over Attributed Graphs
- URL: http://arxiv.org/abs/2503.20488v1
- Date: Wed, 26 Mar 2025 12:24:07 GMT
- Title: Adaptive Local Clustering over Attributed Graphs
- Authors: Haoran Zheng, Renchi Yang, Jianliang Xu
- Abstract summary: Local graph clustering aims to identify a subgraph $C_s \in G$ surrounding $v_s$ in time roughly linear with the size of $C_s$. Most existing solutions merely rely on the topological connectivity between nodes in $G$, rendering them vulnerable to missing or noisy links. We propose LACA, an efficient and effective approach for LGC that achieves superb empirical performance on multiple real datasets.
- Score: 16.197833892857282
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given a graph $G$ and a seed node $v_s$, the objective of local graph clustering (LGC) is to identify a subgraph $C_s \in G$ (a.k.a. local cluster) surrounding $v_s$ in time roughly linear with the size of $C_s$. This approach yields personalized clusters without needing to access the entire graph, which makes it highly suitable for numerous applications involving large graphs. However, most existing solutions merely rely on the topological connectivity between nodes in $G$, rendering them vulnerable to missing or noisy links that are commonly present in real-world graphs. To address this issue, this paper resorts to leveraging the complementary nature of graph topology and node attributes to enhance local clustering quality. To effectively exploit the attribute information, we first formulate the LGC as an estimation of the bidirectional diffusion distribution (BDD), which is specialized for capturing the multi-hop affinity between nodes in the presence of attributes. Furthermore, we propose LACA, an efficient and effective approach for LGC that achieves superb empirical performance on multiple real datasets while maintaining strong locality. The core components of LACA include (i) a fast and theoretically-grounded preprocessing technique for node attributes, (ii) an adaptive algorithm for diffusing any vectors over $G$ with rigorous theoretical guarantees and expedited convergence, and (iii) an effective three-step scheme for BDD approximation. Extensive experiments, comparing 17 competitors on 8 real datasets, show that LACA outperforms all competitors in terms of result quality measured against ground truth local clusters, while also being up to orders of magnitude faster. The code is available at https://github.com/HaoranZ99/alac.
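The abstract describes the approach only at a high level, so the following is a rough, hedged illustration of the general recipe behind diffusion-based local clustering over attributed graphs, not an implementation of LACA (whose BDD estimation, attribute preprocessing, and adaptive diffusion are detailed in the paper and released at the GitHub link above). The sketch blends node-attribute cosine similarity into edge weights, runs a personalized-PageRank-style diffusion from the seed node, and extracts a low-conductance local cluster with a sweep cut; all function names and the blending parameter `alpha` are hypothetical.

```python
# Minimal sketch (NOT the LACA algorithm): attribute-aware local clustering via
# personalized-PageRank-style diffusion from a seed node plus a conductance sweep.
import numpy as np


def attribute_weighted_adjacency(adj, attrs, alpha=0.5):
    """Reweight existing edges by blending topology with attribute cosine similarity.

    adj:   (n, n) symmetric adjacency matrix with zero diagonal
    attrs: (n, d) node attribute matrix
    alpha: assumed blending hyperparameter (1.0 = topology only)
    """
    unit = attrs / (np.linalg.norm(attrs, axis=1, keepdims=True) + 1e-12)
    sim = np.clip(unit @ unit.T, 0.0, None)            # non-negative cosine similarity
    return alpha * adj + (1.0 - alpha) * adj * sim     # only existing edges are reweighted


def personalized_pagerank(W, seed, beta=0.15, iters=100):
    """Approximate the diffusion (PPR) vector from the seed by power iteration."""
    n = W.shape[0]
    deg = W.sum(axis=1)
    P = W / deg[:, None]                               # row-stochastic transition matrix
    e = np.zeros(n)
    e[seed] = 1.0
    p = e.copy()
    for _ in range(iters):
        p = beta * e + (1.0 - beta) * (P.T @ p)        # teleport back to the seed with prob. beta
    return p


def sweep_cut(W, p):
    """Return the prefix of nodes (ordered by p / degree) with minimum conductance."""
    deg = W.sum(axis=1)
    order = np.argsort(-p / deg)
    vol_total = deg.sum()
    in_set = np.zeros(len(p), dtype=bool)
    cut = vol = 0.0
    best, best_phi = None, np.inf
    for k, v in enumerate(order[:-1]):
        cut += deg[v] - 2.0 * W[v, in_set].sum()       # weight of edges leaving the growing set
        in_set[v] = True
        vol += deg[v]
        phi = cut / min(vol, vol_total - vol)
        if phi < best_phi:
            best_phi, best = phi, order[:k + 1].copy()
    return best, best_phi


if __name__ == "__main__":
    # Toy example: two triangles joined by one bridge edge, with distinct attributes.
    A = np.zeros((6, 6))
    for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
        A[i, j] = A[j, i] = 1.0
    X = np.vstack([np.tile([1.0, 0.0], (3, 1)), np.tile([0.0, 1.0], (3, 1))])
    W = attribute_weighted_adjacency(A, X)
    p = personalized_pagerank(W, seed=0)
    cluster, phi = sweep_cut(W, p)
    print("local cluster:", sorted(cluster.tolist()), "conductance:", round(phi, 3))
```

A strongly local, push-based variant (as in the ACL-style algorithms among the related papers below) would avoid touching the whole graph; the dense NumPy version here trades that locality for readability.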
Related papers
- Deep Cut-informed Graph Embedding and Clustering [35.07343461964133]
We propose an innovative and non-GNN-based Deep Cut-informed Graph embedding and Clustering framework, namely DCGC. For the encoding module, we derive a cut-informed graph embedding objective to fuse graph structure and attributes by minimizing their joint normalized cut. For the clustering module, we utilize optimal transport theory to obtain the clustering assignments.
arXiv Detail & Related papers (2025-03-09T14:24:09Z)
- Provably Extending PageRank-based Local Clustering Algorithm to Weighted Directed Graphs with Self-Loops and to Hypergraphs [40.215737469808026]
This work focuses on graph local clustering, which has broad applications beyond graphs because of the internal connectivities within various modalities. We extend the non-approximating Andersen-Chung-Lang ("ACL") algorithm beyond discrete graphs and generalize its quadratic optimality to a wider range of graphs. We theoretically prove that, under two mild conditions, both algorithms can identify a quadratically optimal local cluster in terms of conductance with at least 1/2 probability.
arXiv Detail & Related papers (2024-12-04T03:56:14Z)
- CiliaGraph: Enabling Expression-enhanced Hyper-Dimensional Computation in Ultra-Lightweight and One-Shot Graph Classification on Edge [1.8726646412385333]
CiliaGraph is an enhanced expressive yet ultra-lightweight HDC model for graph classification.
CiliaGraph reduces memory usage and accelerates training by an average of 292 times.
arXiv Detail & Related papers (2024-05-29T12:22:59Z)
- Graph Sparsification via Mixture of Graphs [67.40204130771967]
We introduce Mixture-of-Graphs (MoG) to dynamically select tailored pruning solutions for each node.
MoG incorporates multiple sparsifier experts, each characterized by unique sparsity levels and pruning criteria, and selects the appropriate experts for each node.
Experiments on four large-scale OGB datasets and two superpixel datasets, equipped with five GNNs, demonstrate that MoG identifies subgraphs at higher sparsity levels.
arXiv Detail & Related papers (2024-05-23T07:40:21Z)
- Cluster-based Graph Collaborative Filtering [55.929052969825825]
Graph Convolution Networks (GCNs) have succeeded in learning user and item representations for recommendation systems.
Most existing GCN-based methods overlook the multiple interests of users while performing high-order graph convolution.
We propose a novel GCN-based recommendation model, termed Cluster-based Graph Collaborative Filtering (ClusterGCF).
arXiv Detail & Related papers (2024-04-16T07:05:16Z)
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a single framework.
To provide feedback actions, a clustering-oriented reward function is proposed to enhance cohesion within the same cluster and separation between different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
- Dink-Net: Neural Clustering on Large Graphs [59.10189693120368]
A deep graph clustering method (Dink-Net) is proposed with the idea of dilation and shrink.
By discriminating whether nodes have been corrupted by augmentations, representations are learned in a self-supervised manner.
The clustering distribution is optimized by minimizing the proposed cluster dilation loss and cluster shrink loss.
Compared to the runner-up, Dink-Net achieves a 9.62% NMI improvement on the ogbn-papers100M dataset with 111 million nodes and 1.6 billion edges.
arXiv Detail & Related papers (2023-05-28T15:33:24Z)
- GC-Flow: A Graph-Based Flow Network for Effective Clustering [10.354035049272095]
Graph convolutional networks (GCNs) are discriminative models that directly model the class posterior $p(y|\mathbf{x})$ for semi-supervised classification of graph data.
In this work, we design normalizing flows that replace GCN layers, leading to a generative model that models both the class-conditional likelihood $p(\mathbf{x}|y)$ and the class prior $p(y)$.
The resulting neural network, GC-Flow, retains the graph convolution operations while being equipped with
arXiv Detail & Related papers (2023-05-26T22:11:38Z)
- AnchorGAE: General Data Clustering via $O(n)$ Bipartite Graph Convolution [79.44066256794187]
We show how to convert a non-graph dataset into a graph by introducing the generative graph model, which is used to build graph convolution networks (GCNs).
A bipartite graph constructed by anchors is updated dynamically to exploit the high-level information behind data.
We theoretically prove that the simple update will lead to degeneration, and a specific strategy is accordingly designed to address this issue.
arXiv Detail & Related papers (2021-11-12T07:08:13Z)
- Self-supervised Contrastive Attributed Graph Clustering [110.52694943592974]
We propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC).
In SCAGC, by leveraging inaccurate clustering labels, a self-supervised contrastive loss is designed for node representation learning.
For out-of-sample (OOS) nodes, SCAGC can directly calculate their clustering labels.
arXiv Detail & Related papers (2021-10-15T03:25:28Z)
- Minimizing Localized Ratio Cut Objectives in Hypergraphs [32.80813008862995]
We present a framework for local hypergraph clustering based on minimizing localized ratio cut objectives.
Our algorithm is strongly-local, meaning that its runtime depends only on the size of the input set, and does not need to explore the entire hypergraph to find good local clusters.
arXiv Detail & Related papers (2020-02-21T17:42:22Z)