Related papers: Scalable and Effective Conductance-based Graph Clustering

URL: http://arxiv.org/abs/2211.12511v1
Date: Tue, 22 Nov 2022 12:45:27 GMT
Title: Scalable and Effective Conductance-based Graph Clustering
Authors: Longlong Lin, Rong-Hua Li, Tao Jia
Abstract summary: We develop a graph clustering framework, PCon. We show that many existing solutions can be reduced to our framework. Based on our framework, we propose two novel algorithms, PCon_core and PCon_de, with linear time and space complexity.
Score: 9.938406925123722
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Conductance-based graph clustering has been recognized as a fundamental operator in numerous graph analysis applications. Despite its significant success, existing algorithms either struggle to obtain satisfactory clustering quality or incur high time and space complexity to achieve provable clustering quality. To overcome these limitations, we devise a powerful \textit{peeling}-based graph clustering framework \textit{PCon}. We show that many existing solutions can be reduced to our framework: they first define a score function for each vertex, then iteratively remove the vertex with the smallest score, and finally output the result with the smallest conductance found during the peeling process. Based on our framework, we propose two novel algorithms, \textit{PCon\_core} and \emph{PCon\_de}, with linear time and space complexity, which can efficiently and effectively identify clusters in massive graphs with more than a few billion edges. Surprisingly, we prove that \emph{PCon\_de} can identify clusters with a near-constant approximation ratio, an important theoretical improvement over the well-known quadratic Cheeger bound. Empirical results on real-life and synthetic datasets show that our algorithms achieve a 5$\sim$42 times speedup with high clustering accuracy, while using 1.4$\sim$7.8 times less memory than the baseline algorithms.
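The peeling procedure described in the abstract (score every vertex, repeatedly remove the lowest-scoring one, keep the intermediate vertex set with the smallest conductance) can be sketched as follows. This is a minimal illustration, not the paper's PCon_core or PCon_de: the default score (number of neighbours still remaining), the tie-breaking rule, and the names `conductance` and `peel` are assumptions made for this example, and the naive loop is quadratic rather than linear time. Conductance is taken here as the number of cut edges divided by the smaller of the two side volumes, with volumes measured by degrees in the original graph.

```python
from typing import Callable, Dict, Hashable, Optional, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]  # undirected adjacency sets


def conductance(graph: Graph, cluster: Set[Hashable]) -> float:
    """cut(S, V \\ S) / min(vol(S), vol(V \\ S)), volumes by original degree."""
    vol_s = sum(len(graph[v]) for v in cluster)
    vol_total = sum(len(nbrs) for nbrs in graph.values())
    cut = sum(1 for v in cluster for u in graph[v] if u not in cluster)
    denom = min(vol_s, vol_total - vol_s)
    return cut / denom if denom > 0 else float("inf")


def peel(
    graph: Graph,
    score: Optional[Callable[[Hashable, Set[Hashable]], float]] = None,
) -> Tuple[Set[Hashable], float]:
    """Remove the lowest-scoring vertex one at a time and return the
    intermediate vertex set with the smallest conductance."""
    if score is None:
        # Assumed illustrative score: neighbours that have not been peeled yet.
        def score(v, remaining):
            return sum(1 for u in graph[v] if u in remaining)

    remaining = set(graph)
    best_cluster, best_phi = set(remaining), float("inf")
    while len(remaining) > 1:
        # Ties are broken by the string form of the vertex for determinism.
        victim = min(remaining, key=lambda v: (score(v, remaining), str(v)))
        remaining.remove(victim)
        phi = conductance(graph, remaining)
        if phi < best_phi:
            best_phi, best_cluster = phi, set(remaining)
    return best_cluster, best_phi


# Two triangles joined by the single edge 2-3.
two_triangles: Graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
cluster, phi = peel(two_triangles)
```

On this toy graph the sketch keeps one of the two triangles, since cutting the single bridging edge gives the lowest conductance seen during peeling. A different score function can be passed as the `score` argument to explore other peeling orders.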

Related papers

Combinatorial Approximations for Cluster Deletion: Simpler, Faster, and Better [18.121514220195607]
Cluster deletion is an NP-hard graph clustering objective with applications in computational and social network analysis. We first provide a tighter analysis of two previous approximation algorithms, improving their approximation guarantees from 4 to 3. We show that both algorithms can be derandomized in a surprisingly simple way, by greedily taking a maximum-degree vertex in an auxiliary graph and forming a cluster around it.
arXiv Detail & Related papers (2024-04-24T18:39:18Z)
A Differentially Private Clustering Algorithm for Well-Clustered Graphs [6.523602840064548]
We provide an efficient $(\epsilon,\delta)$-DP algorithm tailored specifically for such graphs. Our algorithm works for well-clustered graphs with $k$ nearly-balanced clusters.
arXiv Detail & Related papers (2024-03-21T11:57:16Z)
Latent Random Steps as Relaxations of Max-Cut, Min-Cut, and More [30.919536115917726]
We present a probabilistic model based on non-negative matrix factorization which unifies clustering and simplification. By relaxing the hard clustering to a soft clustering, our algorithm relaxes potentially hard clustering problems to tractable ones.
arXiv Detail & Related papers (2023-08-12T02:47:57Z)
Dink-Net: Neural Clustering on Large Graphs [59.10189693120368]
A deep graph clustering method (Dink-Net) is proposed with the idea of dilation and shrink. By discriminating whether nodes have been corrupted by augmentations, representations are learned in a self-supervised manner. The clustering distribution is optimized by minimizing the proposed cluster dilation loss and cluster shrink loss. Compared to the runner-up, Dink-Net achieves a 9.62% NMI improvement on the ogbn-papers100M dataset with 111 million nodes and 1.6 billion edges.
arXiv Detail & Related papers (2023-05-28T15:33:24Z)
Deep Temporal Graph Clustering [77.02070768950145]
We propose a general framework for deep Temporal Graph Clustering (TGC). TGC introduces deep clustering techniques to suit the interaction sequence-based batch-processing pattern of temporal graphs. Our framework can effectively improve the performance of existing temporal graph learning methods.
arXiv Detail & Related papers (2023-05-18T06:17:50Z)
Efficient Graph Field Integrators Meet Point Clouds [59.27295475120132]
We present two new classes of algorithms for efficient field integration on graphs encoding point clouds. The first class, SeparatorFactorization (SF), leverages the bounded genus of point cloud mesh graphs, while the second class, RFDiffusion (RFD), uses popular epsilon-nearest-neighbor graph representations for point clouds.
arXiv Detail & Related papers (2023-02-02T08:33:36Z)
ClusterFuG: Clustering Fully connected Graphs by Multicut [20.254912065749956]
In dense multicut, the clustering objective is given in a factorized form as inner products of node feature vectors. We show how to rewrite classical greedy algorithms for multicut in our dense setting and how to modify them for greater efficiency and solution quality.
arXiv Detail & Related papers (2023-01-28T11:10:50Z)
Graphon based Clustering and Testing of Networks: Algorithms and Theory [11.3700474413248]
Network-valued data are encountered in a wide range of applications and pose challenges in learning. We present two clustering algorithms that achieve state-of-the-art results. We further study the applicability of the proposed distance for graph two-sample testing problems.
arXiv Detail & Related papers (2021-10-06T13:14:44Z)
Effective and Efficient Graph Learning for Multi-view Clustering [173.8313827799077]
We propose an effective and efficient graph learning model for multi-view clustering. Our method exploits the similarity between graphs of different views by minimizing the tensor Schatten p-norm. Our proposed algorithm is time-economical, obtains stable results, and scales well with the data size.
arXiv Detail & Related papers (2021-08-15T13:14:28Z)
Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA. We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \to \infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z)
Learning to Cluster Faces via Confidence and Connectivity Estimation [136.5291151775236]
We propose a fully learnable clustering framework without requiring a large number of overlapped subgraphs. Our method significantly improves clustering accuracy and thus performance of the recognition models trained on top, yet it is an order of magnitude more efficient than existing supervised methods.
arXiv Detail & Related papers (2020-04-01T13:39:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.