Scalable and Effective Conductance-based Graph Clustering
- URL: http://arxiv.org/abs/2211.12511v1
- Date: Tue, 22 Nov 2022 12:45:27 GMT
- Title: Scalable and Effective Conductance-based Graph Clustering
- Authors: Longlong Lin, Rong-Hua Li, Tao Jia
- Abstract summary: We develop a graph clustering framework textitPCon.
We show that many existing solutions can be reduced to our framework.
Based on our framework, we propose two novel algorithms textitPCon_core and emphPCon_de with linear time and space complexity.
- Score: 9.938406925123722
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conductance-based graph clustering has been recognized as a fundamental
operator in numerous graph analysis applications. Despite the significant
success of conductance-based graph clustering, existing algorithms are either
hard to obtain satisfactory clustering qualities, or have high time and space
complexity to achieve provable clustering qualities. To overcome these
limitations, we devise a powerful \textit{peeling}-based graph clustering
framework \textit{PCon}. We show that many existing solutions can be reduced to
our framework. Namely, they first define a score function for each vertex, then
iteratively remove the vertex with the smallest score. Finally, they output the
result with the smallest conductance during the peeling process. Based on our
framework, we propose two novel algorithms \textit{PCon\_core} and
\emph{PCon\_de} with linear time and space complexity, which can efficiently
and effectively identify clusters from massive graphs with more than a few
billion edges. Surprisingly, we prove that \emph{PCon\_de} can identify
clusters with near-constant approximation ratio, resulting in an important
theoretical improvement over the well-known quadratic Cheeger bound. Empirical
results on real-life and synthetic datasets show that our algorithms can
achieve 5$\sim$42 times speedup with a high clustering accuracy, while using
1.4$\sim$7.8 times less memory than the baseline algorithms.
Related papers
- Combinatorial Approximations for Cluster Deletion: Simpler, Faster, and Better [18.121514220195607]
Cluster deletion is an NP-hard graph clustering objective with applications in computational and social network analysis.
We first provide a tighter analysis of two previous approximation algorithms, improving their approximation guarantees from 4 to 3.
We show that both algorithms can be derandomized in a surprisingly simple way, by greedily taking a maximum degree in an auxiliary graph and forming a cluster around it.
arXiv Detail & Related papers (2024-04-24T18:39:18Z) - A Differentially Private Clustering Algorithm for Well-Clustered Graphs [6.523602840064548]
We provide an efficient ($epsilon,$delta$)-DP algorithm tailored specifically for such graphs.
Our algorithm works for well-clustered graphs with $k$ nearly-balanced clusters.
arXiv Detail & Related papers (2024-03-21T11:57:16Z) - Latent Random Steps as Relaxations of Max-Cut, Min-Cut, and More [30.919536115917726]
We present a probabilistic model based on non-negative matrix factorization which unifies clustering and simplification.
By relaxing the hard clustering to a soft clustering, our algorithm relaxes potentially hard clustering problems to a tractable ones.
arXiv Detail & Related papers (2023-08-12T02:47:57Z) - Dink-Net: Neural Clustering on Large Graphs [59.10189693120368]
A deep graph clustering method (Dink-Net) is proposed with the idea of dilation and shrink.
By discriminating nodes, whether being corrupted by augmentations, representations are learned in a self-supervised manner.
The clustering distribution is optimized by minimizing the proposed cluster dilation loss and cluster shrink loss.
Compared to the runner-up, Dink-Net 9.62% achieves NMI improvement on the ogbn-papers100M dataset with 111 million nodes and 1.6 billion edges.
arXiv Detail & Related papers (2023-05-28T15:33:24Z) - Deep Temporal Graph Clustering [77.02070768950145]
We propose a general framework for deep Temporal Graph Clustering (GC)
GC introduces deep clustering techniques to suit the interaction sequence-based batch-processing pattern of temporal graphs.
Our framework can effectively improve the performance of existing temporal graph learning methods.
arXiv Detail & Related papers (2023-05-18T06:17:50Z) - Efficient Graph Field Integrators Meet Point Clouds [59.27295475120132]
We present two new classes of algorithms for efficient field integration on graphs encoding point clouds.
The first class, SeparatorFactorization(SF), leverages the bounded genus of point cloud mesh graphs, while the second class, RFDiffusion(RFD), uses popular epsilon-nearest-neighbor graph representations for point clouds.
arXiv Detail & Related papers (2023-02-02T08:33:36Z) - ClusterFuG: Clustering Fully connected Graphs by Multicut [20.254912065749956]
In dense multicut, the clustering objective is given in a factorized form as inner products of node feature vectors.
We show how to rewrite classical greedy algorithms for multicut in our dense setting and how to modify them for greater efficiency and solution quality.
arXiv Detail & Related papers (2023-01-28T11:10:50Z) - Graphon based Clustering and Testing of Networks: Algorithms and Theory [11.3700474413248]
Network-valued data are encountered in a wide range of applications and pose challenges in learning.
We present two clustering algorithms that achieve state-of-the-art results.
We further study the applicability of the proposed distance for graph two-sample testing problems.
arXiv Detail & Related papers (2021-10-06T13:14:44Z) - Effective and Efficient Graph Learning for Multi-view Clustering [173.8313827799077]
We propose an effective and efficient graph learning model for multi-view clustering.
Our method exploits the view-similar between graphs of different views by the minimization of tensor Schatten p-norm.
Our proposed algorithm is time-economical and obtains the stable results and scales well with the data size.
arXiv Detail & Related papers (2021-08-15T13:14:28Z) - Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $|theta infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z) - Learning to Cluster Faces via Confidence and Connectivity Estimation [136.5291151775236]
We propose a fully learnable clustering framework without requiring a large number of overlapped subgraphs.
Our method significantly improves clustering accuracy and thus performance of the recognition models trained on top, yet it is an order of magnitude more efficient than existing supervised methods.
arXiv Detail & Related papers (2020-04-01T13:39:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.