Randomized Spectral Clustering in Large-Scale Stochastic Block Models
- URL: http://arxiv.org/abs/2002.00839v3
- Date: Thu, 6 Jan 2022 16:04:57 GMT
- Title: Randomized Spectral Clustering in Large-Scale Stochastic Block Models
- Authors: Hai Zhang and Xiao Guo and Xiangyu Chang
- Abstract summary: We study the spectral clustering using randomized sketching algorithms from a statistical perspective.
Under mild conditions, the randomized spectral clustering algorithms lead to the same theoretical bounds as those of the original spectral clustering algorithm.
A new R package called Rclust is developed and made available to the public.
- Score: 13.366036495177923
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spectral clustering has been one of the widely used methods for community
detection in networks. However, large-scale networks bring computational
challenges to the eigenvalue decomposition therein. In this paper, we study the
spectral clustering using randomized sketching algorithms from a statistical
perspective, where we typically assume the network data are generated from a
stochastic block model that is not necessarily of full rank. To do this, we
first use the recently developed sketching algorithms to obtain two randomized
spectral clustering algorithms, namely, the random projection-based and the
random sampling-based spectral clustering. Then we study the theoretical bounds
of the resulting algorithms in terms of the approximation error for the
population adjacency matrix, the misclassification error, and the estimation
error for the link probability matrix. It turns out that, under mild
conditions, the randomized spectral clustering algorithms lead to the same
theoretical bounds as those of the original spectral clustering algorithm. We
also extend the results to degree-corrected stochastic block models. Numerical
experiments support our theoretical findings and show the efficiency of
randomized methods. A new R package called Rclust is developed and made
available to the public.
Related papers
- A spectral clustering-type algorithm for the consistent estimation of the Hurst distribution in moderately high dimensions [8.829673021172587]
We construct an algorithm for the statistical identification of the Hurst distribution undergirding a high-dimensional fractal system.
In a moderately high-dimensional regime where the dimension, the sample size and the scale go to infinity, we show that the algorithm consistently estimates the Hurst distribution.
We apply the algorithm in the analysis of real-world macroeconomic time series to unveil evidence for cointegration.
arXiv Detail & Related papers (2025-01-30T03:34:08Z) - On the Robustness of Spectral Algorithms for Semirandom Stochastic Block Models [11.137244714693779]
We study the robustness of spectral algorithms against semirandom adversaries.
We identify classes of semirandom adversaries under which spectral bisection using the _unnormalized_ Laplacian is strongly consistent.
arXiv Detail & Related papers (2024-12-18T20:35:02Z) - Revisiting Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [69.15976031704687]
We propose IAC (Instance-Adaptive Clustering), the first algorithm whose performance matches the instance-specific lower bounds both in expectation and with high probability.
IAC maintains an overall computational complexity of $ mathcalO(n, textpolylog(n) $, making it scalable and practical for large-scale problems.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - A Distributed Block Chebyshev-Davidson Algorithm for Parallel Spectral
Clustering [7.632758664232665]
We develop a distributed Block Chebyshev-Davidson algorithm to solve large-scale leading eigenvalue problems for spectral analysis in spectral clustering.
The efficiency of the Chebyshev-Davidson algorithm relies on the prior knowledge of the eigenvalue spectrum, which could be expensive to estimate.
A distributed and parallel version has been developed with attractive scalability.
arXiv Detail & Related papers (2022-12-08T18:06:13Z) - Perfect Spectral Clustering with Discrete Covariates [68.8204255655161]
We propose a spectral algorithm that achieves perfect clustering with high probability on a class of large, sparse networks.
Our method is the first to offer a guarantee of consistent latent structure recovery using spectral clustering.
arXiv Detail & Related papers (2022-05-17T01:41:06Z) - Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model textitdoes not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z) - Correlation Clustering Reconstruction in Semi-Adversarial Models [70.11015369368272]
Correlation Clustering is an important clustering problem with many applications.
We study the reconstruction version of this problem in which one is seeking to reconstruct a latent clustering corrupted by random noise and adversarial modifications.
arXiv Detail & Related papers (2021-08-10T14:46:17Z) - Spectral clustering under degree heterogeneity: a case for the random
walk Laplacian [83.79286663107845]
This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree.
In the special case of a degree-corrected block model, the embedding concentrates about K distinct points, representing communities.
arXiv Detail & Related papers (2021-05-03T16:36:27Z) - Spectral clustering on spherical coordinates under the degree-corrected
stochastic blockmodel [5.156484100374058]
A novel spectral clustering algorithm is proposed for community detection under the degree-corrected blockmodel.
Results show improved performance over competing methods in representing computer networks.
arXiv Detail & Related papers (2020-11-09T16:55:38Z) - Randomized spectral co-clustering for large-scale directed networks [15.486507430729052]
Co-clustering aims to cluster the senders and receivers of directed networks simultaneously.
We leverage sketching techniques and derive two randomized spectral co-clustering algorithms.
We establish their approximation error rates and misclustering error rates, indicating better bounds than the state-of-the-art results of co-clustering literature.
arXiv Detail & Related papers (2020-04-25T15:00:55Z) - Optimal Randomized First-Order Methods for Least-Squares Problems [56.05635751529922]
This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems.
We focus on two classical embeddings, namely, Gaussian projections and subsampled Hadamard transforms.
Our resulting algorithm yields the best complexity known for solving least-squares problems with no condition number dependence.
arXiv Detail & Related papers (2020-02-21T17:45:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.