Randomized Spectral Clustering in Large-Scale Stochastic Block Models
- URL: http://arxiv.org/abs/2002.00839v3
- Date: Thu, 6 Jan 2022 16:04:57 GMT
- Title: Randomized Spectral Clustering in Large-Scale Stochastic Block Models
- Authors: Hai Zhang and Xiao Guo and Xiangyu Chang
- Abstract summary: We study the spectral clustering using randomized sketching algorithms from a statistical perspective.
Under mild conditions, the randomized spectral clustering algorithms lead to the same theoretical bounds as those of the original spectral clustering algorithm.
A new R package called Rclust is developed and made available to the public.
- Score: 13.366036495177923
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spectral clustering has been one of the widely used methods for community
detection in networks. However, large-scale networks bring computational
challenges to the eigenvalue decomposition therein. In this paper, we study the
spectral clustering using randomized sketching algorithms from a statistical
perspective, where we typically assume the network data are generated from a
stochastic block model that is not necessarily of full rank. To do this, we
first use the recently developed sketching algorithms to obtain two randomized
spectral clustering algorithms, namely, the random projection-based and the
random sampling-based spectral clustering. Then we study the theoretical bounds
of the resulting algorithms in terms of the approximation error for the
population adjacency matrix, the misclassification error, and the estimation
error for the link probability matrix. It turns out that, under mild
conditions, the randomized spectral clustering algorithms lead to the same
theoretical bounds as those of the original spectral clustering algorithm. We
also extend the results to degree-corrected stochastic block models. Numerical
experiments support our theoretical findings and show the efficiency of
randomized methods. A new R package called Rclust is developed and made
available to the public.
Related papers
- Consistency of Lloyd's Algorithm Under Perturbations [11.56433365648479]
Lloyd's algorithm is one of the most widely used clustering algorithms.
We show that the mis-clustering rate of Lloyd's algorithm on samples from a sub-Gaussian mixture is exponentially bounded after $O(log(n))$ iterations.
arXiv Detail & Related papers (2023-09-01T16:45:52Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - A Distributed Block Chebyshev-Davidson Algorithm for Parallel Spectral
Clustering [7.632758664232665]
We develop a distributed Block Chebyshev-Davidson algorithm to solve large-scale leading eigenvalue problems for spectral analysis in spectral clustering.
The efficiency of the Chebyshev-Davidson algorithm relies on the prior knowledge of the eigenvalue spectrum, which could be expensive to estimate.
A distributed and parallel version has been developed with attractive scalability.
arXiv Detail & Related papers (2022-12-08T18:06:13Z) - Perfect Spectral Clustering with Discrete Covariates [68.8204255655161]
We propose a spectral algorithm that achieves perfect clustering with high probability on a class of large, sparse networks.
Our method is the first to offer a guarantee of consistent latent structure recovery using spectral clustering.
arXiv Detail & Related papers (2022-05-17T01:41:06Z) - Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model textitdoes not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z) - Correlation Clustering Reconstruction in Semi-Adversarial Models [70.11015369368272]
Correlation Clustering is an important clustering problem with many applications.
We study the reconstruction version of this problem in which one is seeking to reconstruct a latent clustering corrupted by random noise and adversarial modifications.
arXiv Detail & Related papers (2021-08-10T14:46:17Z) - Spectral clustering under degree heterogeneity: a case for the random
walk Laplacian [83.79286663107845]
This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree.
In the special case of a degree-corrected block model, the embedding concentrates about K distinct points, representing communities.
arXiv Detail & Related papers (2021-05-03T16:36:27Z) - Spectral clustering on spherical coordinates under the degree-corrected
stochastic blockmodel [5.156484100374058]
A novel spectral clustering algorithm is proposed for community detection under the degree-corrected blockmodel.
Results show improved performance over competing methods in representing computer networks.
arXiv Detail & Related papers (2020-11-09T16:55:38Z) - Spectral Clustering using Eigenspectrum Shape Based Nystrom Sampling [19.675277307158435]
This paper proposes a scalable Nystrom-based clustering algorithm with a new sampling procedure, Centroid Minimum Sum of Squared Similarities (CMS3), and a on when to use it.
Our datasets depends on the eigen spectrum shape of the dataset, and yields competitive low-rank approximations in test compared to the other state-of-the-art methods.
arXiv Detail & Related papers (2020-07-21T17:49:03Z) - Randomized spectral co-clustering for large-scale directed networks [15.486507430729052]
Co-clustering aims to cluster the senders and receivers of directed networks simultaneously.
We leverage sketching techniques and derive two randomized spectral co-clustering algorithms.
We establish their approximation error rates and misclustering error rates, indicating better bounds than the state-of-the-art results of co-clustering literature.
arXiv Detail & Related papers (2020-04-25T15:00:55Z) - Optimal Randomized First-Order Methods for Least-Squares Problems [56.05635751529922]
This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems.
We focus on two classical embeddings, namely, Gaussian projections and subsampled Hadamard transforms.
Our resulting algorithm yields the best complexity known for solving least-squares problems with no condition number dependence.
arXiv Detail & Related papers (2020-02-21T17:45:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.