A Distributed Block Chebyshev-Davidson Algorithm for Parallel Spectral
Clustering
- URL: http://arxiv.org/abs/2212.04443v2
- Date: Fri, 5 Jan 2024 16:40:44 GMT
- Title: A Distributed Block Chebyshev-Davidson Algorithm for Parallel Spectral
Clustering
- Authors: Qiyuan Pang and Haizhao Yang
- Abstract summary: We develop a distributed Block Chebyshev-Davidson algorithm to solve large-scale leading eigenvalue problems for spectral analysis in spectral clustering.
The efficiency of the Chebyshev-Davidson algorithm relies on the prior knowledge of the eigenvalue spectrum, which could be expensive to estimate.
A distributed and parallel version has been developed with attractive scalability.
- Score: 7.632758664232665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop a distributed Block Chebyshev-Davidson algorithm to solve
large-scale leading eigenvalue problems for spectral analysis in spectral
clustering. First, the efficiency of the Chebyshev-Davidson algorithm relies on
the prior knowledge of the eigenvalue spectrum, which could be expensive to
estimate. This issue can be lessened by the analytic spectrum estimation of the
Laplacian or normalized Laplacian matrices in spectral clustering, making the
proposed algorithm very efficient for spectral clustering. Second, to make the
proposed algorithm capable of analyzing big data, a distributed and parallel
version has been developed with attractive scalability. The speedup by parallel
computing is approximately equivalent to $\sqrt{p}$, where $p$ denotes the
number of processes. {Numerical results will be provided to demonstrate its
efficiency in spectral clustering and scalability advantage over existing
eigensolvers used for spectral clustering in parallel computing environments.}
Related papers
- An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute these covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z) - On the Power of SVD in the Stochastic Block Model [6.661713888517129]
spectral-based dimensionality reduction tools, such as PCA or SVD, improve the performance of clustering algorithms in many applications.
We show that, in the symmetric setting, vanilla-SVD algorithm recovers all clusters correctly.
arXiv Detail & Related papers (2023-09-27T00:04:27Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Approximating a RUM from Distributions on k-Slates [88.32814292632675]
We find a generalization-time algorithm that finds the RUM that best approximates the given distribution on average.
Our theoretical result can also be made practical: we obtain a that is effective and scales to real-world datasets.
arXiv Detail & Related papers (2023-05-22T17:43:34Z) - Perfect Spectral Clustering with Discrete Covariates [68.8204255655161]
We propose a spectral algorithm that achieves perfect clustering with high probability on a class of large, sparse networks.
Our method is the first to offer a guarantee of consistent latent structure recovery using spectral clustering.
arXiv Detail & Related papers (2022-05-17T01:41:06Z) - Improving Spectral Clustering Using Spectrum-Preserving Node Reduction [1.52292571922932]
We use spectrum-preserving node reduction to accelerate eigen-decomposition and generate concise representations of data sets.
The experimental results show dramatically improved clustering performance when compared with state-of-the-art methods.
arXiv Detail & Related papers (2021-10-24T01:43:12Z) - Divide-and-conquer based Large-Scale Spectral Clustering [8.545202841051582]
We propose a divide-and-conquer based large-scale spectral clustering method to strike a good balance between efficiency and effectiveness.
The proposed method achieves lower computational complexity than most existing large-scale spectral clustering.
arXiv Detail & Related papers (2021-04-30T15:09:45Z) - Multi-View Spectral Clustering with High-Order Optimal Neighborhood
Laplacian Matrix [57.11971786407279]
Multi-view spectral clustering can effectively reveal the intrinsic cluster structure among data.
This paper proposes a multi-view spectral clustering algorithm that learns a high-order optimal neighborhood Laplacian matrix.
Our proposed algorithm generates the optimal Laplacian matrix by searching the neighborhood of the linear combination of both the first-order and high-order base.
arXiv Detail & Related papers (2020-08-31T12:28:40Z) - Spectral Clustering using Eigenspectrum Shape Based Nystrom Sampling [19.675277307158435]
This paper proposes a scalable Nystrom-based clustering algorithm with a new sampling procedure, Centroid Minimum Sum of Squared Similarities (CMS3), and a on when to use it.
Our datasets depends on the eigen spectrum shape of the dataset, and yields competitive low-rank approximations in test compared to the other state-of-the-art methods.
arXiv Detail & Related papers (2020-07-21T17:49:03Z) - Local Graph Clustering with Network Lasso [90.66817876491052]
We study the statistical and computational properties of a network Lasso method for local graph clustering.
The clusters delivered by nLasso can be characterized elegantly via network flows between cluster boundary and seed nodes.
arXiv Detail & Related papers (2020-04-25T17:52:05Z) - Randomized Spectral Clustering in Large-Scale Stochastic Block Models [13.366036495177923]
We study the spectral clustering using randomized sketching algorithms from a statistical perspective.
Under mild conditions, the randomized spectral clustering algorithms lead to the same theoretical bounds as those of the original spectral clustering algorithm.
A new R package called Rclust is developed and made available to the public.
arXiv Detail & Related papers (2020-01-20T04:15:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.