A Restarted Large-Scale Spectral Clustering with Self-Guiding and Block
Diagonal Representation
- URL: http://arxiv.org/abs/2306.15138v2
- Date: Thu, 29 Jun 2023 06:51:09 GMT
- Title: A Restarted Large-Scale Spectral Clustering with Self-Guiding and Block
Diagonal Representation
- Authors: Yongyan Guo and Gang Wu
- Abstract summary: We propose a restarted clustering framework with self-guiding and block diagonal representation.
An advantage of this strategy is that useful clustering information obtained in previous cycles can be preserved.
Theoretical results are established to show the rationality of inexact computations in spectral clustering.
- Score: 1.115905690697198
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Spectral clustering is one of the most popular unsupervised machine learning
methods, and constructing the similarity matrix is crucial to this type of method. In
most existing works, the similarity matrix is either computed once and for all or
updated alternately. However, the former can hardly reflect comprehensive
relationships among data points, while the latter is time-consuming and even
infeasible for large-scale problems. In this work, we propose a restarted
clustering framework with self-guiding and block diagonal representation. An
advantage of this strategy is that useful clustering information obtained in
previous cycles is preserved as much as possible. To the best of our knowledge,
this is the first work that applies a restarting strategy to spectral
clustering. The key difference is that our method reclassifies the samples in
each cycle, whereas existing methods classify them only once. To further reduce
the overhead, we introduce a block diagonal representation with Nyström
approximation for constructing the similarity matrix. Theoretical results are
established to show the rationality of inexact computations in spectral
clustering. Comprehensive experiments on several benchmark databases show the
superiority of the proposed algorithms over many state-of-the-art algorithms on
large-scale problems. In particular, our framework can potentially boost other
clustering algorithms and works well even with a randomly chosen initial guess.
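To make the framework concrete, below is a minimal sketch of one plausible instantiation in Python. It wraps a Nyström-approximated spectral embedding in a restart loop that blends the previous partition back into the embedding as a simple self-guiding term. The RBF similarity, the blending weight alpha, all function names, and the omission of the block diagonal representation and Laplacian normalization steps are simplifying assumptions for illustration, not the authors' implementation.

```python
# A minimal sketch, not the authors' code: Nystrom-approximated spectral
# embedding plus a restart loop with a simple additive self-guiding term.
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

def nystrom_embedding(X, n_clusters, n_landmarks=100, gamma=1.0, seed=0):
    """Approximate the leading spectral embedding via the Nystrom method."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(n_landmarks, len(X)), replace=False)
    C = rbf_kernel(X, X[idx], gamma=gamma)    # n x m cross-similarities
    W = C[idx]                                # m x m landmark block
    vals, vecs = eigh(W)                      # only a small eigenproblem
    top = np.argsort(vals)[::-1][:n_clusters]
    vals, vecs = vals[top], vecs[:, top]
    # Nystrom extension of the landmark eigenvectors to all n points.
    return C @ vecs / np.sqrt(np.maximum(vals, 1e-12))

def restarted_spectral_clustering(X, n_clusters, n_cycles=5, alpha=0.5):
    """Reclassify the samples in every cycle; the previous partition is
    blended into the embedding so earlier clustering information is kept."""
    labels = None
    for cycle in range(n_cycles):
        U = nystrom_embedding(X, n_clusters, seed=cycle)
        if labels is not None:
            H = np.eye(n_clusters)[labels]    # one-hot previous partition
            U = (1.0 - alpha) * U + alpha * H
        labels = KMeans(n_clusters, n_init=10).fit_predict(U)
    return labels
```

In the paper the self-guiding information and the block diagonal structure are built into the similarity matrix itself; the additive blend above is only the simplest way to convey how restarting lets each cycle reuse the previous classification.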
Related papers
- An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute the underlying covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z)
- On the Power of SVD in the Stochastic Block Model [6.661713888517129]
Spectral-based dimensionality reduction tools, such as PCA or SVD, improve the performance of clustering algorithms in many applications.
We show that, in the symmetric setting, the vanilla-SVD algorithm recovers all clusters correctly (see the toy sketch after this list).
arXiv Detail & Related papers (2023-09-27T00:04:27Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct the distance matrix between data points using a Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- Scalable Clustering: Large Scale Unsupervised Learning of Gaussian Mixture Models with Outliers [5.478764356647437]
This paper introduces a provably robust clustering algorithm based on loss minimization.
It provides theoretical guarantees that the algorithm obtains high accuracy with high probability.
Experiments on real-world large-scale datasets demonstrate the effectiveness of the algorithm.
arXiv Detail & Related papers (2023-02-28T14:39:18Z)
- Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm [3.7491936479803054]
We propose a new hierarchical clustering linkage criterion called Genie.
Our algorithm links two clusters in such a way that a chosen economic inequity measure does not drastically increase above a given threshold.
A reference implementation of the algorithm has been included in the open source 'genie' package for R.
arXiv Detail & Related papers (2022-09-13T06:42:53Z)
- Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model does not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z)
- Correlation Clustering Reconstruction in Semi-Adversarial Models [70.11015369368272]
Correlation Clustering is an important clustering problem with many applications.
We study the reconstruction version of this problem in which one is seeking to reconstruct a latent clustering corrupted by random noise and adversarial modifications.
arXiv Detail & Related papers (2021-08-10T14:46:17Z)
- Divide-and-conquer based Large-Scale Spectral Clustering [8.545202841051582]
We propose a divide-and-conquer based large-scale spectral clustering method to strike a good balance between efficiency and effectiveness.
The proposed method achieves lower computational complexity than most existing large-scale spectral clustering methods.
arXiv Detail & Related papers (2021-04-30T15:09:45Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Spectral Clustering with Smooth Tiny Clusters [14.483043753721256]
We propose a novel clustering algorithm, which considers the smoothness of data for the first time.
Our key idea is to cluster tiny clusters, whose centers constitute smooth graphs.
Although this paper focuses solely on multi-scale situations, the idea of data smoothness can certainly be extended to any clustering algorithm.
arXiv Detail & Related papers (2020-09-10T05:21:20Z)
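As an aside on the SVD entry above, the recovery claim is easy to probe empirically. The sketch below samples a symmetric two-block stochastic block model, takes the top singular vectors of the adjacency matrix, and clusters them with k-means; the graph size and edge probabilities are arbitrary demo choices, not values from that paper.

```python
# Toy check of vanilla SVD on a symmetric two-block stochastic block model;
# all parameters here are illustrative.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n, k, p, q = 400, 2, 0.6, 0.1                  # nodes, blocks, edge probs
truth = np.repeat(np.arange(k), n // k)        # planted assignment
P = np.where(truth[:, None] == truth[None, :], p, q)
A = np.triu(rng.random((n, n)) < P, 1).astype(float)
A = A + A.T                                    # symmetric, no self-loops

U, s, _ = np.linalg.svd(A)                     # vanilla SVD of adjacency
pred = KMeans(k, n_init=10).fit_predict(U[:, :k])
# Agreement up to the k=2 label swap:
acc = max(np.mean(pred == truth), np.mean(pred != truth))
print(f"recovered {acc:.1%} of the planted labels")
```

With these easy parameters the planted partition is recovered essentially exactly; the paper's point is that this plain SVD step, with no extra tricks, already suffices in the symmetric setting.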
This list is automatically generated from the titles and abstracts of the papers on this site.