A Tighter Analysis of Spectral Clustering, and Beyond
- URL: http://arxiv.org/abs/2208.01724v1
- Date: Tue, 2 Aug 2022 20:18:07 GMT
- Title: A Tighter Analysis of Spectral Clustering, and Beyond
- Authors: Peter Macgregor and He Sun
- Abstract summary: This work studies the classical spectral clustering algorithm which embeds the vertices of some graph $G=(V_G, E_G)$ into $\mathbb{R}^k$ using $k$ eigenvectors of some matrix of $G$.
Our first result is a tighter analysis of the performance of spectral clustering, which explains why it works under a much weaker condition than those studied in the literature.
For the second result, we show that, by applying fewer than $k$ eigenvectors to construct the embedding, spectral clustering is able to produce better output for many practical instances.
- Score: 9.759415650727892
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work studies the classical spectral clustering algorithm which embeds
the vertices of some graph $G=(V_G, E_G)$ into $\mathbb{R}^k$ using $k$
eigenvectors of some matrix of $G$, and applies $k$-means to partition $V_G$
into $k$ clusters. Our first result is a tighter analysis of the performance of
spectral clustering, which explains why it works under a much weaker condition
than those studied in the literature. For the second result, we show that,
by applying fewer than $k$ eigenvectors to construct the embedding, spectral
clustering is able to produce better output for many practical instances; this
result is the first of its kind in spectral clustering. Besides its conceptual
and theoretical significance, the practical impact of our work is demonstrated
by the empirical analysis on both synthetic and real-world datasets, in which
spectral clustering produces comparable or better results with fewer than $k$
eigenvectors.
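To ground the discussion, here is a minimal sketch of the classical pipeline the paper analyses: embed the vertices with eigenvectors of a matrix of $G$, then run $k$-means on the embedding. The choice of the normalized Laplacian, the function name, and the `num_eigs` parameter (exposing the fewer-than-$k$-eigenvectors regime) are illustrative assumptions, not the authors' code.

```python
# Minimal spectral clustering sketch, assuming scipy and scikit-learn.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh
from sklearn.cluster import KMeans

def spectral_clustering(adj, k, num_eigs=None):
    """Embed vertices with bottom eigenvectors of the normalized Laplacian,
    then partition the rows of the embedding with k-means."""
    num_eigs = num_eigs or k                      # the paper: fewer can suffice
    degrees = np.asarray(adj.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(np.maximum(degrees, 1e-12)))
    laplacian = sp.eye(adj.shape[0]) - d_inv_sqrt @ adj @ d_inv_sqrt
    # Eigenvectors of the smallest eigenvalues form the spectral embedding.
    _, eigvecs = eigsh(laplacian, k=num_eigs, which="SM")
    return KMeans(n_clusters=k, n_init=10).fit_predict(eigvecs)
```

Calling `spectral_clustering(adj, k, num_eigs=k-1)` on a graph with $k$ well-separated clusters probes the fewer-eigenvector regime that the paper studies empirically.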
Related papers
- Datacube segmentation via Deep Spectral Clustering [76.48544221010424]
Extended Vision techniques often pose a challenge in their interpretation.
The huge dimensionality of data-cube spectra makes their statistical interpretation a complex task.
In this paper, we explore the possibility of applying unsupervised clustering methods in encoded space.
Statistical dimensionality reduction is performed by an ad hoc trained (Variational) AutoEncoder, while the clustering is performed by a (learnable) iterative K-Means algorithm.
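As an illustration of the pipeline sketched in this entry, the following toy example trains a small autoencoder on synthetic spectra and clusters the encoded vectors with K-Means; the architecture, layer sizes, and training loop are assumptions for illustration (assuming PyTorch and scikit-learn), not the paper's model.

```python
# Toy "encode then cluster" pipeline: autoencoder compression + K-Means.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class SpectraAutoEncoder(nn.Module):
    def __init__(self, n_bands, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_bands, 64), nn.ReLU(),
                                     nn.Linear(64, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                     nn.Linear(64, n_bands))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

spectra = torch.randn(1000, 128)          # toy cube: 1000 spectra, 128 bands
model = SpectraAutoEncoder(n_bands=128)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):                      # short reconstruction training loop
    opt.zero_grad()
    recon, _ = model(spectra)
    nn.functional.mse_loss(recon, spectra).backward()
    opt.step()

with torch.no_grad():
    _, encoded = model(spectra)           # cluster in the encoded space
labels = KMeans(n_clusters=5, n_init=10).fit_predict(encoded.numpy())
```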
arXiv Detail & Related papers (2024-01-31T09:31:28Z)
- Fast and Simple Spectral Clustering in Theory and Practice [7.070726553564701]
Spectral clustering is a popular and effective algorithm designed to find $k$ clusters in a graph $G$.
We present a simple spectral clustering algorithm based on a vertex embedding with $O(\log(k))$ vectors computed by the power method.
We evaluate the new algorithm on several synthetic and real-world datasets, finding that it is significantly faster than alternative clustering algorithms, while producing results with approximately the same clustering accuracy.
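The sketch below illustrates this style of embedding: a handful of random vectors are smoothed by repeated multiplication with the lazy random-walk matrix, and $k$-means runs on the rows of the result. The matrix choice, vector count, and iteration count are illustrative assumptions rather than the paper's exact procedure.

```python
# Power-method vertex embedding sketch, assuming scipy and scikit-learn.
import numpy as np
import scipy.sparse as sp
from sklearn.cluster import KMeans

def fast_spectral_clustering(adj, k, power_iters=20):
    n = adj.shape[0]
    degrees = np.asarray(adj.sum(axis=1)).ravel()
    walk = sp.diags(1.0 / np.maximum(degrees, 1e-12)) @ adj   # D^{-1} A
    lazy = 0.5 * (sp.eye(n) + walk)                           # lazy random walk
    num_vecs = max(1, int(np.ceil(np.log2(k + 1))))           # O(log k) vectors
    embedding = np.random.randn(n, num_vecs)
    for _ in range(power_iters):   # damp components outside the top eigenspace
        embedding = lazy @ embedding
    return KMeans(n_clusters=k, n_init=10).fit_predict(embedding)
```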
arXiv Detail & Related papers (2023-10-17T02:31:57Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches the instance-specific lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- On consistency of constrained spectral clustering under representation-aware stochastic block model [20.6072287343024]
We study constrained spectral clustering with the aim of finding balanced clusters in a given similarity graph $\mathcal{G}$.
We develop unnormalized and normalized variants of spectral clustering in this setting.
These algorithms use a representation graph $\mathcal{R}$ to find clusters in $\mathcal{G}$ that approximately satisfy the proposed constraint.
arXiv Detail & Related papers (2022-03-03T20:41:14Z)
- Differentially-Private Clustering of Easy Instances [67.04951703461657]
In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information on individual data points.
We give implementable differentially private clustering algorithms that provide utility when the data is "easy".
We propose a framework that allows us to apply non-private clustering algorithms to the easy instances and privately combine the results.
arXiv Detail & Related papers (2021-12-29T08:13:56Z)
- Hierarchical Clustering: $O(1)$-Approximation for Well-Clustered Graphs [3.2901541059183432]
We present two polynomial-time $O(1)$-approximation algorithms for hierarchical clustering.
The significance of our work is demonstrated by the empirical analysis on both synthetic and real-world data sets.
arXiv Detail & Related papers (2021-12-16T17:52:04Z)
- Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
We show that learning $k\times k$, rank-$r$ matrices to normalized $L_1$ distance requires $\Omega(\frac{kr}{\epsilon^2})$ samples.
We propose an algorithm that uses $\mathcal{O}(\frac{kr}{\epsilon^2}\log^2\frac{r}{\epsilon})$ samples, a number linear in the high dimension and nearly linear in the, typically low, rank.
arXiv Detail & Related papers (2020-09-30T19:10:32Z)
- An $\ell_p$ theory of PCA and spectral clustering [23.90245234027558]
Principal Component Analysis is a powerful tool in statistics and machine learning.
In this paper, we develop an $\ell_p$ theory for a hollowed version of PCA in Hilbert spaces.
For contextual community detection, the $\ell_p$ theory leads to a simple spectral algorithm that achieves the information threshold for exact recovery.
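As a rough illustration, the sketch below implements one plausible reading of "hollowing": zero the diagonal of the Gram matrix before the eigendecomposition. This reading is an assumption for illustration, not the paper's estimator.

```python
# "Hollowed" PCA sketch: drop the diagonal before the eigendecomposition.
import numpy as np

def hollowed_pca(X, num_components):
    gram = X @ X.T                       # n x n Gram matrix of the samples
    np.fill_diagonal(gram, 0.0)          # hollowing removes diagonal bias terms
    eigvals, eigvecs = np.linalg.eigh(gram)
    return eigvecs[:, -num_components:]  # top eigenvectors as the embedding
```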
arXiv Detail & Related papers (2020-06-24T21:30:28Z)
- Average Sensitivity of Spectral Clustering [31.283432482502278]
We study the stability of spectral clustering against edge perturbations in the input graph.
Our results suggest that spectral clustering is stable against edge perturbations when there is a cluster structure in the input graph.
arXiv Detail & Related papers (2020-06-07T09:14:44Z)
- Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \rightarrow \infty$.
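A generic PCA-then-cluster sketch of the kind of algorithm this entry describes; the paper's sparsity-aware step is not reproduced here, and the function below is only an illustration under that assumption.

```python
# Generic "project with PCA, then cluster" sketch, assuming scikit-learn.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def pca_clustering(points, k, num_components=1):
    """Project onto leading principal components, then run k-means."""
    projected = PCA(n_components=num_components).fit_transform(points)
    return KMeans(n_clusters=k, n_init=10).fit_predict(projected)
```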
arXiv Detail & Related papers (2020-05-21T17:51:30Z)
- Embedding Graph Auto-Encoder for Graph Clustering [90.8576971748142]
Graph auto-encoder (GAE) models are based on semi-supervised graph convolutional networks (GCN).
We design a specific GAE-based model for graph clustering that is consistent with the theory, namely the Embedding Graph Auto-Encoder (EGAE).
EGAE consists of one encoder and dual decoders.
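A minimal forward-pass sketch of the one-encoder/dual-decoder shape: a single graph-convolution encoder, an inner-product decoder reconstructing the adjacency, and a distance-based soft-assignment decoder for clusters. All layer choices and shapes are illustrative assumptions, not the EGAE reference implementation.

```python
# One encoder, two decoders: a toy numpy forward pass.
import numpy as np

def gcn_encoder(adj_norm, features, weights):
    """One graph-convolution layer: ReLU(A_hat X W)."""
    return np.maximum(adj_norm @ features @ weights, 0.0)

def adjacency_decoder(z):
    """Inner-product decoder: edge probabilities sigmoid(Z Z^T)."""
    return 1.0 / (1.0 + np.exp(-(z @ z.T)))

def clustering_decoder(z, centroids):
    """Soft cluster assignments from embedding-to-centroid distances."""
    dists = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    scores = np.exp(-dists)
    return scores / scores.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, f, h, k = 6, 4, 3, 2                       # toy graph sizes
adj = np.triu(rng.integers(0, 2, (n, n)), 1)
adj = adj + adj.T                             # symmetric, no self-loops yet
deg = adj.sum(1) + 1.0                        # +1 for the added self-loops
adj_norm = (adj + np.eye(n)) / np.sqrt(np.outer(deg, deg))
z = gcn_encoder(adj_norm, rng.normal(size=(n, f)), rng.normal(size=(f, h)))
a_hat = adjacency_decoder(z)                  # decoder 1: graph structure
soft = clustering_decoder(z, rng.normal(size=(k, h)))  # decoder 2: clusters
```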
arXiv Detail & Related papers (2020-02-20T09:53:28Z)