Fast and Simple Spectral Clustering in Theory and Practice
- URL: http://arxiv.org/abs/2310.10939v1
- Date: Tue, 17 Oct 2023 02:31:57 GMT
- Title: Fast and Simple Spectral Clustering in Theory and Practice
- Authors: Peter Macgregor
- Abstract summary: Spectral clustering is a popular and effective algorithm designed to find $k$ clusters in a graph $G$.
We present a simple spectral clustering algorithm based on a vertex embedding with $O(\log(k))$ vectors computed by the power method.
We evaluate the new algorithm on several synthetic and real-world datasets, finding that it is significantly faster than alternative clustering algorithms, while producing results with approximately the same clustering accuracy.
- Score: 7.070726553564701
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spectral clustering is a popular and effective algorithm designed to find $k$
clusters in a graph $G$. In the classical spectral clustering algorithm, the
vertices of $G$ are embedded into $\mathbb{R}^k$ using $k$ eigenvectors of the
graph Laplacian matrix. However, computing this embedding is computationally
expensive and dominates the running time of the algorithm. In this paper, we
present a simple spectral clustering algorithm based on a vertex embedding with
$O(\log(k))$ vectors computed by the power method. The vertex embedding is
computed in nearly-linear time with respect to the size of the graph, and the
algorithm provably recovers the ground truth clusters under natural assumptions
on the input graph. We evaluate the new algorithm on several synthetic and
real-world datasets, finding that it is significantly faster than alternative
clustering algorithms, while producing results with approximately the same
clustering accuracy.
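As a rough illustration of the pipeline described in the abstract, the sketch below builds a low-dimensional vertex embedding with a few power-method iterations and then clusters it with $k$-means. The choice of the lazy random-walk matrix, the number of iterations, and the use of scikit-learn's KMeans are assumptions made for illustration, not the paper's exact construction.

```python
import numpy as np
from sklearn.cluster import KMeans  # assumed k-means implementation

def power_method_embedding(adjacency, k, num_iters=None, rng=None):
    """Embed vertices into O(log k) dimensions with the power method.

    Assumptions (not taken from the paper): the power method is applied to the
    lazy random-walk matrix M = (I + D^{-1} A) / 2, starting from random
    Gaussian vectors, for O(log n) iterations.
    """
    rng = np.random.default_rng() if rng is None else rng
    A = np.asarray(adjacency, dtype=float)
    n = A.shape[0]
    degrees = A.sum(axis=1)                        # assumes no isolated vertices
    M = 0.5 * (np.eye(n) + A / degrees[:, None])   # lazy random-walk matrix
    dim = max(1, int(np.ceil(np.log2(k))))         # O(log k) embedding vectors
    num_iters = int(10 * np.log(n)) if num_iters is None else num_iters
    Y = rng.standard_normal((n, dim))              # random initial vectors
    for _ in range(num_iters):
        Y = M @ Y                                  # one power-method step
    return Y

def fast_spectral_clustering(adjacency, k):
    """Cluster the O(log k)-dimensional embedding with k-means."""
    Y = power_method_embedding(adjacency, k)
    return KMeans(n_clusters=k, n_init=10).fit_predict(Y)
```

The dense matrices above are for readability only; with a sparse adjacency matrix each power-method step costs time linear in the number of edges, in the spirit of the nearly-linear running time claimed in the abstract.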
Related papers
- Dynamic Spectral Clustering with Provable Approximation Guarantee [7.6676757797831225]
The paper proves that, under some mild condition on the cluster-structure, the clusters of the final graph $G_T$ can be well approximated by a dynamic variant of the spectral clustering algorithm.
Experimental studies on both synthetic and real-world datasets further confirm the practicality of our designed algorithm.
arXiv Detail & Related papers (2024-06-05T11:16:55Z) - A Differentially Private Clustering Algorithm for Well-Clustered Graphs [6.523602840064548]
We provide an efficient $(\epsilon, \delta)$-DP algorithm tailored specifically for such graphs.
Our algorithm works for well-clustered graphs with $k$ nearly-balanced clusters.
arXiv Detail & Related papers (2024-03-21T11:57:16Z) - A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al.
A clustering is then called individually fair if it has centers within distance $\delta(x)$ of $x$ for each $x \in P$.
We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
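As a hedged illustration of the fairness notion above, the check below takes $\delta(x)$ to be the distance from $x$ to its $\lceil n/k \rceil$-th nearest point (the radius attributed to Jung et al.) and allows a relaxation factor alpha; the helper names and the relaxed alpha are assumptions for illustration only.

```python
import numpy as np

def fairness_radii(points, k):
    """delta(x): distance from x to its ceil(n/k)-th nearest point (x included),
    i.e. the radius of the smallest ball around x containing n/k points.
    Details of this radius are an assumption based on the summary above."""
    n = len(points)
    m = int(np.ceil(n / k))
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    dists.sort(axis=1)                 # row i: distances from point i, ascending
    return dists[:, m - 1]

def is_individually_fair(points, centers, k, alpha=1.0):
    """Check the definition quoted above: every point x has some center within
    alpha * delta(x); alpha = 1 is the exact definition, alpha > 1 a relaxation."""
    delta = fairness_radii(points, k)
    to_nearest_center = np.min(
        np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1), axis=1)
    return bool(np.all(to_nearest_center <= alpha * delta))
```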
arXiv Detail & Related papers (2024-02-09T19:01:48Z) - Do you know what q-means? [50.045011844765185]
Clustering is one of the most important tools for analysis of large datasets.
We present an improved version of the "$q$-means" algorithm for clustering.
We also present a "dequantized" algorithm for $\varepsilon$-$k$-means which runs in $O\big(\frac{k^2}{\varepsilon^2}(\sqrt{k}d + \log(Nd))\big)$ time.
arXiv Detail & Related papers (2023-08-18T17:52:12Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Near-Optimal Quantum Coreset Construction Algorithms for Clustering [15.513270929560088]
We give quantum algorithms that find coresets for $k$-clustering in $\mathbb{R}^d$ with $\tilde{O}(\sqrt{nk}\,d^{3/2})$ query complexity.
Our coreset reduces the input size from $n$ to $\mathrm{poly}(k\epsilon^{-1}d)$, so that existing $\alpha$-approximation algorithms for clustering can run on top of it.
arXiv Detail & Related papers (2023-06-05T12:22:46Z) - Differentially-Private Hierarchical Clustering with Provable
Approximation Guarantees [79.59010418610625]
We study differentially private approximation algorithms for hierarchical clustering.
We show strong lower bounds for the problem: any $\epsilon$-DP algorithm must exhibit $O(|V|^2/\epsilon)$-additive error for an input dataset.
We propose a private $1+o(1)$ approximation algorithm which also recovers the blocks exactly.
arXiv Detail & Related papers (2023-01-31T19:14:30Z) - Private estimation algorithms for stochastic block models and mixture
models [63.07482515700984]
We develop general tools for designing efficient private estimation algorithms.
We give the first efficient $(\epsilon, \delta)$-differentially private algorithm for both weak recovery and exact recovery.
arXiv Detail & Related papers (2023-01-11T09:12:28Z) - Skew-Symmetric Adjacency Matrices for Clustering Directed Graphs [5.301300942803395]
Cut-based directed graph (digraph) clustering often focuses on finding dense within-cluster or sparse between-cluster connections.
In flow-based clusterings, the edges between clusters tend to be oriented in one direction; such structures have been found in migration data, food webs, and trade data.
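A minimal sketch of this flow-based idea, under assumptions not taken from the paper: form the skew-symmetric matrix $A - A^\top$, which records the net edge direction between each pair of vertices, embed the vertices with its leading singular vectors, and cluster with $k$-means.

```python
import numpy as np
from sklearn.cluster import KMeans  # assumed clustering step

def flow_based_clustering(adjacency, k, dim=None):
    """Hedged sketch of digraph clustering via a skew-symmetric adjacency matrix.

    Assumptions (not the paper's exact method): use S = A - A^T, embed vertices
    with its top singular vectors, and run k-means on the embedding.
    """
    A = np.asarray(adjacency, dtype=float)
    S = A - A.T                          # S[i, j] > 0 means net flow from i to j
    dim = k if dim is None else dim      # assumed embedding dimension
    U, _, _ = np.linalg.svd(S)           # left singular vectors capture flow imbalance
    return KMeans(n_clusters=k, n_init=10).fit_predict(U[:, :dim])
```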
arXiv Detail & Related papers (2022-03-02T20:07:04Z) - Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean
Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z) - Hierarchical Agglomerative Graph Clustering in Nearly-Linear Time [1.5644420658691407]
We study the widely used hierarchical agglomerative clustering (HAC) algorithm on edge-weighted graphs.
We define an algorithmic framework for hierarchical agglomerative graph clustering.
We show that our approach can speed up clustering of point datasets by a factor of 20.7--76.5x.
arXiv Detail & Related papers (2021-06-10T09:29:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.