A Faster $k$-means++ Algorithm
- URL: http://arxiv.org/abs/2211.15118v2
- Date: Tue, 13 Feb 2024 19:39:48 GMT
- Title: A Faster $k$-means++ Algorithm
- Authors: Jiehao Liang, Somdeb Sarkhel, Zhao Song, Chenbo Yin, Junze Yin,
Danyang Zhuo
- Abstract summary: We present a new algorithm that can solve the $k$-means++ problem with nearly optimal running time.
We propose a new algorithm textscFastKmeans++ that only takes in $widetildeO(nd + nk2)$ time, in total.
- Score: 11.428775569173638
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: $k$-means++ is an important algorithm for choosing initial cluster centers
for the $k$-means clustering algorithm. In this work, we present a new
algorithm that can solve the $k$-means++ problem with nearly optimal running
time. Given $n$ data points in $\mathbb{R}^d$, the current state-of-the-art
algorithm runs in $\widetilde{O}(k )$ iterations, and each iteration takes
$\widetilde{O}(nd k)$ time. The overall running time is thus $\widetilde{O}(n d
k^2)$. We propose a new algorithm \textsc{FastKmeans++} that only takes in
$\widetilde{O}(nd + nk^2)$ time, in total.
Related papers
- Mini-Batch Kernel $k$-means [4.604003661048267]
A single iteration of our algorithm takes $widetildeO(kb2)$ time, significantly faster than the $O(n2)$ time required by the full batch kernel $k$-means.
Experiments demonstrate that our algorithm consistently achieves a 10-100x speedup with minimal loss in quality.
arXiv Detail & Related papers (2024-10-08T10:59:14Z) - Almost-linear Time Approximation Algorithm to Euclidean $k$-median and $k$-means [4.271492285528115]
We focus on the Euclidean $k$-median and $k$-means problems, two of the standard ways to model the task of clustering.
In this paper, we almost answer this question by presenting an almost linear-time algorithm to compute a constant-factor approximation.
arXiv Detail & Related papers (2024-07-15T20:04:06Z) - A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al.
A clustering is then called individually fair if it has centers within distance $delta(x)$ of $x$ for each $xin P$.
We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
arXiv Detail & Related papers (2024-02-09T19:01:48Z) - Sketching Algorithms for Sparse Dictionary Learning: PTAS and Turnstile
Streaming [48.18845814885398]
We develop new techniques to extend the applicability of sketching-based approaches to sparse dictionary learning and the Euclidean $k$-means clustering problems.
On the fast algorithms front, we obtain a new approach for designing PTAS's for the $k$-means clustering problem.
On the streaming algorithms front, we obtain new upper bounds and lower bounds for dictionary learning and $k$-means clustering.
arXiv Detail & Related papers (2023-10-29T16:46:26Z) - Do you know what q-means? [50.045011844765185]
Clustering is one of the most important tools for analysis of large datasets.
We present an improved version of the "$q$-means" algorithm for clustering.
We also present a "dequantized" algorithm for $varepsilon which runs in $Obig(frack2varepsilon2(sqrtkd + log(Nd))big.
arXiv Detail & Related papers (2023-08-18T17:52:12Z) - A Nearly Tight Analysis of Greedy k-means++ [1.452875650827562]
The $k$-means++ algorithm is known to return a $Theta(log k)$ approximate solution in expectation.
We present nearly matching lower and upper bounds for the greedy $k$-means++ algorithm.
arXiv Detail & Related papers (2022-07-16T13:49:07Z) - Nearly Optimal Policy Optimization with Stable at Any Time Guarantee [53.155554415415445]
Policy-based method in citetshani 2020optimistic is only $tildeO(sqrtSAH3K + sqrtAH4K)$ where $S$ is the number of states, $A$ is the number of actions, $H$ is the horizon, and $K$ is the number of episodes, and there is a $sqrtSH$ gap compared with the information theoretic lower bound $tildeOmega(sqrtSAH
arXiv Detail & Related papers (2021-12-21T01:54:17Z) - Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean
Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z) - Learning a Latent Simplex in Input-Sparsity Time [58.30321592603066]
We consider the problem of learning a latent $k$-vertex simplex $KsubsetmathbbRdtimes n$, given access to $AinmathbbRdtimes n$.
We show that the dependence on $k$ in the running time is unnecessary given a natural assumption about the mass of the top $k$ singular values of $A$.
arXiv Detail & Related papers (2021-05-17T16:40:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.