k-means++: few more steps yield constant approximation
- URL: http://arxiv.org/abs/2002.07784v1
- Date: Tue, 18 Feb 2020 18:28:25 GMT
- Title: k-means++: few more steps yield constant approximation
- Authors: Davin Choo, Christoph Grunau, Julian Portmann, Václav Rozhoň
- Abstract summary: The k-means++ algorithm of Arthur and Vassilvitskii (SODA 2007) is a state-of-the-art algorithm for solving the k-means clustering problem.
Recently, Lattanzi and Sohler (ICML 2019) proposed augmenting k-means++ with O(k log log k) local search steps to yield a constant approximation (in expectation) to the k-means clustering problem.
- Score: 3.7468898363447654
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The k-means++ algorithm of Arthur and Vassilvitskii (SODA 2007) is a
state-of-the-art algorithm for solving the k-means clustering problem and is
known to give an O(log k)-approximation in expectation. Recently, Lattanzi and
Sohler (ICML 2019) proposed augmenting k-means++ with O(k log log k) local
search steps to yield a constant approximation (in expectation) to the k-means
clustering problem. In this paper, we improve their analysis to show that, for
any arbitrarily small constant $\varepsilon > 0$, with only $\varepsilon k$ additional local
search steps, one can achieve a constant approximation guarantee (with high
probability in k), resolving an open problem in their paper.
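For intuition, the seeding-plus-local-search pipeline studied here can be sketched in pure Python. This is a minimal illustration, not the authors' implementation: function names are ours, and the local-search step shown is the simple single-swap variant (D²-sample a candidate, apply the best swap if it lowers the cost).

```python
import random

def d2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest_d2(p, centers):
    """Squared distance from p to its nearest center."""
    return min(d2(p, c) for c in centers)

def cost(X, centers):
    """k-means cost: total squared distance to the nearest centers."""
    return sum(nearest_d2(p, centers) for p in X)

def d2_sample(X, centers, rng):
    """Sample a point with probability proportional to its squared
    distance to the current centers (the D^2 distribution)."""
    weights = [nearest_d2(p, centers) for p in X]
    return rng.choices(X, weights=weights, k=1)[0]

def kmeans_pp(X, k, rng):
    """k-means++ seeding: first center uniform, the rest D^2-sampled."""
    centers = [rng.choice(X)]
    while len(centers) < k:
        centers.append(d2_sample(X, centers, rng))
    return centers

def local_search(X, centers, steps, rng):
    """Single-swap local search: D^2-sample a candidate center and
    perform the best swap whenever it strictly lowers the cost."""
    for _ in range(steps):
        cand = d2_sample(X, centers, rng)
        best_cost, best_j = cost(X, centers), None
        for j in range(len(centers)):
            c = cost(X, centers[:j] + [cand] + centers[j + 1:])
            if c < best_cost:
                best_cost, best_j = c, j
        if best_j is not None:
            centers[best_j] = cand
    return centers
```

The paper's result says that running `local_search` for only εk steps after seeding (rather than O(k log log k)) already suffices for a constant-factor guarantee with high probability in k.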
Related papers
- Multi-Swap $k$-Means++ [30.967186562175893]
The $k$-means++ algorithm of Arthur and Vassilvitskii (SODA 2007) is often the practitioners' choice algorithm for optimizing the popular $k$-means clustering objective.
Lattanzi and Sohler (ICML 2019) proposed augmenting $k$-means++ with $O(k \log \log k)$ local search steps to yield a $c$-approximation to the $k$-means clustering problem.
arXiv Detail & Related papers (2023-09-28T12:31:35Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - Sketch-and-solve approaches to k-means clustering by semidefinite
programming [14.930208990741132]
We introduce a sketch-and-solve approach to speed up the Peng-Wei semidefinite relaxation of k-means clustering.
If the data is appropriately separated, our approach identifies the optimal k-means clustering.
Otherwise, our approach provides a high-confidence lower bound on the optimal k-means value.
arXiv Detail & Related papers (2022-11-28T19:51:30Z) - Hessian Averaging in Stochastic Newton Methods Achieves Superlinear
Convergence [69.65563161962245]
We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method.
We show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage.
arXiv Detail & Related papers (2022-04-20T07:14:21Z) - Solving correlation clustering with QAOA and a Rydberg qudit system: a
full-stack approach [94.37521840642141]
We study the correlation clustering problem using the quantum approximate optimization algorithm (QAOA) and qudits.
Specifically, we consider a neutral atom quantum computer and propose a full stack approach for correlation clustering.
We show the qudit implementation is superior to the qubit encoding as quantified by the gate count.
arXiv Detail & Related papers (2021-06-22T11:07:38Z) - Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy, or soft, objective is a popular generalization of the well-known $k$-means problem.
We show that by making a small number of similarity queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z) - Improved Guarantees for k-means++ and k-means++ Parallel [18.26254785549146]
k-means++ and k-means++ parallel are popular algorithms for the classic k-means clustering problem.
We show improved approximation and bi-criteria approximation guarantees for k-means++ and k-means++ parallel.
We propose a new variant of k-means++ parallel algorithm (Exponential Race k-means++) that has the same approximation guarantees as k-means++.
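The "exponential race" idea named in that variant can be illustrated with a standard sampling fact: drawing independent exponentials $E_i \sim \mathrm{Exp}(\text{rate}=w_i)$ and taking the argmin selects index $i$ with probability $w_i / \sum_j w_j$. The toy check below demonstrates only this building block, with illustrative weights; the paper's actual algorithm is more involved.

```python
import random

# Exponential-race sampling trick: if E_i ~ Exp(rate=w_i) independently,
# then argmin_i E_i equals i with probability w_i / sum(w).
# Weights here are illustrative.
rng = random.Random(0)
w = [1.0, 2.0, 3.0]
trials = 30000
counts = [0] * len(w)
for _ in range(trials):
    races = [rng.expovariate(wi) for wi in w]
    counts[races.index(min(races))] += 1
freqs = [c / trials for c in counts]
# freqs should be close to w / sum(w) = [1/6, 1/3, 1/2]
```

The appeal of the trick in a clustering context is that races can be run independently per point and merged with a single min, which plays well with parallel and streaming settings.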
arXiv Detail & Related papers (2020-10-27T17:46:00Z) - Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems.
Our results imply an improved algorithm for the Sample and Aggregate privacy framework.
One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z) - Breathing K-Means: Superior K-Means Solutions through Dynamic K-Values [0.0]
We introduce the breathing k-means algorithm, which on average significantly improves solutions obtained by the widely-known greedy k-means++ algorithm.
The improvements are achieved through a novel "breathing" technique that cyclically increases and decreases the number of centroids based on local error and utility measures.
arXiv Detail & Related papers (2020-06-28T17:49:37Z) - k-sums: another side of k-means [34.317086730703124]
In this paper, the decades-old clustering method k-means is revisited.
The original distortion minimization model of k-means is addressed by a pure minimization procedure.
A new target function that minimizes pairwise distances within clusters is presented.
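The pairwise target relates to the usual k-means objective through a standard identity: within a cluster $C$ with mean $\mu$, the sum of squared distances over ordered pairs equals $2|C|$ times the within-cluster sum of squares. A quick numerical check on toy 1-D data (our own sketch, with illustrative values):

```python
import random

# Numerical check of the identity linking pairwise distances to the
# k-means (within-cluster sum of squares) objective:
#   sum over ordered pairs (i, j) in C of (x_i - x_j)^2
#     = 2 * |C| * sum over i in C of (x_i - mu)^2
rng = random.Random(1)
C = [rng.uniform(0.0, 10.0) for _ in range(7)]
mu = sum(C) / len(C)
pairwise = sum((a - b) ** 2 for a in C for b in C)
within = sum((a - mu) ** 2 for a in C)
assert abs(pairwise - 2 * len(C) * within) < 1e-6
```

This identity is why minimizing pairwise distances within clusters can serve as "another side" of the mean-based k-means formulation without estimating cluster means explicitly.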
arXiv Detail & Related papers (2020-05-19T14:36:12Z) - Ball k-means [53.89505717006118]
The Ball k-means algorithm uses a ball to describe a cluster, focusing on reducing the point-centroid distance computation.
Its fast speed, lack of extra parameters, and simple design make Ball k-means an all-around replacement for the naive k-means algorithm.
arXiv Detail & Related papers (2020-05-02T10:39:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.