Global $k$-means$++$: an effective relaxation of the global $k$-means
clustering algorithm
- URL: http://arxiv.org/abs/2211.12271v3
- Date: Fri, 14 Jul 2023 11:39:36 GMT
- Title: Global $k$-means$++$: an effective relaxation of the global $k$-means
clustering algorithm
- Authors: Georgios Vardakas and Aristidis Likas
- Abstract summary: The $k$-means algorithm is a prevalent clustering method due to its simplicity, effectiveness, and speed.
We propose the \emph{global $k$-means\texttt{++}} clustering algorithm, which is an effective way of acquiring quality clustering solutions.
- Score: 0.20305676256390928
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The $k$-means algorithm is a prevalent clustering method due to its
simplicity, effectiveness, and speed. However, its main disadvantage is its
high sensitivity to the initial positions of the cluster centers. Global
$k$-means is a deterministic algorithm proposed to tackle the random
initialization problem of $k$-means, but it is well known to require a high
computational cost. It partitions the data into $K$ clusters by solving all
$k$-means sub-problems incrementally for $k=1,\ldots,K$. For each
$k$-clustering sub-problem, the method executes the $k$-means algorithm $N$
times, where $N$ is the number of data points. In this paper, we propose the
\emph{global $k$-means\texttt{++}} clustering algorithm, an effective way of
acquiring quality clustering solutions akin to those of global $k$-means with
a reduced computational load. This is achieved by exploiting the center
selection probability that is effectively used in the $k$-means\texttt{++}
algorithm. The proposed method has been tested and compared on various
benchmark datasets, yielding very satisfactory results in terms of clustering
quality and execution speed.
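To make the abstract concrete, below is a minimal NumPy sketch of the idea as described there: solve the sub-problems incrementally for $k=1,\ldots,K$, but instead of trying every one of the $N$ points as the new center (as global $k$-means does), draw a few candidates from the $k$-means\texttt{++} selection distribution $p(x) = D(x)^2 / \sum_{x'} D(x')^2$, where $D(x)$ is the distance from $x$ to its nearest current center. This is an illustrative reconstruction from the abstract, not the authors' code; the function names and the `n_candidates` parameter are assumptions.

```python
import numpy as np

def nearest_sq_dists(X, centers):
    """Squared distance from every point to its nearest current center."""
    diffs = X[:, None, :] - centers[None, :, :]            # (N, k, d)
    return (diffs ** 2).sum(axis=-1).min(axis=1)           # (N,)

def lloyd(X, centers, n_iter=100):
    """Standard k-means (Lloyd) iterations from the given initial centers."""
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centers[j] for j in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, nearest_sq_dists(X, centers).sum()     # centers, inertia

def global_kmeans_pp(X, K, n_candidates=10, seed=0):
    """Sketch of global k-means++ (reconstructed from the abstract): grow the
    solution one center at a time, sampling candidate positions from the
    k-means++ distribution instead of trying all N points."""
    rng = np.random.default_rng(seed)
    centers = X.mean(axis=0, keepdims=True)                # exact k=1 solution
    for _ in range(2, K + 1):
        d2 = nearest_sq_dists(X, centers)
        probs = d2 / d2.sum()                              # p(x) ∝ D(x)^2
        idx = rng.choice(len(X), size=min(n_candidates, len(X)),
                         replace=False, p=probs)
        # Run k-means from each candidate initialization; keep the best.
        trials = (lloyd(X, np.vstack([centers, X[i:i + 1]])) for i in idx)
        centers, _ = min(trials, key=lambda t: t[1])
    return centers

# Usage: centers = global_kmeans_pp(X, K=5)
```

Sampling without replacement from $p(x) \propto D(x)^2$ biases candidates toward points far from the existing centers, which is why $k$-means\texttt{++} seeding tends to place new centers well; here it replaces the exhaustive $N$-way search of global $k$-means with `n_candidates` trials per step.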
Related papers
- A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al.
A clustering is then called individually fair if it has centers within distance $\delta(x)$ of $x$ for each $x \in P$.
We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
arXiv Detail & Related papers (2024-02-09T19:01:48Z) - Sketching Algorithms for Sparse Dictionary Learning: PTAS and Turnstile
Streaming [48.18845814885398]
We develop new techniques to extend the applicability of sketching-based approaches to sparse dictionary learning and the Euclidean $k$-means clustering problems.
On the fast algorithms front, we obtain a new approach for designing PTAS's for the $k$-means clustering problem.
On the streaming algorithms front, we obtain new upper bounds and lower bounds for dictionary learning and $k$-means clustering.
arXiv Detail & Related papers (2023-10-29T16:46:26Z) - Simple, Scalable and Effective Clustering via One-Dimensional
Projections [10.807367640692021]
Clustering is a fundamental problem in unsupervised machine learning with many applications in data analysis.
We introduce a simple randomized clustering algorithm that provably runs in expected time $O(\mathrm{nnz}(X) + n\log n)$ for arbitrary $k$.
We prove that our algorithm achieves approximation ratio $\widetilde{O}(k^4)$ on any input dataset for the $k$-means objective.
arXiv Detail & Related papers (2023-10-25T16:37:45Z) - Do you know what q-means? [50.045011844765185]
Clustering is one of the most important tools for analysis of large datasets.
We present an improved version of the "$q$-means" algorithm for clustering.
We also present a "dequantized" algorithm for $\varepsilon$-$k$-means which runs in $O\big(\frac{k^2}{\varepsilon^2}(\sqrt{k}d + \log(Nd))\big)$ time.
arXiv Detail & Related papers (2023-08-18T17:52:12Z) - Are Easy Data Easy (for K-Means) [0.0]
This paper investigates the capability of correctly recovering well-separated clusters by several variants of the $k$-means algorithm.
A new algorithm is proposed that is a variation of $k$-means++ via repeated subsampling when choosing a seed.
arXiv Detail & Related papers (2023-08-02T09:40:19Z) - Replicable Clustering [57.19013971737493]
We propose algorithms for the statistical $k$-medians, statistical $k$-means, and statistical $k$-centers problems by utilizing approximation routines for their counterparts in a black-box manner.
We also provide experiments on synthetic distributions in 2D using the $k$-means++ implementation from sklearn as a black-box that validate our theoretical results.
arXiv Detail & Related papers (2023-02-20T23:29:43Z) - Improved Learning-augmented Algorithms for k-means and k-medians
Clustering [8.04779839951237]
We consider the problem of clustering in the learning-augmented setting, where we are given a data set in $d$-dimensional Euclidean space.
We propose a deterministic $k$-means algorithm that produces centers with improved bound on clustering cost.
Our algorithm works even when the predictions are not very accurate, i.e., our bound holds for $\alpha$ up to $1/2$, an improvement over $\alpha$ being at most $1/7$ in the previous work.
arXiv Detail & Related papers (2022-10-31T03:00:11Z) - Explainable k-means. Don't be greedy, plant bigger trees! [12.68470213641421]
We provide a new bi-criteria $\widetilde{O}(\log^2 k)$ competitive algorithm for explainable $k$-means clustering.
Explainable $k$-means was recently introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020)
arXiv Detail & Related papers (2021-11-04T23:15:17Z) - Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean
Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z) - Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem.
We show that by making few queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z) - Explainable $k$-Means and $k$-Medians Clustering [25.513261099927163]
We consider using a small decision tree to partition a data set into clusters, so that clusters can be characterized in a straightforward manner.
We show that popular top-down decision tree algorithms may lead to clusterings with arbitrarily large cost.
We design an efficient algorithm that produces explainable clusters using a tree with $k$ leaves.
arXiv Detail & Related papers (2020-02-28T04:21:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.