k-sums: another side of k-means
- URL: http://arxiv.org/abs/2005.09485v1
- Date: Tue, 19 May 2020 14:36:12 GMT
- Title: k-sums: another side of k-means
- Authors: Wan-Lei Zhao, Run-Qing Chen, Hui Ye and Chong-Wah Ngo
- Abstract summary: In this paper, the decades-old clustering method k-means is revisited.
The original distortion minimization model of k-means is addressed by a pure stochastic minimization procedure.
A new target function that minimizes the summation of pairwise distances within clusters is presented.
- Score: 34.317086730703124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, the decades-old clustering method k-means is revisited. The
original distortion minimization model of k-means is addressed by a pure
stochastic minimization procedure. In each step of the iteration, one sample is
tentatively reallocated from one cluster to another. It is moved to another
cluster as long as the reallocation allows the sample to be closer to the new
centroid. This optimization procedure converges faster, and to a better local
minimum, than k-means and many of its variants. This fundamental modification
of the k-means loop leads to the redefinition of a family of k-means
variants. Moreover, a new target function that minimizes the summation of
pairwise distances within clusters is presented. We show that it can be
solved under the same stochastic optimization procedure. Built upon these two
minimization models, the procedure outperforms k-means and its variants
considerably across different settings and datasets.
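The loop the abstract describes is compact enough to sketch. Below is a minimal Python illustration of the stochastic reallocation step, assuming squared Euclidean distortion, incremental centroid maintenance, and an epoch-style sweep; all names are mine, and the paper's tentative-move bookkeeping is simplified away.

```python
import numpy as np

def k_sums_distortion(X, k, n_epochs=50, seed=0):
    """Minimal sketch of the stochastic reallocation loop from the abstract:
    one sample at a time is moved to whichever cluster has the nearest
    centroid. The paper's tentative move also accounts for the centroid shift
    caused by removing/adding the sample itself; that refinement is omitted,
    as are the pairwise-distance variant and any speed-ups."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    labels = rng.permutation(n) % k              # balanced random assignment
    sums = np.zeros((k, X.shape[1]))             # per-cluster coordinate sums
    counts = np.bincount(labels, minlength=k)
    for i in range(n):
        sums[labels[i]] += X[i]
    for _ in range(n_epochs):
        moved = 0
        for i in rng.permutation(n):
            src = labels[i]
            if counts[src] == 1:                 # keep every cluster non-empty
                continue
            centroids = sums / counts[:, None]
            dst = int(np.argmin(((centroids - X[i]) ** 2).sum(axis=1)))
            if dst != src:                       # sample is closer to another
                sums[src] -= X[i]; counts[src] -= 1   # centroid: reallocate it
                sums[dst] += X[i]; counts[dst] += 1
                labels[i] = dst
                moved += 1
        if moved == 0:                           # a full pass with no move is
            break                                # a local minimum
    return labels
```

For example, `k_sums_distortion(np.random.randn(1000, 2), k=10)` partitions a toy Gaussian cloud into ten clusters, stopping once a full pass moves no sample.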
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality-reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
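As a generic illustration of the embed-then-cluster recipe (not the paper's unified self-supervised framework), one can build a neighborhood graph, embed it spectrally, and run K-means in the embedding space:

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding
from sklearn.cluster import KMeans

X = np.random.randn(300, 50)                     # toy high-dimensional data
emb = SpectralEmbedding(n_components=5,          # graph-based embedding
                        affinity="nearest_neighbors",
                        n_neighbors=10).fit_transform(X)
labels = KMeans(n_clusters=3, n_init=10).fit_predict(emb)
```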
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm that directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To fully exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
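The summary does not spell out how the filter is applied. One plausible reading, sketched below, treats the pairwise distance as the "frequency" in the standard Butterworth magnitude response, giving a smooth affinity cutoff; this is an assumption, not the paper's construction.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def butterworth_affinity(X, cutoff, order=4):
    """Map pairwise distances through the Butterworth magnitude response
    1 / sqrt(1 + (d / cutoff)^(2 * order)): points closer than `cutoff` keep
    affinity near 1, farther points are smoothly suppressed toward 0."""
    D = squareform(pdist(X))                     # n x n Euclidean distances
    return 1.0 / np.sqrt(1.0 + (D / cutoff) ** (2 * order))
```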
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- CKmeans and FCKmeans: Two deterministic initialization procedures for Kmeans algorithm using a modified crowding distance [0.0]
Two novel deterministic procedures for K-means clustering are presented.
The procedures, named CKmeans and FCKmeans, use more crowded points as initial centroids.
Experimental studies on multiple datasets demonstrate that the proposed approach outperforms Kmeans and Kmeans++ in terms of clustering accuracy.
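The modified crowding distance itself is not described in this summary; the sketch below captures only the spirit of seeding from crowded points, using a k-nearest-neighbor density proxy and a median-gap spacing rule, both of which are placeholders.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def crowded_init(X, k, m=10):
    """Deterministic seeding in the spirit of CKmeans: prefer points in
    crowded regions. The distance to the m-th nearest neighbor serves as a
    density proxy; both it and the spacing rule stand in for the paper's
    modified crowding distance."""
    D = squareform(pdist(X))
    density = 1.0 / np.sort(D, axis=1)[:, m]     # small m-NN distance = crowded
    order = np.argsort(-density)                 # most crowded points first
    seeds = [int(order[0])]
    min_gap = np.median(D)                       # heuristic spacing threshold
    for i in order[1:]:
        if len(seeds) == k:
            break
        if D[i, seeds].min() > min_gap:          # keep the seeds spread out
            seeds.append(int(i))
    for i in order:                              # relax spacing if k not reached
        if len(seeds) == k:
            break
        if int(i) not in seeds:
            seeds.append(int(i))
    return X[np.array(seeds)]
```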
arXiv Detail & Related papers (2023-04-19T21:46:02Z)
- Sketch-and-solve approaches to k-means clustering by semidefinite programming [14.930208990741132]
We introduce a sketch-and-solve approach to speed up the Peng-Wei semidefinite relaxation of k-means clustering.
If the data is appropriately separated, we identify the optimal k-means clustering.
Otherwise, our approach provides a high-confidence lower bound on the optimal k-means value.
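A minimal sketch of the sketch-and-solve idea, assuming the standard Peng-Wei relaxation (minimize tr(DZ)/2 over symmetric PSD, entrywise-nonnegative Z with Z1 = 1 and tr(Z) = k) solved on a random subsample via cvxpy; the paper's rounding step and lower-bound certificate are omitted.

```python
import numpy as np
import cvxpy as cp
from scipy.spatial.distance import pdist, squareform

def sketched_kmeans_sdp(X, k, m=80, seed=0):
    """Solve the Peng-Wei SDP on a random subsample ('sketch') of the data.
    The subsample size m and the omission of the paper's rounding and
    lower-bound machinery are simplifications."""
    rng = np.random.default_rng(seed)
    S = X[rng.choice(X.shape[0], size=min(m, X.shape[0]), replace=False)]
    D = squareform(pdist(S)) ** 2                # squared-distance matrix
    Z = cp.Variable((len(S), len(S)), PSD=True)
    constraints = [Z >= 0,                       # entrywise non-negative
                   cp.sum(Z, axis=1) == 1,       # Z 1 = 1
                   cp.trace(Z) == k]             # tr(Z) = k
    problem = cp.Problem(cp.Minimize(cp.trace(D @ Z) / 2), constraints)
    problem.solve()
    return problem.value, Z.value                # relaxation value on the sketch
```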
arXiv Detail & Related papers (2022-11-28T19:51:30Z)
- Hessian Averaging in Stochastic Newton Methods Achieves Superlinear Convergence [69.65563161962245]
We consider minimizing a smooth and strongly convex objective function with a stochastic Newton method.
We show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage.
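A toy sketch of the mechanism, with uniform rather than the paper's weighted averaging: each step draws a noisy Hessian estimate, folds it into a running average, and takes a Newton step with the averaged matrix. The superlinear-convergence claim hinges on the weighting scheme, which this simplification does not reproduce.

```python
import numpy as np

def averaged_newton(grad, hess_sample, x0, n_steps=50):
    """Stochastic Newton with a running (here: uniform) Hessian average.
    grad(x) returns the exact gradient; hess_sample(x) returns a noisy,
    e.g. subsampled, Hessian estimate."""
    x, H_bar = x0.astype(float).copy(), None
    for t in range(1, n_steps + 1):
        H_t = hess_sample(x)                         # fresh noisy Hessian
        H_bar = H_t if H_bar is None else H_bar + (H_t - H_bar) / t
        x = x - np.linalg.solve(H_bar, grad(x))      # Newton step with average
    return x
```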
arXiv Detail & Related papers (2022-04-20T07:14:21Z)
- K-Splits: Improved K-Means Clustering Algorithm to Automatically Detect the Number of Clusters [0.12313056815753944]
This paper introduces k-splits, an improved hierarchical algorithm based on k-means to cluster data without prior knowledge of the number of clusters.
Accuracy and speed are two main advantages of the proposed method.
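A hedged sketch of the hierarchical split-until-done idea: start from one cluster and repeatedly 2-means-split the cluster with the largest within-cluster SSE until the relative improvement falls below a threshold. The split and stop criteria here are placeholders, not the paper's rules.

```python
import numpy as np
from sklearn.cluster import KMeans

def k_splits_sketch(X, max_k=20, min_gain=0.05):
    clusters = [np.arange(len(X))]
    while len(clusters) < max_k:
        # split the cluster with the largest within-cluster SSE
        sse = [((X[idx] - X[idx].mean(axis=0)) ** 2).sum() for idx in clusters]
        j = int(np.argmax(sse))
        idx = clusters[j]
        if len(idx) < 4 or sse[j] == 0:
            break
        km = KMeans(n_clusters=2, n_init=5).fit(X[idx])
        if 1 - km.inertia_ / sse[j] < min_gain:  # relative SSE reduction too
            break                                # small: stop splitting
        clusters[j] = idx[km.labels_ == 0]
        clusters.append(idx[km.labels_ == 1])
    return clusters                              # len(clusters) = detected k
```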
arXiv Detail & Related papers (2021-10-09T23:02:57Z)
- Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \to \infty$.
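As a generic illustration of PCA-based sparse clustering for two symmetric clusters (means $\pm\theta$ with sparse $\theta$): keep the highest-variance coordinates, take the top principal component of the reduced data, and split by sign. This shows the recipe, not the paper's exact algorithm or thresholds.

```python
import numpy as np

def sparse_pca_cluster(X, n_keep=20):
    """Two-cluster sparse clustering sketch: restrict to the highest-variance
    coordinates (a stand-in for estimating the sparse support), compute the
    top principal component, and split points by the sign of their score."""
    keep = np.argsort(-X.var(axis=0))[:n_keep]
    Xr = X[:, keep] - X[:, keep].mean(axis=0)
    _, _, Vt = np.linalg.svd(Xr, full_matrices=False)
    return (Xr @ Vt[0] > 0).astype(int)          # cluster label 0 or 1
```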
arXiv Detail & Related papers (2020-05-21T17:51:30Z)
- Ball k-means [53.89505717006118]
The Ball k-means algorithm uses a ball to describe a cluster, focusing on reducing the point-centroid distance computation.
The fast speed, no extra parameters and simple design of the Ball k-means make it an all-around replacement of the naive k-means algorithm.
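The pruning idea can be sketched as follows: treat each cluster as a ball whose radius is its farthest member's distance to the centroid, and during reassignment compare a point only against neighbor clusters whose bisecting hyperplane cuts into that ball. The stable-area/active-area refinement of the full algorithm is omitted.

```python
import numpy as np

def neighbor_clusters(centroids, radii):
    """Cluster j is a 'neighbor' of cluster i when the bisecting hyperplane
    between c_i and c_j cuts into ball i, i.e. ||c_i - c_j|| / 2 < r_i.
    Points of cluster i then only need distances to c_i and to these neighbor
    centroids during reassignment, instead of to all k centroids."""
    gaps = np.linalg.norm(centroids[:, None] - centroids[None], axis=2) / 2
    k = len(radii)
    return [np.where((gaps[i] < radii[i]) & (np.arange(k) != i))[0]
            for i in range(k)]
```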
arXiv Detail & Related papers (2020-05-02T10:39:26Z)
- k-means++: few more steps yield constant approximation [3.7468898363447654]
The k-means++ algorithm of Arthur and Vassilvitskii (SODA 2007) is a state-of-the-art algorithm for solving the k-means clustering problem.
Recently, Lattanzi and Sohler (ICML 2019) proposed augmenting k-means++ with O(k log k) local search steps to yield a constant approximation (in expectation) to the k-means clustering problem.
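For reference, the D² seeding that the local-search augmentation builds on; the O(k log k) swap steps of Lattanzi and Sohler are not shown.

```python
import numpy as np

def kmeanspp_seed(X, k, seed=0):
    """Standard D^2 seeding: each new center is sampled with probability
    proportional to the squared distance to the nearest center so far."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)
```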
arXiv Detail & Related papers (2020-02-18T18:28:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.