k-sums: another side of k-means
- URL: http://arxiv.org/abs/2005.09485v1
- Date: Tue, 19 May 2020 14:36:12 GMT
- Title: k-sums: another side of k-means
- Authors: Wan-Lei Zhao, Run-Qing Chen, Hui Ye and Chong-Wah Ngo
- Abstract summary: In this paper, the decades-old clustering method k-means is revisited.
The original distortion minimization model of k-means is addressed by a pure stochastic minimization procedure.
A new target function that minimizes the summation of pairwise distances within clusters is presented.
- Score: 34.317086730703124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, the decades-old clustering method k-means is revisited. The
original distortion minimization model of k-means is addressed by a pure
stochastic minimization procedure. In each step of the iteration, one sample is
tentatively reallocated from one cluster to another. It is moved to another
cluster as long as the reallocation allows the sample to be closer to the new
centroid. This optimization procedure converges faster, and to a better local
minimum, than k-means and many of its variants. This fundamental modification
of the k-means loop leads to the redefinition of a family of k-means
variants. Moreover, a new target function that minimizes the summation of
pairwise distances within clusters is presented. We show that it can be
solved under the same stochastic optimization procedure. Built upon these two
minimization models, the procedure outperforms k-means and its variants
considerably across different settings and datasets.
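The loop the abstract describes is compact enough to sketch. Below is a minimal Python illustration of the stochastic reallocation step, assuming squared Euclidean distortion, incremental centroid maintenance, and an epoch-style sweep; all names are mine, and the paper's tentative-move bookkeeping is simplified away.

```python
import numpy as np

def k_sums_distortion(X, k, n_epochs=50, seed=0):
    """Minimal sketch of the stochastic reallocation loop from the abstract:
    one sample at a time is moved to whichever cluster has the nearest
    centroid. The paper's tentative move also accounts for the centroid shift
    caused by removing/adding the sample itself; that refinement is omitted,
    as are the pairwise-distance variant and any speed-ups."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    labels = rng.permutation(n) % k              # balanced random assignment
    sums = np.zeros((k, X.shape[1]))             # per-cluster coordinate sums
    counts = np.bincount(labels, minlength=k)
    for i in range(n):
        sums[labels[i]] += X[i]
    for _ in range(n_epochs):
        moved = 0
        for i in rng.permutation(n):
            src = labels[i]
            if counts[src] == 1:                 # keep every cluster non-empty
                continue
            centroids = sums / counts[:, None]
            dst = int(np.argmin(((centroids - X[i]) ** 2).sum(axis=1)))
            if dst != src:                       # sample is closer to another
                sums[src] -= X[i]; counts[src] -= 1   # centroid: reallocate it
                sums[dst] += X[i]; counts[dst] += 1
                labels[i] = dst
                moved += 1
        if moved == 0:                           # a full pass with no move is
            break                                # a local minimum
    return labels
```

For example, `k_sums_distortion(np.random.randn(1000, 2), k=10)` partitions a toy Gaussian cloud into ten clusters, stopping once a full pass moves no sample.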
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality-reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
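As a generic illustration of the embed-then-cluster recipe (not the paper's unified self-supervised framework), one can build a neighborhood graph, embed it spectrally, and run K-means in the embedding space:

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding
from sklearn.cluster import KMeans

X = np.random.randn(300, 50)                     # toy high-dimensional data
emb = SpectralEmbedding(n_components=5,          # graph-based embedding
                        affinity="nearest_neighbors",
                        n_neighbors=10).fit_transform(X)
labels = KMeans(n_clusters=3, n_init=10).fit_predict(emb)
```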
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm that directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To fully exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
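The summary does not spell out how the filter is applied. One plausible reading, sketched below, treats the pairwise distance as the "frequency" in the standard Butterworth magnitude response, giving a smooth affinity cutoff; this is an assumption, not the paper's construction.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def butterworth_affinity(X, cutoff, order=4):
    """Map pairwise distances through the Butterworth magnitude response
    1 / sqrt(1 + (d / cutoff)^(2 * order)): points closer than `cutoff` keep
    affinity near 1, farther points are smoothly suppressed toward 0."""
    D = squareform(pdist(X))                     # n x n Euclidean distances
    return 1.0 / np.sqrt(1.0 + (D / cutoff) ** (2 * order))
```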
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- CKmeans and FCKmeans: Two deterministic initialization procedures for Kmeans algorithm using a modified crowding distance [0.0]
Two novel deterministic procedures for K-means clustering are presented.
The procedures, named CKmeans and FCKmeans, use more crowded points as initial centroids.
Experimental studies on multiple datasets demonstrate that the proposed approach outperforms Kmeans and Kmeans++ in terms of clustering accuracy.
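The modified crowding distance itself is not described in this summary; the sketch below captures only the spirit of seeding from crowded points, using a k-nearest-neighbor density proxy and a median-gap spacing rule, both of which are placeholders.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def crowded_init(X, k, m=10):
    """Deterministic seeding in the spirit of CKmeans: prefer points in
    crowded regions. The distance to the m-th nearest neighbor serves as a
    density proxy; both it and the spacing rule stand in for the paper's
    modified crowding distance."""
    D = squareform(pdist(X))
    density = 1.0 / np.sort(D, axis=1)[:, m]     # small m-NN distance = crowded
    order = np.argsort(-density)                 # most crowded points first
    seeds = [int(order[0])]
    min_gap = np.median(D)                       # heuristic spacing threshold
    for i in order[1:]:
        if len(seeds) == k:
            break
        if D[i, seeds].min() > min_gap:          # keep the seeds spread out
            seeds.append(int(i))
    for i in order:                              # relax spacing if k not reached
        if len(seeds) == k:
            break
        if int(i) not in seeds:
            seeds.append(int(i))
    return X[np.array(seeds)]
```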
arXiv Detail & Related papers (2023-04-19T21:46:02Z)
- Sketch-and-solve approaches to k-means clustering by semidefinite programming [14.930208990741132]
We introduce a sketch-and-solve approach to speed up the Peng-Wei semidefinite relaxation of k-means clustering.
If the data is appropriately separated, we identify the optimal k-means clustering.
Otherwise, our approach provides a high-confidence lower bound on the optimal k-means value.
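A minimal sketch of the sketch-and-solve idea, assuming the standard Peng-Wei relaxation (minimize tr(DZ)/2 over symmetric PSD, entrywise-nonnegative Z with Z1 = 1 and tr(Z) = k) solved on a random subsample via cvxpy; the paper's rounding step and lower-bound certificate are omitted.

```python
import numpy as np
import cvxpy as cp
from scipy.spatial.distance import pdist, squareform

def sketched_kmeans_sdp(X, k, m=80, seed=0):
    """Solve the Peng-Wei SDP on a random subsample ('sketch') of the data.
    The subsample size m and the omission of the paper's rounding and
    lower-bound machinery are simplifications."""
    rng = np.random.default_rng(seed)
    S = X[rng.choice(X.shape[0], size=min(m, X.shape[0]), replace=False)]
    D = squareform(pdist(S)) ** 2                # squared-distance matrix
    Z = cp.Variable((len(S), len(S)), PSD=True)
    constraints = [Z >= 0,                       # entrywise non-negative
                   cp.sum(Z, axis=1) == 1,       # Z 1 = 1
                   cp.trace(Z) == k]             # tr(Z) = k
    problem = cp.Problem(cp.Minimize(cp.trace(D @ Z) / 2), constraints)
    problem.solve()
    return problem.value, Z.value                # relaxation value on the sketch
```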
arXiv Detail & Related papers (2022-11-28T19:51:30Z)
- Hessian Averaging in Stochastic Newton Methods Achieves Superlinear Convergence [69.65563161962245]
We consider minimizing a smooth and strongly convex objective function with a stochastic Newton method.
We show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage.
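A toy sketch of the mechanism, with uniform rather than the paper's weighted averaging: each step draws a noisy Hessian estimate, folds it into a running average, and takes a Newton step with the averaged matrix. The superlinear-convergence claim hinges on the weighting scheme, which this simplification does not reproduce.

```python
import numpy as np

def averaged_newton(grad, hess_sample, x0, n_steps=50):
    """Stochastic Newton with a running (here: uniform) Hessian average.
    grad(x) returns the exact gradient; hess_sample(x) returns a noisy,
    e.g. subsampled, Hessian estimate."""
    x, H_bar = x0.astype(float).copy(), None
    for t in range(1, n_steps + 1):
        H_t = hess_sample(x)                         # fresh noisy Hessian
        H_bar = H_t if H_bar is None else H_bar + (H_t - H_bar) / t
        x = x - np.linalg.solve(H_bar, grad(x))      # Newton step with average
    return x
```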
arXiv Detail & Related papers (2022-04-20T07:14:21Z)
- K-Splits: Improved K-Means Clustering Algorithm to Automatically Detect the Number of Clusters [0.12313056815753944]
This paper introduces k-splits, an improved hierarchical algorithm based on k-means to cluster data without prior knowledge of the number of clusters.
Accuracy and speed are two main advantages of the proposed method.
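A hedged sketch of the hierarchical split-until-done idea: start from one cluster and repeatedly 2-means-split the cluster with the largest within-cluster SSE until the relative improvement falls below a threshold. The split and stop criteria here are placeholders, not the paper's rules.

```python
import numpy as np
from sklearn.cluster import KMeans

def k_splits_sketch(X, max_k=20, min_gain=0.05):
    clusters = [np.arange(len(X))]
    while len(clusters) < max_k:
        # split the cluster with the largest within-cluster SSE
        sse = [((X[idx] - X[idx].mean(axis=0)) ** 2).sum() for idx in clusters]
        j = int(np.argmax(sse))
        idx = clusters[j]
        if len(idx) < 4 or sse[j] == 0:
            break
        km = KMeans(n_clusters=2, n_init=5).fit(X[idx])
        if 1 - km.inertia_ / sse[j] < min_gain:  # relative SSE reduction too
            break                                # small: stop splitting
        clusters[j] = idx[km.labels_ == 0]
        clusters.append(idx[km.labels_ == 1])
    return clusters                              # len(clusters) = detected k
```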
arXiv Detail & Related papers (2021-10-09T23:02:57Z)
- Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \to \infty$.
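As a generic illustration of PCA-based sparse clustering for two symmetric clusters (means $\pm\theta$ with sparse $\theta$): keep the highest-variance coordinates, take the top principal component of the reduced data, and split by sign. This shows the recipe, not the paper's exact algorithm or thresholds.

```python
import numpy as np

def sparse_pca_cluster(X, n_keep=20):
    """Two-cluster sparse clustering sketch: restrict to the highest-variance
    coordinates (a stand-in for estimating the sparse support), compute the
    top principal component, and split points by the sign of their score."""
    keep = np.argsort(-X.var(axis=0))[:n_keep]
    Xr = X[:, keep] - X[:, keep].mean(axis=0)
    _, _, Vt = np.linalg.svd(Xr, full_matrices=False)
    return (Xr @ Vt[0] > 0).astype(int)          # cluster label 0 or 1
```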
arXiv Detail & Related papers (2020-05-21T17:51:30Z)
- Ball k-means [53.89505717006118]
The Ball k-means algorithm uses a ball to describe a cluster, focusing on reducing the point-centroid distance computation.
The fast speed, no extra parameters and simple design of the Ball k-means make it an all-around replacement of the naive k-means algorithm.
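The pruning idea can be sketched as follows: treat each cluster as a ball whose radius is its farthest member's distance to the centroid, and during reassignment compare a point only against neighbor clusters whose bisecting hyperplane cuts into that ball. The stable-area/active-area refinement of the full algorithm is omitted.

```python
import numpy as np

def neighbor_clusters(centroids, radii):
    """Cluster j is a 'neighbor' of cluster i when the bisecting hyperplane
    between c_i and c_j cuts into ball i, i.e. ||c_i - c_j|| / 2 < r_i.
    Points of cluster i then only need distances to c_i and to these neighbor
    centroids during reassignment, instead of to all k centroids."""
    gaps = np.linalg.norm(centroids[:, None] - centroids[None], axis=2) / 2
    k = len(radii)
    return [np.where((gaps[i] < radii[i]) & (np.arange(k) != i))[0]
            for i in range(k)]
```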
arXiv Detail & Related papers (2020-05-02T10:39:26Z)
- k-means++: few more steps yield constant approximation [3.7468898363447654]
The k-means++ algorithm of Arthur and Vassilvitskii (SODA 2007) is a state-of-the-art algorithm for solving the k-means clustering problem.
Recently, Lattanzi and Sohler (ICML 2019) proposed augmenting k-means++ with O(k log k) local search steps to yield a constant approximation (in expectation) to the k-means clustering problem.
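For reference, the D² seeding that the local-search augmentation builds on; the O(k log k) swap steps of Lattanzi and Sohler are not shown.

```python
import numpy as np

def kmeanspp_seed(X, k, seed=0):
    """Standard D^2 seeding: each new center is sampled with probability
    proportional to the squared distance to the nearest center so far."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)
```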
arXiv Detail & Related papers (2020-02-18T18:28:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.