Breathing K-Means: Superior K-Means Solutions through Dynamic K-Values
- URL: http://arxiv.org/abs/2006.15666v5
- Date: Sun, 18 Aug 2024 14:14:19 GMT
- Title: Breathing K-Means: Superior K-Means Solutions through Dynamic K-Values
- Authors: Bernd Fritzke
- Abstract summary: We introduce the breathing k-means algorithm, which on average significantly improves solutions obtained by the widely-known greedy k-means++ algorithm.
The improvements are achieved through a novel "breathing" technique that cyclically increases and decreases the number of centroids based on local error and utility measures.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce the breathing k-means algorithm, which on average significantly improves solutions obtained by the widely-known greedy k-means++ algorithm, the default method for k-means clustering in the scikit-learn package. The improvements are achieved through a novel "breathing" technique that cyclically increases and decreases the number of centroids based on local error and utility measures. We conducted experiments using greedy k-means++ as a baseline, comparing it with breathing k-means and five other k-means algorithms. Among the methods investigated, only breathing k-means and better k-means++ consistently outperformed the baseline, with breathing k-means demonstrating a substantial lead. This superior performance was maintained even when comparing the best result of ten runs for all other algorithms to a single run of breathing k-means, highlighting its effectiveness and speed. Our findings indicate that the breathing k-means algorithm outperforms the other k-means techniques, especially greedy k-means++ with ten repetitions, which it dominates in both solution quality and speed. This positions breathing k-means (with the built-in initialization by a single run of greedy k-means++) as a superior alternative to running greedy k-means++ on its own.
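The abstract names the mechanism but not its details, so below is a minimal sketch of how such a breathe-in/breathe-out cycle could look on top of scikit-learn's k-means. The function name, the step size `m`, the jitter scale, and the concrete error/utility formulas are illustrative assumptions, not the author's reference implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def breathing_kmeans_sketch(X, k, m=5, cycles=3, seed=0):
    rng = np.random.default_rng(seed)
    # Built-in initialization: a single run of greedy k-means++
    # (scikit-learn's default init), as the paper describes.
    C = KMeans(n_clusters=k, n_init=1, random_state=seed).fit(X).cluster_centers_

    for _ in range(cycles):
        # Breathe in: duplicate (with small jitter) the m centroids whose
        # clusters carry the highest summed squared error.
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        err = np.bincount(d2.argmin(1), weights=d2.min(1), minlength=len(C))
        worst = np.argsort(err)[-m:]
        jitter = 1e-3 * rng.standard_normal((m, X.shape[1]))
        C = np.vstack([C, C[worst] + jitter])
        C = KMeans(n_clusters=k + m, init=C, n_init=1).fit(X).cluster_centers_

        # Breathe out: drop the m centroids with the lowest utility, here
        # approximated as the cost increase their removal would cause.
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        two = np.partition(d2, 1, axis=1)          # two smallest distances per point
        utility = np.bincount(d2.argmin(1), weights=two[:, 1] - two[:, 0],
                              minlength=len(C))
        C = C[np.argsort(utility)[m:]]             # keep the k most useful centroids
        C = KMeans(n_clusters=k, init=C, n_init=1).fit(X).cluster_centers_
    return C
```

As in the paper's setup, the sketch starts from a single greedy k-means++ run and only then begins breathing; everything after that first line is a plausible reading of the abstract, not the published algorithm.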
Related papers
- Rethinking k-means from manifold learning perspective
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct the distance matrix between data points using a Butterworth filter.
To fully exploit the complementary information embedded in different views, we leverage tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- CKmeans and FCKmeans: Two deterministic initialization procedures for Kmeans algorithm using a modified crowding distance
Two novel deterministic procedures for K-means clustering are presented.
The procedures, named CKmeans and FCKmeans, use more crowded points as initial centroids.
Experimental studies on multiple datasets demonstrate that the proposed approach outperforms Kmeans and Kmeans++ in terms of clustering accuracy.
arXiv Detail & Related papers (2023-04-19T21:46:02Z)
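The CKmeans entry above seeds k-means from crowded points; here is a toy sketch of that idea, with a k-NN density proxy and a median separation rule standing in for the paper's modified crowding distance, which the abstract does not spell out.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import KMeans

def crowding_init(X, k, n_neighbors=10):
    # Crowding proxy: mean distance to the n nearest neighbors (small = dense).
    dist, _ = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(X).kneighbors(X)
    crowding = dist[:, 1:].mean(axis=1)   # column 0 is the point itself
    order = np.argsort(crowding)          # densest points first
    sep = np.median(crowding)             # heuristic separation radius
    seeds = []
    for i in order:                       # greedy: dense but well-separated seeds
        if len(seeds) == k:
            break
        if all(np.linalg.norm(X[i] - X[j]) > sep for j in seeds):
            seeds.append(i)
    for i in order:                       # fill up if separation was too strict
        if len(seeds) == k:
            break
        if i not in seeds:
            seeds.append(i)
    return X[np.array(seeds)]

# Deterministic run: KMeans(n_clusters=k, init=crowding_init(X, k), n_init=1).fit(X)
```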
- Efficient Approximate Kernel Based Spike Sequence Classification
Machine learning models, such as SVM, require a definition of distance/similarity between pairs of sequences.
Exact methods yield better classification performance, but they pose high computational costs.
We propose a series of ways to improve the performance of the approximate kernel in order to enhance its predictive performance.
arXiv Detail & Related papers (2022-09-11T22:44:19Z)
- K-Splits: Improved K-Means Clustering Algorithm to Automatically Detect the Number of Clusters
This paper introduces k-splits, an improved hierarchical algorithm based on k-means to cluster data without prior knowledge of the number of clusters.
Accuracy and speed are two main advantages of the proposed method.
arXiv Detail & Related papers (2021-10-09T23:02:57Z)
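The k-splits entry above is a hierarchical splitter; a rough sketch of that scheme is to repeatedly bisect the cluster with the largest squared error. The fixed `k_max` budget below is a simplification for illustration; the actual algorithm detects the number of clusters automatically.

```python
import numpy as np
from sklearn.cluster import KMeans

def ksplits_sketch(X, k_max):
    clusters = [X]
    while len(clusters) < k_max:
        # Split the cluster with the highest sum of squared errors.
        sse = [((c - c.mean(axis=0)) ** 2).sum() for c in clusters]
        worst = clusters.pop(int(np.argmax(sse)))
        if len(worst) < 2:                 # nothing left to split
            clusters.append(worst)
            break
        half = KMeans(n_clusters=2, n_init=1, random_state=0).fit_predict(worst)
        clusters += [worst[half == 0], worst[half == 1]]
    return np.array([c.mean(axis=0) for c in clusters])   # final centroids
```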
- Robust Trimmed k-means
We propose Robust Trimmed k-means (RTKM) that simultaneously identifies outliers and clusters points.
We show that RTKM performs competitively with other methods on single-membership data with outliers and multi-membership data without outliers.
arXiv Detail & Related papers (2021-08-16T15:49:40Z)
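The trimming idea in the RTKM entry above can be sketched for the single-membership case as a Lloyd loop that excludes the farthest fraction of points from each centroid update; the trim fraction and loop structure here are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def trimmed_kmeans_sketch(X, k, trim=0.1, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        labels, dist = d2.argmin(1), d2.min(1)
        keep = dist <= np.quantile(dist, 1 - trim)   # exclude the worst fraction
        for j in range(k):                           # update centroids on inliers only
            pts = X[keep & (labels == j)]
            if len(pts):
                C[j] = pts.mean(axis=0)
    return C, ~keep                                  # centroids and an outlier mask
```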
- Improved Guarantees for k-means++ and k-means++ Parallel
k-means++ and k-means++ parallel are popular algorithms for the classic k-means clustering problem.
We show improved approximation and bi-criteria approximation guarantees for k-means++ and k-means++ parallel.
We propose a new variant of the k-means++ parallel algorithm (Exponential Race k-means++) that has the same approximation guarantees as k-means++.
arXiv Detail & Related papers (2020-10-27T17:46:00Z)
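For context on the entry above: both k-means++ and its parallel variant build on D^2 sampling, where each new center is drawn with probability proportional to the squared distance to the nearest center chosen so far. A compact version of that seeding step:

```python
import numpy as np

def kmeanspp_seeding(X, k, seed=0):
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]             # first center: uniform at random
    for _ in range(k - 1):
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1).min(1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])   # D^2 sampling
    return np.array(centers)
```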
- Differentially Private Clustering: Tight Approximation Ratios
We give efficient differentially private algorithms for basic clustering problems.
Our results imply an improved algorithm for the Sample and Aggregate privacy framework.
One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z)
- SimpleMKKM: Simple Multiple Kernel K-means
We propose a simple yet effective multiple kernel clustering algorithm, termed simple multiple kernel k-means (SimpleMKKM).
Our criterion is given by an intractable minimization-maximization problem in the kernel coefficient and clustering partition matrix.
We theoretically analyze the performance of SimpleMKKM in terms of its clustering generalization error.
arXiv Detail & Related papers (2020-05-11T10:06:40Z)
- Improving The Performance Of The K-means Algorithm
The thesis proposes two algorithms to speed up IKM while approximately preserving the quality of its clustering results.
The first algorithm, called Divisive K-means, improves the speed of IKM by speeding up its splitting process of clusters.
The second algorithm, called Parallel Two-Phase K-means (Par2PK-means), parallelizes IKM by employing the model of Two-Phase K-means.
arXiv Detail & Related papers (2020-05-10T15:09:44Z)
- Ball k-means
The Ball k-means algorithm uses a ball to describe a cluster, focusing on reducing the point-centroid distance computation.
Its fast speed, lack of extra parameters, and simple design make Ball k-means an all-around replacement for the naive k-means algorithm.
arXiv Detail & Related papers (2020-05-02T10:39:26Z)
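The pruning flavor behind the ball description above can be shown with the triangle inequality: a point closer to its centroid than half the distance to the closest other centroid provably keeps its assignment, so only boundary points need neighbor comparisons. A toy illustration of that test, not the paper's full algorithm:

```python
import numpy as np

def stable_mask(X, C, labels):
    cc = np.linalg.norm(C[:, None, :] - C[None, :, :], axis=-1)  # centroid distances
    np.fill_diagonal(cc, np.inf)
    radius = cc.min(axis=1) / 2            # "stable area" radius per cluster
    d_own = np.linalg.norm(X - C[labels], axis=1)
    # True = assignment cannot change this step (triangle inequality),
    # so these point-centroid comparisons can be skipped entirely.
    return d_own < radius[labels]
```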
- k-means++: few more steps yield constant approximation
The k-means++ algorithm of Arthur and Vassilvitskii (SODA 2007) is a state-of-the-art algorithm for solving the k-means clustering problem.
Recently, Lattanzi and Sohler (ICML 2019) proposed augmenting k-means++ with O(k log k) local search steps to yield a constant approximation (in expectation) to the k-means clustering problem.
arXiv Detail & Related papers (2020-02-18T18:28:25Z)
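A single local-search step of the kind appended to k-means++ in the entry above can be sketched as a D^2-sampled swap that is kept only if it lowers the cost; this is a simplification of the Lattanzi-Sohler procedure, written for clarity rather than efficiency.

```python
import numpy as np

def kmeans_cost(X, C):
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).min(1).sum()

def local_search_step(X, C, seed=0):
    rng = np.random.default_rng(seed)
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).min(1)
    cand = X[rng.choice(len(X), p=d2 / d2.sum())]   # D^2-sampled candidate center
    best, best_cost = C, kmeans_cost(X, C)
    for j in range(len(C)):                         # try the candidate in each slot
        trial = C.copy()
        trial[j] = cand
        if kmeans_cost(X, trial) < best_cost:
            best, best_cost = trial, kmeans_cost(X, trial)
    return best                                     # unchanged if no swap helps
```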