Improving The Performance Of The K-means Algorithm
- URL: http://arxiv.org/abs/2005.04689v1
- Date: Sun, 10 May 2020 15:09:44 GMT
- Title: Improving The Performance Of The K-means Algorithm
- Authors: Tien-Dung Nguyen
- Abstract summary: My thesis proposes two algorithms that speed up IKM while approximately preserving the quality of its clustering results.
The first algorithm, called Divisive K-means, accelerates IKM by speeding up its cluster-splitting process.
The second algorithm, called Parallel Two-Phase K-means (Par2PK-means), parallelizes IKM by employing the model of Two-Phase K-means.
- Score: 2.28438857884398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Incremental K-means (IKM), an improved version of K-means (KM), was
introduced to significantly improve the clustering quality of KM. However,
IKM is slower than KM. My thesis proposes two algorithms that speed up IKM
while approximately preserving the quality of its clustering results. The
first algorithm, called Divisive K-means, accelerates IKM by speeding up its
cluster-splitting process. Tested on UCI Machine Learning data sets, the new
algorithm reaches the same empirical global optimum as IKM at lower
complexity, $O(k \log_2 k \cdot n)$ versus IKM's $O(k^{2} n)$. The second
algorithm, called Parallel Two-Phase K-means (Par2PK-means), parallelizes IKM
by employing the model of Two-Phase K-means. Tested on large data sets, it
attains a good speedup ratio, close to the linear speedup ratio.
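For concreteness, the sketch below illustrates the two ideas as the abstract describes them. It is an illustration only, not the thesis's implementation: the helper names (`lloyd`, `divisive_kmeans`, `two_phase_kmeans`), the largest-SSE splitting rule, and the chunk-then-recluster parallel model are assumptions consistent with a divisive splitting scheme and the Two-Phase K-means model, not the exact procedures of the thesis.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def lloyd(X, k, n_iter=20, seed=0):
    """Plain k-means (Lloyd's algorithm); returns the k centroids."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign every point to its nearest centroid, then recompute means.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return centers

def divisive_kmeans(X, k):
    """Divisive sketch: repeatedly 2-means-split the cluster with the
    largest within-cluster SSE until k clusters exist (assumes n >= k)."""
    clusters = [np.arange(len(X))]                  # one cluster of all points
    while len(clusters) < k:
        sse = [((X[idx] - X[idx].mean(0)) ** 2).sum() for idx in clusters]
        target = clusters.pop(int(np.argmax(sse)))  # split the worst cluster
        centers = lloyd(X[target], 2)
        side = ((X[target][:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        left, right = target[side == 0], target[side == 1]
        if len(left) == 0 or len(right) == 0:       # degenerate split: halve
            mid = len(target) // 2
            left, right = target[:mid], target[mid:]
        clusters += [left, right]
    return np.vstack([X[idx].mean(0) for idx in clusters])

def two_phase_kmeans(X, k, n_workers=4):
    """Two-Phase model: cluster data chunks in parallel (phase 1), then
    cluster the pooled local centroids into the final k centers (phase 2)."""
    chunks = np.array_split(X, n_workers)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        local = list(pool.map(lloyd, chunks, [k] * n_workers))
    return lloyd(np.vstack(local), k)
```

A call such as `two_phase_kmeans(np.random.rand(100_000, 8), k=10)` (placed under an `if __name__ == "__main__":` guard, since process pools re-import the module) runs phase 1 on four chunks in parallel; the near-linear speedup the abstract reports would come from phase 1 dominating the run time.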
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z) - IDKM: Memory Efficient Neural Network Quantization via Implicit,
Differentiable k-Means [20.13045791225961]
We propose an implicit, differentiable $k$-means algorithm (IDKM) which eliminates the major memory restriction of DKM.
We show that IDKM achieves comparable performance to DKM with less compute time and less memory.
We also use IDKM and IDKM-JFB to quantize a large neural network, Resnet18, on hardware where DKM cannot train at all.
arXiv Detail & Related papers (2023-12-12T22:02:57Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct the distance matrix between data points using a Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - Careful seeding for the k-medoids algorithm with incremental k++ cluster
construction [4.981260380070016]
An improved k-medoids algorithm (INCKM) was recently proposed to overcome this drawback.
We propose a novel incremental k-medoids algorithm (INCKPP) which dynamically increases the number of clusters.
Our algorithm can overcome the parameter selection problem in the improved k-medoids algorithm, improve clustering performance, and deal with imbalanced datasets very well.
arXiv Detail & Related papers (2022-07-06T02:25:35Z) - K-Splits: Improved K-Means Clustering Algorithm to Automatically Detect
the Number of Clusters [0.12313056815753944]
This paper introduces k-splits, an improved hierarchical algorithm based on k-means to cluster data without prior knowledge of the number of clusters.
Accuracy and speed are two main advantages of the proposed method.
arXiv Detail & Related papers (2021-10-09T23:02:57Z) - Robust Trimmed k-means [70.88503833248159]
We propose Robust Trimmed k-means (RTKM) that simultaneously identifies outliers and clusters points.
We show RTKM performs competitively with other methods on single membership data with outliers and multi-membership data without outliers.
arXiv Detail & Related papers (2021-08-16T15:49:40Z) - Exact Acceleration of K-Means++ and K-Means$\|$ [22.66983713481359]
K-Means++ and K-Means$\|$ have become de facto tools for selecting the initial seeds of K-means.
We develop specialized triangle inequality pruning strategies and a dynamic priority queue to show the first acceleration of K-Means++ and K-Means$\|$.
arXiv Detail & Related papers (2021-05-06T20:22:55Z) - Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems.
Our results imply an improved algorithm for the Sample and Aggregate privacy framework.
One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z) - Breathing K-Means: Superior K-Means Solutions through Dynamic K-Values [0.0]
We introduce the breathing k-means algorithm, which on average significantly improves solutions obtained by the widely-known greedy k-means++ algorithm.
The improvements are achieved through a novel "breathing" technique that cyclically increases and decreases the number of centroids based on local error and utility measures.
arXiv Detail & Related papers (2020-06-28T17:49:37Z) - SimpleMKKM: Simple Multiple Kernel K-means [49.500663154085586]
We propose a simple yet effective multiple kernel clustering algorithm, termed simple multiple kernel k-means (SimpleMKKM).
Our criterion is given by an intractable minimization-maximization problem in the kernel coefficient and clustering partition matrix.
We theoretically analyze the performance of SimpleMKKM in terms of its clustering generalization error.
arXiv Detail & Related papers (2020-05-11T10:06:40Z) - Ball k-means [53.89505717006118]
The Ball k-means algorithm uses a ball to describe a cluster, focusing on reducing the point-centroid distance computation.
Its fast speed, lack of extra parameters, and simple design make Ball k-means an all-around replacement for the naive k-means algorithm.
arXiv Detail & Related papers (2020-05-02T10:39:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.