An enhanced method of initial cluster center selection for K-means algorithm
- URL: http://arxiv.org/abs/2210.09507v1
- Date: Tue, 18 Oct 2022 00:58:50 GMT
- Title: An enhanced method of initial cluster center selection for K-means algorithm
- Authors: Zillur Rahman, Md. Sabir Hossain, Mohammad Hasan, Ahmed Imteaj
- Abstract summary: We propose a novel approach to improve initial cluster selection for the K-means algorithm.
The Convex Hull algorithm facilitates computing the first two centroids; the remaining ones are selected according to their distance from the previously selected centers.
We obtained clustering errors of only 7.33%, 7.90%, and 0% on the Iris, Letter, and Ruspini data, respectively.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Clustering is one of the most widely used techniques for finding
patterns in a dataset and can be applied in many different applications and
analyses. K-means, the most popular and simplest clustering algorithm, may get
trapped in local minima if not properly initialized, and its standard
initialization is random. In this paper, we propose a novel approach to
improving initial cluster selection for the K-means algorithm. The algorithm is
based on the fact that the initial centroids must be well separated from each
other, since the final clusters are separated groups in feature space. The
Convex Hull algorithm facilitates computing the first two centroids, and the
remaining ones are selected according to their distance from the previously
selected centers. To ensure that one center is selected per cluster, we use the
nearest-neighbor technique. To check the robustness of the proposed algorithm,
we consider several real-world datasets. We obtained clustering errors of only
7.33%, 7.90%, and 0% on the Iris, Letter, and Ruspini data respectively,
demonstrating better performance than existing methods. The results also
indicate that our proposed method outperforms the conventional K-means approach
by accelerating computation when the number of clusters is greater than 2.
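The initialization described in the abstract can be sketched as follows. This is a minimal reconstruction from the abstract alone, not the authors' code: the function name `init_centers` is ours, and the paper's nearest-neighbor check for ensuring one center per cluster is omitted. It exploits the fact that the farthest-apart pair of points in a dataset always lies on the convex hull, so computing that pair directly matches the outcome of the paper's Convex Hull step.

```python
import numpy as np

def init_centers(X, k):
    """Hedged sketch of the proposed K-means initialization:
    the first two centroids are the farthest-apart pair of points
    (both necessarily convex-hull vertices); each remaining centroid
    is the data point whose minimum distance to the already-chosen
    centers is largest.  The paper's nearest-neighbor refinement
    (one center per cluster) is not reproduced here."""
    # Pairwise distance matrix; the argmax pair are two hull vertices.
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    i, j = np.unravel_index(np.argmax(D), D.shape)
    centers = [X[i], X[j]]
    while len(centers) < k:
        C = np.asarray(centers)
        # Distance from every point to its nearest already-chosen center.
        min_d = np.linalg.norm(X[:, None] - C[None, :], axis=-1).min(axis=1)
        # Pick the point that is worst-served by the current centers.
        centers.append(X[np.argmax(min_d)])
    return np.asarray(centers)
```

The O(n²) distance matrix is fine for the small benchmark datasets named in the abstract (Iris, Ruspini); a larger dataset would want an explicit hull computation first.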
Related papers
- Fuzzy K-Means Clustering without Cluster Centroids [21.256564324236333] (2024-04-07)
  Fuzzy K-Means clustering is a critical technique in unsupervised data analysis.
  This paper proposes a novel Fuzzy K-Means clustering algorithm that entirely eliminates the reliance on cluster centroids.
- A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199] (2023-12-17)
  Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
  We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
- Rethinking k-means from manifold learning perspective [122.38667613245151] (2023-05-12)
  We present a new clustering algorithm which directly detects clusters of data without mean estimation.
  Specifically, we construct a distance matrix between data points using a Butterworth filter.
  To exploit the complementary information embedded in different views, we leverage tensor Schatten p-norm regularization.
- ck-means, a novel unsupervised learning method that combines fuzzy and crispy clustering methods to extract intersecting data [1.827510863075184] (2022-06-17)
  This paper proposes a method to cluster data that share the same intersections between two or more features.
  The main idea of this novel method is to generate fuzzy clusters of data using a Fuzzy C-Means (FCM) algorithm.
  The algorithm also finds the optimal number of clusters for FCM and k-means, according to the consistency of the clusters given by the Silhouette Index (SI).
- Gradient Based Clustering [72.15857783681658] (2022-02-01)
  We propose a general approach for distance-based clustering, using the gradient of the cost function that measures clustering quality.
  The approach is an iterative two-step procedure (alternating between cluster assignment and cluster center updates) and is applicable to a wide range of functions.
- K-Splits: Improved K-Means Clustering Algorithm to Automatically Detect the Number of Clusters [0.12313056815753944] (2021-10-09)
  This paper introduces k-splits, an improved hierarchical algorithm based on k-means to cluster data without prior knowledge of the number of clusters.
  Accuracy and speed are two main advantages of the proposed method.
- A Constant Approximation Algorithm for Sequential No-Substitution k-Median Clustering under a Random Arrival Order [24.304228393096395] (2021-02-08)
  We study k-median clustering under the sequential no-substitution setting.
  In this setting, a data stream is sequentially observed, and some of the points are selected by the algorithm as cluster centers.
  We give a new algorithm for this setting that obtains a constant approximation factor on the optimal risk under a random arrival order.
- Determinantal consensus clustering [77.34726150561087] (2021-02-07)
  We propose the use of determinantal point processes (DPP) for the random restart of clustering algorithms.
  DPPs favor diversity of the center points within subsets.
  We show through simulations that, contrary to DPP, uniform random selection fails both to ensure diversity and to obtain a good coverage of all data facets.
- Clustering of Big Data with Mixed Features [3.3504365823045044] (2020-11-11)
  We develop a new clustering algorithm for large data of mixed type.
  The algorithm is capable of detecting outliers and clusters of relatively lower density values.
  We present experimental results to verify that our algorithm works well in practice.
- Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714] (2020-08-18)
  We give efficient differentially private algorithms for basic clustering problems.
  Our results imply an improved algorithm for the Sample and Aggregate privacy framework.
  One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
- Ball k-means [53.89505717006118] (2020-05-02)
  The Ball k-means algorithm uses a ball to describe a cluster, focusing on reducing the point-centroid distance computation.
  Its fast speed, lack of extra parameters, and simple design make Ball k-means an all-around replacement for the naive k-means algorithm.
This list is automatically generated from the titles and abstracts of the papers in this site.