Careful Seeding for k-Medoids Clustering with Incremental k-Means++ Initialization
- URL: http://arxiv.org/abs/2207.02404v2
- Date: Wed, 18 Dec 2024 08:00:59 GMT
- Title: Careful Seeding for k-Medoids Clustering with Incremental k-Means++ Initialization
- Authors: Difei Cheng, Yunfeng Zhang, Ruinan Jin,
- Abstract summary: K-medoids clustering is a popular variant of k-means clustering and is widely used in pattern recognition and machine learning.
An improved k-medoids clustering algorithm, called INCKM algorithm, was recently proposed to overcome this drawback.
We propose a novel k-medoids clustering algorithm, called the incremental k-means++ (INCKPP) algorithm, which initializes in a novel incremental manner.
- Score: 17.4921582710817
- License:
- Abstract: K-medoids clustering is a popular variant of k-means clustering and is widely used in pattern recognition and machine learning. A main drawback of k-medoids clustering is that an improper initialization can cause it to get trapped in local optima. An improved k-medoids clustering algorithm, called the INCKM algorithm, which is the first to apply incremental initialization to k-medoids clustering, was recently proposed to overcome this drawback. The INCKM algorithm requires the construction of a subset of candidate medoids, determined by one hyperparameter, for initialization, and it always fails on imbalanced datasets when this hyperparameter is selected incorrectly. In this paper, we propose a novel k-medoids clustering algorithm, called the incremental k-means++ (INCKPP) algorithm, which initializes in a novel incremental manner, attempting to add one new cluster center optimally at each stage through a nonparametric and stochastic k-means++ initialization. The INCKPP algorithm overcomes the difficulty of hyperparameter selection in the INCKM algorithm, improves the clustering performance, and handles imbalanced datasets well. However, the INCKPP algorithm is not computationally efficient enough. To address this, we further propose an improved variant, called the INCKPPsample algorithm, which improves the clustering efficiency while maintaining the clustering performance of the INCKPP algorithm. Extensive results from experiments on both synthetic and real-world datasets, including imbalanced datasets, illustrate that the proposed algorithms outperform the other compared algorithms.
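The incremental seeding the abstract describes builds on k-means++ D²-sampling: each new center is drawn from the data with probability proportional to its squared distance to the nearest already-chosen center. Below is a minimal Python/NumPy sketch of that seeding step restricted to actual data points (candidate medoids); the function name `kmeanspp_medoid_seeding` and all implementation details are illustrative assumptions, not the authors' INCKPP code, which additionally refines the clustering after each added center.

```python
# Minimal sketch: k-means++-style (D^2) seeding restricted to data points,
# i.e. incrementally choosing k initial medoids one stage at a time.
# Illustrative only; not the INCKPP/INCKPPsample implementation.
import numpy as np

def kmeanspp_medoid_seeding(X, k, seed=None):
    """Pick k initial medoid indices from X via D^2 (k-means++) sampling."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    medoid_idx = [int(rng.integers(n))]              # first medoid: uniform at random
    # squared distance of every point to its closest chosen medoid
    d2 = np.sum((X - X[medoid_idx[0]]) ** 2, axis=1)
    for _ in range(1, k):
        probs = d2 / d2.sum()                        # D^2 sampling weights
        new_idx = int(rng.choice(n, p=probs))        # add one new center per stage
        medoid_idx.append(new_idx)
        # update nearest-medoid distances with the newly added medoid
        d2 = np.minimum(d2, np.sum((X - X[new_idx]) ** 2, axis=1))
    return np.array(medoid_idx)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # two well-separated blobs of very different sizes (an imbalanced dataset)
    X = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(8, 1, (20, 2))])
    print(kmeanspp_medoid_seeding(X, k=2, seed=0))
```

Because the sampling weights favor points far from existing medoids, the second medoid is very likely to land in the small cluster despite the size imbalance; a standard k-medoids swap/alternation step would then refine the result from this seeding.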
Related papers
- A Greedy Strategy for Graph Cut [95.2841574410968]
We propose a greedy strategy to solve the problem of Graph Cut, called GGC.
It starts from the state where each data sample is regarded as a cluster and dynamically merges the two clusters.
GGC has a nearly linear computational complexity with respect to the number of samples.
arXiv Detail & Related papers (2024-12-28T05:49:42Z) - A Fresh Look at Generalized Category Discovery through Non-negative Matrix Factorization [83.12938977698988]
Generalized Category Discovery (GCD) aims to classify both base and novel images using labeled base data.
Current approaches inadequately address the intrinsic optimization of the co-occurrence matrix $\bar{A}$ based on cosine similarity.
We propose a Non-Negative Generalized Category Discovery (NN-GCD) framework to address these deficiencies.
arXiv Detail & Related papers (2024-10-29T07:24:11Z) - Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z) - Fuzzy K-Means Clustering without Cluster Centroids [21.256564324236333]
Fuzzy K-Means clustering is a critical technique in unsupervised data analysis.
This paper proposes a novel Fuzzy K-Means clustering algorithm that entirely eliminates the reliance on cluster centroids.
arXiv Detail & Related papers (2024-04-07T12:25:03Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct the distance matrix between data points using a Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - An enhanced method of initial cluster center selection for K-means algorithm [0.0]
We propose a novel approach to improve initial cluster selection for the K-means algorithm.
The Convex Hull algorithm facilitates computing the first two centroids, and the remaining ones are selected according to their distance from the previously selected centers.
We obtained only 7.33%, 7.90%, and 0% clustering error on the Iris, Letter, and Ruspini data, respectively.
arXiv Detail & Related papers (2022-10-18T00:58:50Z) - k-MS: A novel clustering algorithm based on morphological reconstruction [0.0]
k-MS is faster than the CPU-parallel k-Means in worst case scenarios.
It is also faster than similar clustering methods that are sensitive to density and shapes, such as Mitosis and TRICLUST.
arXiv Detail & Related papers (2022-08-30T16:55:21Z) - An Exact Algorithm for Semi-supervised Minimum Sum-of-Squares Clustering [0.5801044612920815]
We present a new branch-and-bound algorithm for semi-supervised MSSC.
Background knowledge is incorporated as pairwise must-link and cannot-link constraints.
For the first time, the proposed global optimization algorithm efficiently manages to solve real-world instances of up to 800 data points.
arXiv Detail & Related papers (2021-11-30T17:08:53Z) - Determinantal consensus clustering [77.34726150561087]
We propose the use of determinantal point processes (DPP) for the random restart of clustering algorithms.
DPPs favor diversity of the center points within subsets.
We show through simulations that, contrary to DPP, this technique fails both to ensure diversity and to obtain a good coverage of all data facets.
arXiv Detail & Related papers (2021-02-07T23:48:24Z) - A Multi-disciplinary Ensemble Algorithm for Clustering Heterogeneous Datasets [0.76146285961466]
We propose a new evolutionary clustering algorithm (ECAStar) based on social class ranking and meta-heuristic algorithms.
ECAStar is integrated with recombinational evolutionary operators, Levy flight optimisation, and some statistical techniques.
Experiments are conducted to evaluate ECAStar against five conventional approaches.
arXiv Detail & Related papers (2021-01-01T07:20:50Z) - Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems.
Our results imply an improved algorithm for the Sample and Aggregate privacy framework.
One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z)