Careful seeding for the k-medoids algorithm with incremental k++ cluster construction
- URL: http://arxiv.org/abs/2207.02404v1
- Date: Wed, 6 Jul 2022 02:25:35 GMT
- Title: Careful seeding for the k-medoids algorithm with incremental k++ cluster construction
- Authors: Difei Cheng, Bo Zhang
- Abstract summary: An improved k-medoids algorithm (INCKM) was recently proposed to overcome the local-optima drawback of k-medoids.
We propose a novel incremental k-medoids algorithm (INCKPP) which dynamically increases the number of clusters.
Our algorithm can overcome the parameter selection problem in the improved k-medoids algorithm, improve clustering performance, and deal with imbalanced datasets very well.
- Score: 4.981260380070016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The k-medoids algorithm is a popular variant of the k-means algorithm and
widely used in pattern recognition and machine learning. A main drawback of the
k-medoids algorithm is that it can be trapped in local optima. An improved
k-medoids algorithm (INCKM) was recently proposed to overcome this drawback,
based on constructing a candidate medoids subset with a parameter choosing
procedure, but it may fail when dealing with imbalanced datasets. In this
paper, we propose a novel incremental k-medoids algorithm (INCKPP) which
dynamically increases the number of clusters from 2 to k through a
nonparametric and stochastic k-means++ search procedure. Our algorithm can
overcome the parameter selection problem in the improved k-medoids algorithm,
improve the clustering performance, and deal with imbalanced datasets very
well. However, our algorithm is computationally inefficient. To address
this issue, we propose a fast INCKPP algorithm (called INCKPP$_{sample}$) which
preserves the computational efficiency of the simple and fast k-medoids
algorithm with an improved clustering performance. The proposed algorithm is
compared with three state-of-the-art algorithms: the improved k-medoids
algorithm (INCKM), the simple and fast k-medoids algorithm (FKM) and the
k-means++ algorithm (KPP). Extensive experiments on both synthetic and
real-world datasets, including imbalanced datasets, illustrate the effectiveness of
the proposed algorithm.
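The abstract describes a nonparametric, stochastic k-means++ search that grows the number of clusters from 2 to k. As a hedged illustration of the underlying idea, the sketch below applies the classic D² (k-means++) seeding rule while restricting every chosen center to an actual data point, so each center is a valid medoid. This is a generic sketch, not the paper's INCKPP procedure; the function name, signature, and the `dist` callback are illustrative assumptions.

```python
import random

def kmeanspp_medoid_seeding(points, k, dist, rng=None):
    """Sketch: D^2 (k-means++-style) seeding restricted to actual data
    points, so every chosen center is a valid medoid. Illustrative only;
    not the INCKPP algorithm from the paper."""
    rng = rng or random.Random(0)
    medoids = [rng.randrange(len(points))]  # first medoid: uniform at random
    while len(medoids) < k:
        # squared distance from each point to its nearest current medoid
        d2 = [min(dist(p, points[m]) for m in medoids) ** 2 for p in points]
        total = sum(d2)
        if total == 0.0:  # every point coincides with a medoid; pick arbitrarily
            medoids.append(rng.randrange(len(points)))
            continue
        # sample the next medoid index with probability proportional to d2
        r, acc = rng.random() * total, 0.0
        for i, w in enumerate(d2):
            acc += w
            if acc >= r:
                medoids.append(i)
                break
    return medoids
```

Because a current medoid has zero distance to itself, its sampling weight is zero and the same point is (almost surely) never selected twice; the D² weighting biases new medoids toward regions far from existing centers.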
Related papers
- A Greedy Strategy for Graph Cut [95.2841574410968]
We propose a greedy strategy to solve the problem of Graph Cut, called GGC.
It starts from the state where each data sample is regarded as its own cluster and dynamically merges two clusters at a time.
GGC has a nearly linear computational complexity with respect to the number of samples.
arXiv Detail & Related papers (2024-12-28T05:49:42Z)
- A Fresh Look at Generalized Category Discovery through Non-negative Matrix Factorization [83.12938977698988]
Generalized Category Discovery (GCD) aims to classify both base and novel images using labeled base data.
Current approaches inadequately address the intrinsic optimization of the co-occurrence matrix $\bar{A}$ based on cosine similarity.
We propose a Non-Negative Generalized Category Discovery (NN-GCD) framework to address these deficiencies.
arXiv Detail & Related papers (2024-10-29T07:24:11Z)
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- Fuzzy K-Means Clustering without Cluster Centroids [21.256564324236333]
Fuzzy K-Means clustering is a critical technique in unsupervised data analysis.
This paper proposes a novel Fuzzy K-Means clustering algorithm that entirely eliminates the reliance on cluster centroids.
arXiv Detail & Related papers (2024-04-07T12:25:03Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- An enhanced method of initial cluster center selection for K-means algorithm [0.0]
We propose a novel approach to improve initial cluster selection for K-means algorithm.
The Convex Hull algorithm facilitates the computing of the first two centroids and the remaining ones are selected according to the distance from previously selected centers.
We obtained clustering errors of only 7.33%, 7.90%, and 0% on the Iris, Letter, and Ruspini datasets, respectively.
arXiv Detail & Related papers (2022-10-18T00:58:50Z)
- k-MS: A novel clustering algorithm based on morphological reconstruction [0.0]
k-MS is faster than the CPU-parallel k-Means in worst case scenarios.
It is also faster than similar clusterization methods that are sensitive to density and shapes such as Mitosis and TRICLUST.
arXiv Detail & Related papers (2022-08-30T16:55:21Z)
- An Exact Algorithm for Semi-supervised Minimum Sum-of-Squares Clustering [0.5801044612920815]
We present a new branch-and-bound algorithm for semi-supervised MSSC.
Background knowledge is incorporated as pairwise must-link and cannot-link constraints.
For the first time, the proposed global optimization algorithm efficiently manages to solve real-world instances up to 800 data points.
arXiv Detail & Related papers (2021-11-30T17:08:53Z)
- Determinantal consensus clustering [77.34726150561087]
We propose the use of determinantal point processes or DPP for the random restart of clustering algorithms.
DPPs favor diversity of the center points within subsets.
We show through simulations that, contrary to DPP, uniform random restarts fail both to ensure diversity and to obtain good coverage of all data facets.
arXiv Detail & Related papers (2021-02-07T23:48:24Z)
- A Multi-disciplinary Ensemble Algorithm for Clustering Heterogeneous Datasets [0.76146285961466]
We propose a new evolutionary clustering algorithm (ECAStar) based on social class ranking and meta-heuristic algorithms.
ECAStar is integrated with recombinational evolutionary operators, Lévy flight optimisation, and some statistical techniques.
Experiments are conducted to evaluate the ECAStar against five conventional approaches.
arXiv Detail & Related papers (2021-01-01T07:20:50Z)
- Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems.
Our results imply an improved algorithm for the Sample and Aggregate privacy framework.
One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z)
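Differentially private clustering algorithms like those above are built from basic noise-addition primitives. As a hedged, generic illustration (this is the textbook Laplace mechanism, not the paper's 1-Cluster algorithm), a minimal sketch of an epsilon-DP mean over bounded scalars:

```python
import math
import random

def sample_laplace(scale, rng):
    # inverse-CDF sampling of Laplace(0, scale)
    u = rng.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def private_mean(values, lo, hi, epsilon, rng=None):
    """Sketch: epsilon-DP mean of scalars clipped to [lo, hi]. The clipped
    sum changes by at most (hi - lo) when one value changes, so Laplace
    noise with scale (hi - lo) / epsilon suffices. Assumes the count
    len(values) is public. Illustrative only."""
    rng = rng or random.Random(0)
    clipped = [min(max(v, lo), hi) for v in values]  # bound each contribution
    noisy_sum = sum(clipped) + sample_laplace((hi - lo) / epsilon, rng)
    return noisy_sum / len(values)
```

With a large privacy budget the noise is negligible and the estimate is close to the true mean; shrinking epsilon trades accuracy for stronger privacy.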
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.