Careful seeding for the k-medoids algorithm with incremental k++ cluster
construction
- URL: http://arxiv.org/abs/2207.02404v1
- Date: Wed, 6 Jul 2022 02:25:35 GMT
- Title: Careful seeding for the k-medoids algorithm with incremental k++ cluster
construction
- Authors: Difei Cheng, Bo Zhang
- Abstract summary: An improved k-medoids algorithm (INCKM) was recently proposed to overcome the local-optima drawback of k-medoids.
We propose a novel incremental k-medoids algorithm (INCKPP) which dynamically increases the number of clusters.
Our algorithm can overcome the parameter selection problem in the improved k-medoids algorithm, improve clustering performance, and deal with imbalanced datasets very well.
- Score: 4.981260380070016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The k-medoids algorithm is a popular variant of the k-means algorithm and
widely used in pattern recognition and machine learning. A main drawback of the
k-medoids algorithm is that it can be trapped in local optima. An improved
k-medoids algorithm (INCKM) was recently proposed to overcome this drawback,
based on constructing a candidate medoids subset with a parameter choosing
procedure, but it may fail when dealing with imbalanced datasets. In this
paper, we propose a novel incremental k-medoids algorithm (INCKPP) which
dynamically increases the number of clusters from 2 to k through a
nonparametric and stochastic k-means++ search procedure. Our algorithm can
overcome the parameter selection problem in the improved k-medoids algorithm,
improve the clustering performance, and deal with imbalanced datasets very
well. However, INCKPP is computationally expensive. To address
this issue, we propose a fast INCKPP algorithm (called INCKPP$_{sample}$) which
preserves the computational efficiency of the simple and fast k-medoids
algorithm with an improved clustering performance. The proposed algorithm is
compared with three state-of-the-art algorithms: the improved k-medoids
algorithm (INCKM), the simple and fast k-medoids algorithm (FKM) and the
k-means++ algorithm (KPP). Extensive experiments on both synthetic and
real-world datasets, including imbalanced datasets, illustrate the effectiveness of
the proposed algorithm.
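The abstract's key ingredient is a k-means++-style stochastic seeding rule applied to medoids. The following is a minimal sketch of that classical D^2 sampling rule restricted to data points; it is an illustration only, not the paper's full INCKPP procedure, which additionally grows the number of clusters incrementally from 2 to k with a nonparametric stochastic search at each step. The function name and signature are our own.

```python
import numpy as np

def kpp_medoid_seeding(X, k, rng=None):
    """k-means++-style D^2 seeding restricted to data points (medoids).

    Sketch of the classical seeding rule the paper builds on; the
    incremental 2-to-k cluster construction of INCKPP is not shown.
    """
    rng = np.random.default_rng(rng)
    n = len(X)
    # First medoid: a uniformly random data point.
    medoids = [rng.integers(n)]
    for _ in range(1, k):
        # Squared distance from each point to its nearest chosen medoid.
        d2 = np.min(
            ((X[:, None, :] - X[medoids][None, :, :]) ** 2).sum(-1), axis=1
        )
        # D^2 sampling: distant points are proportionally more likely,
        # spreading the initial medoids across the dataset.
        medoids.append(rng.choice(n, p=d2 / d2.sum()))
    return np.array(medoids)
```

Because the next medoid is always an actual data point, this seeding carries over to k-medoids without modification, unlike k-means centroids.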
Related papers
- A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
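For context, the classical greedy (farthest-first) k-center heuristic that such subset-selection work builds on can be sketched as follows. This is the standard unweighted 2-approximation, not the paper's factor-3 algorithm, which additionally weighs in an uncertainty-sampling objective; the function name is our own.

```python
import numpy as np

def greedy_k_center(X, k, rng=None):
    """Classical farthest-first traversal: a 2-approximation for k-center.

    Sketch of the standard unweighted heuristic; the weighted variant
    from the paper above is not reproduced here.
    """
    rng = np.random.default_rng(rng)
    centers = [int(rng.integers(len(X)))]
    # Distance of every point to its nearest chosen center so far.
    dist = np.linalg.norm(X - X[centers[0]], axis=1)
    for _ in range(1, k):
        nxt = int(np.argmax(dist))  # the farthest point becomes a center
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return centers
```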
arXiv Detail & Related papers (2023-12-17T04:41:07Z) - Quantum-Based Feature Selection for Multi-classification Problem in
Complex Systems with Edge Computing [15.894122816099133]
A quantum-based feature selection algorithm for the multi-classification problem, namely, QReliefF, is proposed.
Our algorithm is superior in finding the nearest neighbor, reducing the complexity from O(M) to O(sqrt(M)).
arXiv Detail & Related papers (2023-10-01T03:57:13Z) - OptABC: an Optimal Hyperparameter Tuning Approach for Machine Learning
Algorithms [1.6114012813668934]
OptABC is proposed to help ABC algorithm in faster convergence toward a near-optimum solution.
OptABC integrates artificial bee colony algorithm, K-Means clustering, greedy algorithm, and opposition-based learning strategy.
Experimental results demonstrate the effectiveness of OptABC compared to existing approaches in the literature.
arXiv Detail & Related papers (2021-12-15T22:33:39Z) - Provably Faster Algorithms for Bilevel Optimization [54.83583213812667]
Bilevel optimization has been widely applied in many important machine learning applications.
We propose two new algorithms for bilevel optimization.
We show that both algorithms achieve the complexity of $\mathcal{O}(\epsilon^{-1.5})$, which outperforms all existing algorithms by an order of magnitude.
arXiv Detail & Related papers (2021-06-08T21:05:30Z) - Benchmarking Simulation-Based Inference [5.3898004059026325]
Recent advances in probabilistic modelling have led to a large number of simulation-based inference algorithms which do not require numerical evaluation of likelihoods.
We provide a benchmark with inference tasks and suitable performance metrics, with an initial selection of algorithms.
We found that the choice of performance metric is critical, that even state-of-the-art algorithms have substantial room for improvement, and that sequential estimation improves sample efficiency.
arXiv Detail & Related papers (2021-01-12T18:31:22Z) - Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal heuristic for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems.
Our results imply an improved algorithm for the Sample and Aggregate privacy framework.
One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z) - Relational Algorithms for k-means Clustering [17.552485682328772]
This paper gives a k-means approximation algorithm that is efficient in the relational algorithms model.
The running time is potentially exponentially smaller than $N$, the number of data points to be clustered that the relational database represents.
arXiv Detail & Related papers (2020-08-01T23:21:40Z) - Active Model Estimation in Markov Decision Processes [108.46146218973189]
We study the problem of efficient exploration in order to learn an accurate model of an environment, modeled as a Markov decision process (MDP).
We show that our Markov-based algorithm outperforms both our original algorithm and the maximum entropy algorithm in the small sample regime.
arXiv Detail & Related papers (2020-03-06T16:17:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.