A novel initialisation based on hospital-resident assignment for the
k-modes algorithm
- URL: http://arxiv.org/abs/2002.02701v1
- Date: Fri, 7 Feb 2020 10:20:49 GMT
- Title: A novel initialisation based on hospital-resident assignment for the
k-modes algorithm
- Authors: Henry Wilde, Vincent Knight, Jonathan Gillard
- Abstract summary: This paper presents a new way of selecting an initial solution for the k-modes algorithm.
It allows for a notion of mathematical fairness and leverages the data in a way that the common initialisations from the literature do not.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a new way of selecting an initial solution for the
k-modes algorithm that allows for a notion of mathematical fairness and
leverages the data in a way that the common initialisations from the literature do not.
The method, which utilises the Hospital-Resident Assignment Problem to find the
set of initial cluster centroids, is compared with the current initialisations
on both benchmark datasets and a body of newly generated artificial datasets.
Based on this analysis, the proposed method is shown to outperform the other
initialisations in the majority of cases, especially when the number of
clusters is optimised. In addition, we find that our method outperforms the
leading established method specifically for low-density data.
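The abstract describes matching data points to candidate centroids via the Hospital-Resident Assignment Problem. A hedged sketch of that idea: treat points as "residents" and k sampled candidate centroids as capacitated "hospitals", run the resident-proposing Gale-Shapley algorithm with preferences given by the k-modes matching dissimilarity, and take the column-wise mode of each matched group as an initial centroid. All names here are illustrative and the capacity rule is an assumption; the paper's exact construction may differ.

```python
# Illustrative hospital-resident-style initialisation for k-modes.
# Assumes X holds integer-encoded categorical values; this is a sketch,
# not the paper's implementation.
import numpy as np
from collections import defaultdict

def hamming(a, b):
    """Matching dissimilarity used by k-modes on categorical data."""
    return int(np.sum(a != b))

def hr_init(X, k, seed=0):
    """Pick k initial modes via a capacitated Gale-Shapley matching."""
    rng = np.random.default_rng(seed)
    n = len(X)
    hospitals = rng.choice(n, size=k, replace=False)  # candidate centroids
    capacity = int(np.ceil(n / k))                    # spread points fairly
    # Residents rank hospitals by dissimilarity; hospitals rank residents too.
    prefs = {i: sorted(range(k), key=lambda h: hamming(X[i], X[hospitals[h]]))
             for i in range(n)}
    cost = [[hamming(X[i], X[hospitals[h]]) for i in range(n)] for h in range(k)]
    roster = defaultdict(list)
    nxt = [0] * n          # next hospital each free resident will propose to
    free = list(range(n))
    while free:
        i = free.pop()
        h = prefs[i][nxt[i]]
        nxt[i] += 1
        roster[h].append(i)
        if len(roster[h]) > capacity:
            # Over capacity: reject the least-preferred resident.
            worst = max(roster[h], key=lambda r: cost[h][r])
            roster[h].remove(worst)
            free.append(worst)
    # Initial centroid of each group = column-wise mode of its members.
    modes = []
    for h in range(k):
        members = X[roster[h]] if roster[h] else X[[hospitals[h]]]
        modes.append([np.bincount(col).argmax() for col in members.T])
    return np.array(modes)
```

These modes would then seed the usual k-modes iteration; because every hospital has bounded capacity, no candidate centroid can absorb far more points than the others, which is one way to read the "mathematical fairness" claim.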
Related papers
- A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
- $k$-Median Clustering via Metric Embedding: Towards Better Initialization with Differential Privacy [31.963659189956367]
We develop a new HST scheme, for the $k$-median problem in the general metric space.
From the tree, we propose a novel and efficient search algorithm, for good initial centers.
Our approach can also be extended to the $k$-means problem.
arXiv Detail & Related papers (2022-06-26T14:58:36Z)
- Gradient Based Clustering [72.15857783681658]
We propose a general approach for distance based clustering, using the gradient of the cost function that measures clustering quality.
The approach is an iterative two step procedure (alternating between cluster assignment and cluster center updates) and is applicable to a wide range of functions.
arXiv Detail & Related papers (2022-02-01T19:31:15Z)
- Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
arXiv Detail & Related papers (2021-05-23T19:21:55Z)
- Data-driven Weight Initialization with Sylvester Solvers [72.11163104763071]
We propose a data-driven scheme to initialize the parameters of a deep neural network.
We show that our proposed method is especially effective in few-shot and fine-tuning settings.
arXiv Detail & Related papers (2021-05-02T07:33:16Z)
- A Deep Learning Object Detection Method for an Efficient Clusters Initialization [6.365889364810239]
Clustering has been used in numerous applications such as banking customers profiling, document retrieval, image segmentation, and e-commerce recommendation engines.
Existing clustering techniques have significant limitations, chief among them the dependence of their stability on the initialization parameters.
This paper proposes a solution that can provide near-optimal clustering parameters with low computational and resources overhead.
arXiv Detail & Related papers (2021-04-28T08:34:25Z)
- Scalable Initialization Methods for Large-Scale Clustering [0.0]
Two new methods for K-means clustering are proposed.
The proposed methods are scalable and can be run in parallel.
Experiments show that the proposed methods compare favorably to the state-of-the-art.
arXiv Detail & Related papers (2020-07-23T11:29:53Z)
- Optimal Randomized First-Order Methods for Least-Squares Problems [56.05635751529922]
This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems.
We focus on two classical embeddings, namely, Gaussian projections and subsampled Hadamard transforms.
Our resulting algorithm yields the best complexity known for solving least-squares problems with no condition number dependence.
arXiv Detail & Related papers (2020-02-21T17:45:32Z)
- Clustering Binary Data by Application of Combinatorial Optimization Heuristics [52.77024349608834]
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics.
From a set of 16 data tables generated by a quasi-Monte Carlo experiment, a comparison is performed for one of the aggregation criteria using the L1 dissimilarity, against hierarchical clustering and a version of k-means: partitioning around medoids (PAM).
arXiv Detail & Related papers (2020-01-06T23:33:31Z)
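The last entry compares binary-data clustering under the L1 dissimilarity against PAM. For orientation, a minimal PAM sketch for 0/1 data (where L1 coincides with the Hamming distance) might look as follows; this is a generic illustration, not the paper's heuristics.

```python
# Minimal PAM (partitioning around medoids) sketch for binary data using
# the L1 dissimilarity. Illustrative only; the paper's setup may differ.
import numpy as np

def pam_binary(X, k, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    # Pairwise L1 dissimilarity; for 0/1 data this equals the Hamming distance.
    D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)   # assign to nearest medoid
        new = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members) == 0:
                continue
            # Swap step: pick the member minimising total in-cluster dissimilarity.
            costs = D[np.ix_(members, members)].sum(axis=0)
            new[j] = members[np.argmin(costs)]
        if np.array_equal(new, medoids):
            break                                   # converged
        medoids = new
    return medoids, np.argmin(D[:, medoids], axis=1)
```

Unlike k-means, the cluster representatives here are always actual data points, which keeps them valid binary vectors, the same property that makes medoid- and mode-based methods natural for categorical data.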
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.