Solving clustering as ill-posed problem: experiments with K-Means
algorithm
- URL: http://arxiv.org/abs/2211.08302v1
- Date: Tue, 15 Nov 2022 17:01:42 GMT
- Title: Solving clustering as ill-posed problem: experiments with K-Means
algorithm
- Authors: Alberto Arturo Vergani
- Abstract summary: The clustering procedure based on the K-Means algorithm is studied as an inverse problem.
Attempts to improve the quality of the clustering inverse problem lead to reducing the input data via Principal Component Analysis.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this contribution, the clustering procedure based on the K-Means
algorithm is studied as an inverse problem, which is a special case of
ill-posed problems. Attempts to improve the quality of the clustering inverse
problem lead to reducing the input data via Principal Component Analysis (PCA).
Since there exists a theorem by Ding and He that links the cardinality of the
optimal clusters found with K-Means to the cardinality of the selected
informative PCA components, the computational experiments tested the theorem
with two quantitative feature-selection methods: the Kaiser criterion (based on
an imperative decision rule) versus the Wishart criterion (based on random
matrix theory). The results suggest that PCA reduction with feature selection
by the Wishart criterion leads to a low matrix condition number and satisfies
the relation between clusters and components predicted by the theorem. The data
used for the computations come from a neuroscience repository: they concern
healthy young subjects who performed a task-oriented functional Magnetic
Resonance Imaging (fMRI) paradigm.
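The pipeline the abstract describes (PCA reduction, eigenvalue-based component selection, a condition-number check, then K-Means with a cluster count tied to the retained components via the Ding-He result) can be sketched as follows. The synthetic data, the Kaiser-style threshold, and the plain Lloyd-style K-Means are illustrative assumptions, not the author's actual fMRI pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the fMRI feature matrix (hypothetical data).
X = rng.normal(size=(200, 20))
X[:100] += 3.0  # two loosely separated groups

# Centre the data, then PCA via SVD.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
eigvals = s ** 2 / (Xc.shape[0] - 1)

# Kaiser-style criterion: keep components whose eigenvalue exceeds
# the mean eigenvalue (equivalent to "> 1" on standardized data).
keep = eigvals > eigvals.mean()
X_red = Xc @ Vt[keep].T

# Condition number of the reduced data matrix -- the quality
# indicator the abstract uses for the clustering inverse problem.
cond = np.linalg.cond(X_red)

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd-style K-Means (illustrative, not the paper's code)."""
    r = np.random.default_rng(seed)
    centers = X[r.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        centers = np.array([X[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return labels, centers

# Ding and He relate k clusters to the k-1 leading principal components,
# so m retained components suggest trying k = m + 1.
k = int(keep.sum()) + 1
labels, centers = kmeans(X_red, k)
```

The Wishart criterion would replace the `keep` threshold with a cutoff derived from the Marchenko-Pastur edge of random-matrix theory; the rest of the pipeline is unchanged.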
Related papers
- Fuzzy K-Means Clustering without Cluster Centroids [79.19713746387337]
Fuzzy K-Means clustering is a critical computation technique in unsupervised data analysis.
This paper proposes a novel Fuzzy K-Means clustering algorithm that entirely eliminates the reliance on cluster centroids.
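For contrast with the centroid-free variant this paper proposes, the standard fuzzy K-Means (fuzzy C-Means, which does maintain centroids) can be sketched in a few lines; the synthetic data and parameter choices below are illustrative assumptions:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=100, seed=0, eps=1e-9):
    """Textbook fuzzy C-Means with explicit centroids."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # fuzzy memberships, rows sum to 1
    for _ in range(iters):
        W = U ** m                             # fuzzified weights
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centers[None], axis=-1) + eps
        U = (1.0 / d) ** (2 / (m - 1))         # inverse-distance memberships
        U /= U.sum(axis=1, keepdims=True)
    return U, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (30, 2)),
               rng.normal(3, 0.3, (30, 2))])
U, centers = fuzzy_c_means(X, c=2)
```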
arXiv Detail & Related papers (2024-04-07T12:25:03Z)
- Superclustering by finding statistically significant separable groups of optimal Gaussian clusters [0.0]
The paper presents an algorithm for clustering a dataset by grouping the Gaussian clusters that are optimal from the point of view of the BIC criterion.
An essential advantage of the algorithm is its ability to predict the correct supercluster for new data based on an already trained clusterer.
arXiv Detail & Related papers (2023-09-05T23:49:46Z)
- Multi-kernel Correntropy-based Orientation Estimation of IMUs: Gradient Descent Methods [3.8286082196845466]
The paper proposes correntropy-based gradient descent (CGD) and correntropy-based decoupled orientation estimation (CDOE).
Traditional methods rely on the mean squared error (MSE) criterion, making them vulnerable to external acceleration and magnetic interference.
New algorithms demonstrate significantly lower computational complexity than Kalman filter-based approaches.
arXiv Detail & Related papers (2023-04-13T13:57:33Z)
- Multi-Prototypes Convex Merging Based K-Means Clustering Algorithm [20.341309224377866]
Multi-prototypes convex merging based K-Means clustering algorithm (MCKM) is presented.
MCKM is an efficient and explainable clustering algorithm for escaping the undesirable local minima of the K-Means problem without requiring k to be given in advance.
arXiv Detail & Related papers (2023-02-14T13:57:33Z)
- On the Global Solution of Soft k-Means [159.23423824953412]
This paper presents an algorithm to solve the Soft k-Means problem globally.
A new model, named Minimal Volume Soft k-Means (MVSkM), is proposed to address the non-uniqueness of solutions.
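As background for the global-solution question, the classic soft k-means baseline (softmax responsibilities with a stiffness parameter) can be sketched as follows; this is the textbook formulation, not the MVSkM model proposed in the paper, and the data are illustrative assumptions:

```python
import numpy as np

def soft_kmeans(X, k, beta=5.0, iters=50, seed=0):
    """Classic soft k-means: softmax responsibilities, weighted-mean centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None] - centers[None]) ** 2).sum(-1)
        # Soft assignments: responsibility decays with squared distance;
        # subtracting the row minimum keeps the exponentials stable.
        r = np.exp(-beta * (d - d.min(1, keepdims=True)))
        r /= r.sum(1, keepdims=True)
        centers = (r.T @ X) / r.sum(0)[:, None]
    return r, centers

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.3, (25, 2)),
               rng.normal(2, 0.3, (25, 2))])
r, centers = soft_kmeans(X, k=2)
```

The non-uniqueness issue the paper targets shows up here as sensitivity to the random initialization of `centers`.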
arXiv Detail & Related papers (2022-12-07T12:06:55Z)
- Reinforcement Learning with Heterogeneous Data: Estimation and Inference [84.72174994749305]
We introduce the K-Heterogeneous Markov Decision Process (K-Hetero MDP) to address sequential decision problems with population heterogeneity.
We propose the Auto-Clustered Policy Evaluation (ACPE) for estimating the value of a given policy, and the Auto-Clustered Policy Iteration (ACPI) for estimating the optimal policy in a given policy class.
We present simulations to support our theoretical findings, and we conduct an empirical study on the standard MIMIC-III dataset.
arXiv Detail & Related papers (2022-01-31T20:58:47Z)
- Riemannian classification of EEG signals with missing values [67.90148548467762]
This paper proposes two strategies to handle missing data for the classification of electroencephalograms.
The first approach estimates the covariance from imputed data with the $k$-nearest neighbors algorithm; the second relies on the observed data by leveraging the observed-data likelihood within an expectation-maximization algorithm.
As the results show, the proposed strategies perform better than classification based on observed data alone and maintain high accuracy even as the missing-data ratio increases.
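The first strategy, filling missing entries from the nearest complete observations before any covariance estimation, can be sketched as follows; the distance choice and the synthetic data are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill NaNs in each incomplete row using the k nearest complete rows
    (Euclidean distance computed on the commonly observed features)."""
    X = X.copy()
    complete = X[~np.isnan(X).any(axis=1)]
    for i in np.where(np.isnan(X).any(axis=1))[0]:
        obs = ~np.isnan(X[i])
        d = np.linalg.norm(complete[:, obs] - X[i, obs], axis=1)
        nn = complete[np.argsort(d)[:k]]       # k nearest complete rows
        X[i, ~obs] = nn[:, ~obs].mean(axis=0)  # impute the missing features
    return X

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
X[0, 1] = np.nan          # one missing entry, for illustration
Xf = knn_impute(X)
```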
arXiv Detail & Related papers (2021-10-19T14:24:50Z)
- A New Validity Index for Fuzzy-Possibilistic C-Means Clustering [6.174448419090291]
The Fuzzy-Possibilistic (FP) index works well in the presence of clusters that vary in shape and density.
FPCM requires a priori selection of the degree of fuzziness and the degree of typicality.
arXiv Detail & Related papers (2020-05-19T01:48:13Z)
- Clustering Binary Data by Application of Combinatorial Optimization Heuristics [52.77024349608834]
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics.
Using a set of 16 data tables generated by a quasi-Monte Carlo experiment, a comparison is performed for one of the aggregation criteria with L1 dissimilarity, against hierarchical clustering and a version of k-means: partitioning around medoids (PAM).
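A lightweight alternating k-medoids (a simplified cousin of PAM) with L1 dissimilarity on binary rows can be sketched as follows; the data, the noise level, and the initialization are illustrative assumptions:

```python
import numpy as np

def k_medoids(X, k, iters=20, seed=0):
    """Alternating k-medoids with L1 dissimilarity (simplified PAM sketch)."""
    rng = np.random.default_rng(seed)
    D = np.abs(X[:, None] - X[None]).sum(-1)   # pairwise L1 distances
    medoids = rng.choice(len(X), k, replace=False)
    for _ in range(iters):
        labels = D[:, medoids].argmin(1)       # assign to nearest medoid
        new = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if len(members):                   # medoid = member minimizing
                new[j] = members[              # total distance to its cluster
                    D[np.ix_(members, members)].sum(0).argmin()]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return labels, medoids

# Two binary prototypes with 10% random bit flips (illustrative data).
rng = np.random.default_rng(0)
base_a = np.array([1, 1, 1, 1, 0, 0, 0, 0], float)
base_b = np.array([0, 0, 0, 0, 1, 1, 1, 1], float)
X = np.vstack([np.abs(base_a - (rng.random((15, 8)) < 0.1)),
               np.abs(base_b - (rng.random((15, 8)) < 0.1))])
labels, medoids = k_medoids(X, k=2)
```

Full PAM additionally performs swap steps that test every (medoid, non-medoid) exchange; the alternating version above is cheaper but can settle in weaker local optima.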
arXiv Detail & Related papers (2020-01-06T23:33:31Z)
- Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information (including all content) and is not responsible for any consequences.