Spatial Transformer K-Means
- URL: http://arxiv.org/abs/2202.07829v1
- Date: Wed, 16 Feb 2022 02:25:46 GMT
- Title: Spatial Transformer K-Means
- Authors: Romain Cosentino, Randall Balestriero, Yanis Bahroun, Anirvan
Sengupta, Richard Baraniuk, Behnaam Aazhang
- Abstract summary: Intricate data embeddings have been designed to push K-means performance.
We propose preserving the intrinsic data space and augmenting K-means with a similarity measure invariant to non-rigid transformations.
- Score: 16.775789494555017
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: K-means is one of the most widely employed centroid-based clustering
algorithms, with performance tied to the data's embedding. Intricate data
embeddings have been designed to push K-means performance at the cost of
reduced theoretical guarantees and interpretability of the results. Instead, we
propose preserving the intrinsic data space and augmenting K-means with a
similarity measure invariant to non-rigid transformations. This enables (i) the
reduction of intrinsic nuisances associated with the data, lowering the
complexity of the clustering task, increasing performance, and producing
state-of-the-art results, (ii) clustering in the input space of the data,
leading to a fully interpretable clustering algorithm, and (iii) the benefit of
convergence guarantees.
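As a concrete illustration of the idea, here is a minimal sketch of K-means with a transformation-invariant assignment step. It is not the authors' spatial-transformer similarity: the invariance here is restricted to small wrap-around translations (a crude stand-in for non-rigid transformations), and all names and parameters are illustrative.

```python
import numpy as np

def shifted_views(img, max_shift=2):
    """All wrap-around translations of a 2-D image within +/-max_shift pixels."""
    views = []
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            views.append(np.roll(np.roll(img, dy, axis=0), dx, axis=1))
    return np.stack(views)  # (n_views, H, W)

def invariant_dist(img, centroid, max_shift=2):
    """Distance to a centroid, minimized over small translations of the sample."""
    views = shifted_views(img, max_shift).reshape(-1, img.size)
    return np.min(np.linalg.norm(views - centroid.ravel(), axis=1))

def invariant_kmeans(images, k, n_iter=20, max_shift=2, seed=0):
    """Lloyd-style K-means on images, using the translation-invariant distance."""
    rng = np.random.default_rng(seed)
    flat = images.reshape(len(images), -1)
    centroids = flat[rng.choice(len(images), k, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment: each image goes to the centroid it best matches
        # under the most favorable small translation.
        d = np.array([[invariant_dist(x, c, max_shift) for c in centroids]
                      for x in images])                      # (n, k)
        labels = d.argmin(axis=1)
        for j in range(k):                                   # standard mean update
            if np.any(labels == j):
                centroids[j] = flat[labels == j].mean(axis=0)
    return labels, centroids.reshape(k, *images.shape[1:])
```

The only change from vanilla Lloyd iterations is the assignment distance; the centroid update, and the alternating-minimization structure behind the convergence guarantees, carry over unchanged.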
Related papers
- CoHiRF: A Scalable and Interpretable Clustering Framework for High-Dimensional Data [0.30723404270319693]
We propose Consensus Hierarchical Random Feature (CoHiRF), a novel clustering method designed to address the challenges of high-dimensional clustering.
CoHiRF leverages random feature selection to mitigate noise and dimensionality effects, repeatedly applies K-Means clustering in reduced feature spaces, and combines results through a unanimous consensus criterion.
CoHiRF is computationally efficient with a running time comparable to K-Means, scalable to massive datasets, and exhibits robust performance against state-of-the-art methods such as SC-SRGF, HDBSCAN, and OPTICS.
arXiv Detail & Related papers (2025-02-01T09:38:44Z)
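A rough sketch of the consensus recipe summarized above, under the simplest reading (K-means repeated on random feature subsets, samples grouped only when every run agrees); the parameters and the unanimity rule are illustrative, not the paper's exact hierarchical procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def cohirf_like(X, k, n_runs=10, n_feats=20, seed=0):
    """Repeated K-means on random feature subsets + unanimous consensus."""
    rng = np.random.default_rng(seed)
    labelings = []
    for _ in range(n_runs):
        # Random feature selection mitigates noise and dimensionality effects.
        cols = rng.choice(X.shape[1], size=min(n_feats, X.shape[1]), replace=False)
        labelings.append(KMeans(n_clusters=k, n_init=10).fit_predict(X[:, cols]))
    L = np.stack(labelings, axis=1)          # (n_samples, n_runs)
    # Unanimous consensus: samples sharing the same label signature across
    # *every* run form one consensus cluster.
    _, consensus = np.unique(L, axis=0, return_inverse=True)
    return consensus
```

Note that the unanimous rule can yield more than k consensus groups; CoHiRF's hierarchical, repeated application is not reproduced here.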
- Autoencoded UMAP-Enhanced Clustering for Unsupervised Learning [49.1574468325115]
We propose Autoencoded UMAP-Enhanced Clustering (AUEC), a novel approach to unsupervised learning that constructs a non-linear embedding of the data into a low-dimensional space followed by any conventional clustering algorithm.
The embedding promotes clusterability of the data and comprises two mappings: the encoder of an autoencoder neural network and the output of the UMAP algorithm.
When applied to MNIST data, AUEC significantly outperforms the state-of-the-art techniques in terms of clustering accuracy.
arXiv Detail & Related papers (2025-01-13T22:30:38Z)
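A minimal embed-then-cluster fragment in the spirit of the summary above, assuming the umap-learn package; it shows only the UMAP half of the embedding (the autoencoder mapping is omitted) followed by conventional K-means, so it illustrates the general recipe rather than AUEC itself.

```python
import umap                               # pip install umap-learn
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

X, _ = load_digits(return_X_y=True)
# Non-linear embedding into a low-dimensional space...
Z = umap.UMAP(n_components=2, random_state=0).fit_transform(X)
# ...followed by any conventional clustering algorithm.
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(Z)
```

Swapping in the encoder of a trained autoencoder before (or alongside) UMAP would recover the two-mapping structure the summary describes.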
- K-Means Clustering With Incomplete Data with the Use of Mahalanobis Distances [0.0]
We develop a unified K-means algorithm that incorporates Mahalanobis distances instead of the traditional Euclidean distance.
We demonstrate that our algorithm consistently outperforms standalone imputation followed by K-means.
These results hold across both the IRIS dataset and randomly generated data with elliptical clusters.
arXiv Detail & Related papers (2024-10-31T00:05:09Z)
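A bare-bones sketch of K-means with Mahalanobis assignments, assuming the straightforward variant in which each cluster's covariance is re-estimated every iteration; the paper's handling of incomplete data is not reproduced here.

```python
import numpy as np

def mahalanobis_kmeans(X, k, n_iter=30, reg=1e-3, seed=0):
    """K-means where assignment uses per-cluster Mahalanobis distance."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    centers = X[rng.choice(n, k, replace=False)].copy()
    precisions = np.tile(np.eye(d), (k, 1, 1))        # start from Euclidean
    for _ in range(n_iter):
        # Assignment by squared Mahalanobis distance to each center.
        diffs = X[:, None, :] - centers[None, :, :]              # (n, k, d)
        d2 = np.einsum('nkd,kde,nke->nk', diffs, precisions, diffs)
        labels = d2.argmin(axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts) > d:                # need enough points for a covariance
                centers[j] = pts.mean(axis=0)
                cov = np.cov(pts, rowvar=False) + reg * np.eye(d)
                precisions[j] = np.linalg.inv(cov)
    return labels, centers
```

The Mahalanobis metric lets clusters be elliptical rather than spherical, matching the elliptical-cluster experiments the summary mentions.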
- Boosting K-means for Big Data by Fusing Data Streaming with Global Optimization [0.3069335774032178]
K-means clustering is a cornerstone of data mining, but its efficiency deteriorates when confronted with massive datasets.
We propose a novel algorithm that leverages the Variable Neighborhood Search (VNS) metaheuristic to optimize K-means clustering for big data.
arXiv Detail & Related papers (2024-10-18T15:43:34Z)
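The shake-and-improve pattern that VNS wraps around K-means can be sketched as follows: perturb a growing number of centroids, re-run Lloyd's algorithm from the perturbed solution, and keep it only if inertia improves. The parameters are illustrative and the paper's data-streaming machinery is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans

def vns_kmeans(X, k, k_max=3, seed=0):
    """Variable Neighborhood Search around Lloyd's algorithm (toy version)."""
    rng = np.random.default_rng(seed)
    best = KMeans(n_clusters=k, n_init=1, random_state=seed).fit(X)
    p = 1
    while p <= k_max:
        # Shake: replace p random centroids with random data points.
        centers = best.cluster_centers_.copy()
        idx = rng.choice(k, size=p, replace=False)
        centers[idx] = X[rng.choice(len(X), size=p, replace=False)]
        # Local search: Lloyd's algorithm from the perturbed centroids.
        cand = KMeans(n_clusters=k, init=centers, n_init=1).fit(X)
        if cand.inertia_ < best.inertia_:
            best, p = cand, 1          # improvement: restart neighborhoods
        else:
            p += 1                     # no improvement: widen the shake
    return best
```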
- Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration [66.43954501171292]
We introduce Catalyst Acceleration and propose an accelerated decentralized federated learning algorithm called DFedCata.
DFedCata consists of two main components: the Moreau envelope function, which addresses parameter inconsistencies, and Nesterov's extrapolation step, which accelerates the aggregation phase.
Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-iid data distributions.
arXiv Detail & Related papers (2024-10-09T06:17:16Z)
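The extrapolation component mentioned above can be sketched in isolation. This toy fragment only illustrates a Nesterov-style extrapolated aggregation step; it ignores the Moreau-envelope component and all communication details, and is not DFedCata itself.

```python
import numpy as np

def extrapolated_aggregate(local, neighbors, prev_agg, beta=0.9):
    """Average neighbor parameters, then take a Nesterov-style extrapolation step."""
    agg = np.mean([local] + list(neighbors), axis=0)   # plain aggregation
    return agg + beta * (agg - prev_agg)               # momentum/extrapolation
```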
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The one-step K-means dimensionality-reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
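As a two-stage stand-in for the unified framework above, the fragment below chains an off-the-shelf graph-based manifold embedding with K-means; the paper integrates both steps into a single self-supervised objective, which this sketch does not capture.

```python
from sklearn.manifold import SpectralEmbedding
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

X, _ = load_digits(return_X_y=True)
# Manifold learning step: spectral embedding of a k-NN affinity graph.
Z = SpectralEmbedding(n_components=10, affinity='nearest_neighbors').fit_transform(X)
# Clustering step: K-means on the graph embedding.
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(Z)
```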
- Adaptively Robust and Sparse K-means Clustering [5.535948428518607]
This paper proposes adaptively robust and sparse K-means clustering (ARSK) to address practical limitations of the standard K-means algorithm.
For robustness, we introduce a redundant error component for each observation, and this additional parameter is penalized using a group sparse penalty.
To accommodate the impact of high-dimensional noisy variables, the objective function is modified by incorporating weights and implementing a penalty to control the sparsity of the weight vector.
arXiv Detail & Related papers (2024-07-09T15:20:41Z)
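A simplified sketch of the robustness mechanism described above: each observation gets an error component updated by group soft-thresholding (the group-lasso proximal step), so rows with nonzero error norm act as flagged outliers. The feature-weight sparsity part of ARSK is omitted, and all parameters are illustrative.

```python
import numpy as np

def group_soft_threshold(r, lam):
    """Shrink each row of r toward zero by lam in Euclidean norm (group-lasso prox)."""
    norms = np.linalg.norm(r, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return scale * r

def robust_kmeans(X, k, lam=2.0, n_iter=30, seed=0):
    """K-means with a group-sparse per-observation error component."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    centers = X[rng.choice(n, k, replace=False)].copy()
    O = np.zeros_like(X)                       # per-observation error components
    for _ in range(n_iter):
        clean = X - O                          # data with outlier part removed
        d2 = ((clean[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # Error update: group soft-thresholding of the residuals.
        O = group_soft_threshold(X - centers[labels], lam)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = (X - O)[labels == j].mean(axis=0)
    return labels, centers, O   # rows of O with nonzero norm = flagged outliers
```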
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm that directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To fully exploit the complementary information embedded in different views, we leverage tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
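The Butterworth construction above can be pictured as passing pairwise distances through a low-pass filter response. The fragment below builds such a filtered affinity matrix; the cutoff d0 and order are illustrative, and the tensor Schatten p-norm machinery for multi-view data is not shown.

```python
import numpy as np
from scipy.spatial.distance import cdist

def butterworth_affinity(X, d0=1.0, order=4):
    """Pairwise affinities shaped by a Butterworth-style low-pass response."""
    D = cdist(X, X)                              # Euclidean distance matrix
    # Close pairs (D << d0) get affinity ~1; far pairs roll off sharply.
    return 1.0 / np.sqrt(1.0 + (D / d0) ** (2 * order))
```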
- Effective Data-aware Covariance Estimator from Compressed Data [63.16042585506435]
We propose DACE, a data-aware weighted-sampling-based covariance matrix estimator that provides unbiased covariance matrix estimates.
We conduct extensive experiments on both synthetic and real-world datasets to demonstrate the superior performance of our DACE.
arXiv Detail & Related papers (2020-10-10T10:10:28Z)
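A hedged sketch of the general idea named above (data-aware weighted sampling with inverse-probability reweighting for unbiasedness), not the paper's estimator: entries are kept with magnitude-proportional probabilities, and the diagonal, whose naive estimate is inflated by the 1/p weights, is de-biased separately.

```python
import numpy as np

def sampled_covariance(X, keep_frac=0.3, seed=0):
    """Unbiased second-moment estimate from magnitude-weighted entry sampling."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Data-aware probabilities: larger entries are kept more often.
    p = np.clip(np.abs(X) / np.abs(X).sum() * (keep_frac * X.size), 1e-6, 1.0)
    mask = rng.random(X.shape) < p
    Xs = np.where(mask, X / p, 0.0)           # inverse-probability weighting
    C = Xs.T @ Xs / n                         # off-diagonals already unbiased
    # De-bias the diagonal: E[(x/p)^2 * mask] = x^2 / p, so reweight by p.
    unbiased_diag = (mask * X**2 / p).sum(axis=0) / n
    np.fill_diagonal(C, unbiased_diag)
    return C
```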
- New advances in enumerative biclustering algorithms with online partitioning [80.22629846165306]
This paper further extends RIn-Close_CVC, a biclustering algorithm capable of performing an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets.
The improved algorithm, called RIn-Close_CVC3, keeps the attractive properties of RIn-Close_CVC and is characterized by a drastic reduction in memory usage and a consistent gain in runtime.
arXiv Detail & Related papers (2020-03-07T14:54:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.