Stochastic mean-shift clustering
- URL: http://arxiv.org/abs/2312.15684v1
- Date: Mon, 25 Dec 2023 10:27:08 GMT
- Title: Stochastic mean-shift clustering
- Authors: Itshak Lapidot
- Abstract summary: A stochastic version of mean-shift clustering is compared with the standard (deterministic) mean-shift clustering.
It was found that the stochastic mean-shift clustering outperformed the deterministic mean-shift in most cases.
- Score: 2.844607682703337
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper we present a stochastic version of the mean-shift clustering
algorithm. In the stochastic version the data points "climb" to the modes of
the distribution collectively, while in the deterministic mean-shift each
datum "climbs" individually, with all other data points remaining at their
original coordinates. The stochastic version of mean-shift clustering is
compared with standard (deterministic) mean-shift clustering on synthesized
2- and 3-dimensional data distributed among several Gaussian components. The
comparison is performed in terms of cluster purity and class data purity. It was
found that the stochastic mean-shift clustering outperformed the deterministic
mean-shift in most of the cases.
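The collective-versus-individual update described in the abstract can be sketched as follows. This is a minimal illustration, not the author's reference implementation: the Gaussian kernel, the bandwidth, and the one-point-per-iteration update schedule in the stochastic variant are assumptions.

```python
import numpy as np

def mean_shift_step(x, data, bandwidth):
    """One kernel-weighted mean update, moving x toward a mode of data's density."""
    w = np.exp(-np.sum((data - x) ** 2, axis=1) / (2 * bandwidth ** 2))
    return w @ data / w.sum()

def deterministic_mean_shift(data, bandwidth=1.0, iters=50):
    """Each datum 'climbs' individually: the reference density is always the
    original data, which never moves."""
    shifted = data.copy()
    for _ in range(iters):
        shifted = np.array([mean_shift_step(x, data, bandwidth) for x in shifted])
    return shifted

def stochastic_mean_shift(data, bandwidth=1.0, iters=5000, rng=None):
    """Data points 'climb' collectively: a randomly chosen datum is moved
    against the current (already partially shifted) configuration."""
    rng = np.random.default_rng(rng)
    shifted = data.copy()
    for _ in range(iters):
        i = rng.integers(len(shifted))
        shifted[i] = mean_shift_step(shifted[i], shifted, bandwidth)
    return shifted
```

In both variants, points that converge to (nearly) the same mode are then merged into one cluster, e.g. by grouping converged points within a small tolerance of each other.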
Related papers
- Stochastic Mean-Shift Clustering [1.4299355089723902]
We present a stochastic version of the mean-shift clustering algorithm.
In this version, a randomly chosen sequence of data points moves according to partial ascent steps of the objective function.
It can be observed that in most cases the stochastic mean-shift clustering outperforms the standard mean-shift.
arXiv Detail & Related papers (2025-11-12T11:07:05Z) - Benign Overfitting and the Geometry of the Ridge Regression Solution in Binary Classification [75.01389991485098]
We show that ridge regression has qualitatively different behavior depending on the scale of the cluster mean vector.
In regimes where the scale is very large, the conditions that allow for benign overfitting turn out to be the same as those for the regression task.
arXiv Detail & Related papers (2025-03-11T01:45:42Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct distance matrix between data points by Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - Algorithm-Agnostic Interpretations for Clustering [0.0]
We propose algorithm-agnostic interpretation methods to explain clustering outcomes in reduced dimensions.
The permutation feature importance for clustering represents a general framework based on shuffling feature values.
All methods can be used with any clustering algorithm able to reassign instances through soft or hard labels.
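The permutation-feature-importance idea above can be illustrated with a small sketch: shuffle one feature's values across instances, reassign instances, and measure how many cluster labels change. This is a hypothetical minimal version, assuming a hard nearest-centroid reassignment rule; the paper's framework is more general (it also covers soft labels).

```python
import numpy as np

def assign(data, centroids):
    """Hard-label reassignment: each instance goes to its nearest centroid."""
    d = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

def permutation_feature_importance(data, centroids, rng=None):
    """For each feature, shuffle its values across instances and report the
    fraction of instances whose cluster assignment changes."""
    rng = np.random.default_rng(rng)
    base = assign(data, centroids)
    importance = np.empty(data.shape[1])
    for j in range(data.shape[1]):
        perturbed = data.copy()
        perturbed[:, j] = rng.permutation(perturbed[:, j])
        importance[j] = np.mean(assign(perturbed, centroids) != base)
    return importance
```

A feature that drives the cluster structure scores high (many labels flip when it is shuffled), while an irrelevant feature scores near zero.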
arXiv Detail & Related papers (2022-09-21T18:08:40Z) - Clustering by the Probability Distributions from Extreme Value Theory [32.496691290725764]
This paper generalizes k-means to model the distribution of clusters.
We use GPD to establish a probability model for each cluster.
We also introduce a naive baseline, dubbed as Generalized Extreme Value (GEV) k-means.
Notably, GEV k-means can also estimate cluster structure and thus perform reasonably well over classical k-means.
arXiv Detail & Related papers (2022-02-20T10:52:43Z) - Anomaly Clustering: Grouping Images into Coherent Clusters of Anomaly
Types [60.45942774425782]
We introduce anomaly clustering, whose goal is to group data into coherent clusters of anomaly types.
This is different from anomaly detection, whose goal is to divide anomalies from normal data.
We present a simple yet effective clustering framework using patch-based pretrained deep embeddings and off-the-shelf clustering methods.
arXiv Detail & Related papers (2021-12-21T23:11:33Z) - Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model *does not* exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z) - Determinantal consensus clustering [77.34726150561087]
We propose the use of determinantal point processes or DPP for the random restart of clustering algorithms.
DPPs favor diversity of the center points within subsets.
We show through simulations that, contrary to DPP, standard uniform random initialization fails both to ensure diversity and to obtain a good coverage of all data facets.
arXiv Detail & Related papers (2021-02-07T23:48:24Z) - Predictive K-means with local models [0.028675177318965035]
Predictive clustering seeks to obtain the best of both worlds.
We present two new algorithms using this technique and show on a variety of data sets that they are competitive for prediction performance.
arXiv Detail & Related papers (2020-12-16T10:49:36Z) - Kernel learning approaches for summarising and combining posterior
similarity matrices [68.8204255655161]
We build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models.
A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices.
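The positive semi-definiteness observation above follows because each MCMC sample contributes a co-clustering indicator matrix (block-diagonal of all-ones blocks, hence PSD), and the PSM is an average of these. A small sketch with made-up label samples, not output from a real sampler:

```python
import numpy as np

# Toy posterior output: each row is one MCMC sample of cluster labels
# for 5 items (illustrative values, not from a real Bayesian clustering run).
label_samples = np.array([
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],  # same partition as the first sample, relabeled
])

# Posterior similarity matrix: estimated P(items i and j share a cluster),
# averaged over posterior samples. Label switching is irrelevant here,
# since only co-membership is counted.
co = label_samples[:, :, None] == label_samples[:, None, :]
psm = co.mean(axis=0)

# Each per-sample indicator matrix is PSD, so their average is PSD too,
# which is what lets the PSM serve directly as a kernel matrix.
eigvals = np.linalg.eigvalsh(psm)
```

The resulting `psm` is symmetric with a unit diagonal, and its smallest eigenvalue is non-negative (up to floating-point error), so it can be plugged into any kernel method as-is.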
arXiv Detail & Related papers (2020-09-27T14:16:14Z) - Statistical power for cluster analysis [0.0]
Cluster algorithms are increasingly popular in biomedical research.
We estimate power and accuracy for common analysis through simulation.
We recommend that researchers only apply cluster analysis when large subgroup separation is expected.
arXiv Detail & Related papers (2020-03-01T02:43:15Z) - Clustering Binary Data by Application of Combinatorial Optimization
Heuristics [52.77024349608834]
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics.
From a set of 16 data tables generated by a quasi-Monte Carlo experiment, a comparison is performed for one of the aggregations using L1 dissimilarity, with hierarchical clustering, and a version of k-means: partitioning around medoids or PAM.
arXiv Detail & Related papers (2020-01-06T23:33:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.