Wide Gaps and Clustering Axioms
- URL: http://arxiv.org/abs/2308.03464v1
- Date: Mon, 7 Aug 2023 10:43:48 GMT
- Title: Wide Gaps and Clustering Axioms
- Authors: Mieczys{\l}aw A. K{\l}opotek
- Abstract summary: k-means is in conflict with Kleinberg's axiomatic system for distance-based clustering algorithms.
We introduce two new clusterability properties, variational k-separability and residual k-separability.
We reconcile k-means with Kleinberg's axiomatic framework in Euclidean and non-Euclidean settings.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The widely applied k-means algorithm produces clusterings that violate our
expectations with respect to high/low similarity/density, and it is in conflict
with Kleinberg's axiomatic system for distance-based clustering algorithms,
which formalizes those expectations in a natural way. In particular, k-means
violates the consistency axiom. We hypothesise that this clash stems from an
unstated expectation: the data themselves should have the property of being
clusterable in order to expect the algorithm clustering them to fit a
clustering axiomatic system. To demonstrate this, we introduce two new
clusterability properties, variational k-separability and residual
k-separability, and show that Kleinberg's consistency axiom then holds for
k-means operating in Euclidean or non-Euclidean space. Furthermore, we
propose extensions of the k-means algorithm that approximately satisfy
Kleinberg's richness axiom, which does not hold for k-means itself. In this
way, we reconcile k-means with Kleinberg's axiomatic framework in Euclidean
and non-Euclidean settings. Besides contributing to the theory of axiomatic
frameworks of clustering and to clusterability theory, a practical
contribution is the possibility of constructing datasets for testing
algorithms that optimize the k-means cost function. This includes a method of
constructing clusterable data with a global optimum known in advance.
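The practical contribution above can be illustrated with a minimal sketch: when clusters are separated by gaps far wider than their internal spread, the planted partition plausibly minimizes the k-means cost, because any partition that merges two true clusters must pay for a wide gap. The function name `kmeans_cost` and the 1-D toy data below are illustrative, not taken from the paper.

```python
# Minimal sketch (not the paper's construction): a 1-D dataset with
# wide gaps between tight clusters, so the planted 3-way partition
# has far lower k-means cost than any partition merging two clusters.

def kmeans_cost(clusters):
    """Sum over clusters of squared distances to the cluster mean."""
    cost = 0.0
    for pts in clusters:
        mu = sum(pts) / len(pts)
        cost += sum((x - mu) ** 2 for x in pts)
    return cost

# Three tight clusters; inter-cluster gaps (~100) dwarf the spread (~0.2).
data = [[0.0, 0.1, 0.2], [100.0, 100.1, 100.2], [200.0, 200.1, 200.2]]
planted = kmeans_cost(data)

# A rival 3-way partition that merges the first two true clusters
# and splits the third: it pays heavily for bridging the wide gap.
rival = kmeans_cost([data[0] + data[1], data[2][:1], data[2][1:]])

assert planted < rival
print(planted, rival)  # planted cost is tiny; rival cost is huge
```

Widening the gaps relative to the spread makes the cost ratio arbitrarily extreme, which is the intuition behind separability-style clusterability properties.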
Related papers
- Hierarchical and Density-based Causal Clustering [6.082022112101251]
We propose plug-in estimators that are simple and readily implementable using off-the-shelf algorithms.
We go on to study their rate of convergence, and show that the additional cost of causal clustering is essentially the estimation error of the outcome regression functions.
arXiv Detail & Related papers (2024-11-02T14:01:04Z)
- A Fresh Look at Generalized Category Discovery through Non-negative Matrix Factorization [83.12938977698988]
Generalized Category Discovery (GCD) aims to classify both base and novel images using labeled base data.
Current approaches inadequately address the intrinsic optimization of the co-occurrence matrix $\bar{A}$ based on cosine similarity.
We propose a Non-Negative Generalized Category Discovery (NN-GCD) framework to address these deficiencies.
arXiv Detail & Related papers (2024-10-29T07:24:11Z)
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- Explaining Kernel Clustering via Decision Trees [10.504801686625129]
We investigate interpretable kernel clustering, and propose algorithms that construct decision trees to approximate partitions induced by kernel k-means.
We build on previous work on explainable k-means and demonstrate how a suitable choice of features allows preserving interpretability without sacrificing approximation guarantees on the interpretable model.
arXiv Detail & Related papers (2024-02-15T11:08:23Z)
- Perfect Spectral Clustering with Discrete Covariates [68.8204255655161]
We propose a spectral algorithm that achieves perfect clustering with high probability on a class of large, sparse networks.
Our method is the first to offer a guarantee of consistent latent structure recovery using spectral clustering.
arXiv Detail & Related papers (2022-05-17T01:41:06Z)
- A Greedy and Optimistic Approach to Clustering with a Specified Uncertainty of Covariates [6.231304401179968]
We propose a greedy and optimistic clustering (GOC) algorithm that finds better feature candidates over empirical uncertainty sets.
As an important application, we apply the GOC algorithm to synthetic datasets of the orbital properties of stars generated through our numerical simulation mimicking the formation process of the Milky Way.
arXiv Detail & Related papers (2022-04-18T07:54:24Z)
- Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model does not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z)
- A Multi-disciplinary Ensemble Algorithm for Clustering Heterogeneous Datasets [0.76146285961466]
We propose a new evolutionary clustering algorithm (ECAStar) based on social class ranking and meta-heuristic algorithms.
ECAStar is integrated with recombinational evolutionary operators, Levy flight optimisation, and some statistical techniques.
Experiments are conducted to evaluate the ECAStar against five conventional approaches.
arXiv Detail & Related papers (2021-01-01T07:20:50Z)
- Efficient Methods for Structured Nonconvex-Nonconcave Min-Max Optimization [98.0595480384208]
We propose a generalization of the extragradient method which converges to a stationary point.
The algorithm applies not only to general $p$-normed spaces, but also to general $p$-dimensional vector spaces.
arXiv Detail & Related papers (2020-10-31T21:35:42Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
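To make the agglomerative idea in the last entry concrete, here is a deliberately naive sketch on 1-D points: start with singleton clusters and repeatedly merge the closest pair (single linkage) until k clusters remain. The function name `agglomerate` and the toy data are illustrative; the cited paper's contribution is a method that scales to billions of points, which this O(n^3) loop does not attempt.

```python
# Naive agglomerative clustering sketch (single linkage, 1-D points).
# Illustrates the merge loop only; it is nowhere near the scalable
# method of the cited paper.

def agglomerate(points, k):
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single-linkage distance: closest cross-cluster pair
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return clusters

print(agglomerate([0.0, 0.2, 5.0, 5.1, 9.0], 3))
# → [[0.0, 0.2], [5.0, 5.1], [9.0]]
```

Each iteration scans all cluster pairs, which is what scalable variants avoid via nearest-neighbor structures and batched merges.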
This list is automatically generated from the titles and abstracts of the papers in this site.