Wide Gaps and Clustering Axioms
- URL: http://arxiv.org/abs/2308.03464v1
- Date: Mon, 7 Aug 2023 10:43:48 GMT
- Title: Wide Gaps and Clustering Axioms
- Authors: Mieczys{\l}aw A. K{\l}opotek
- Abstract summary: k-means is in conflict with Kleinberg's axiomatic system for distance based clustering algorithms.
We introduce two new clusterability properties, variational k-separability and residual k-separability.
We reconcile k-means with Kleinberg's axiomatic framework in Euclidean and non-Euclidean settings.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The widely applied k-means algorithm produces clusterings that violate our
expectations with respect to high/low similarity/density and is in conflict
with Kleinberg's axiomatic system for distance based clustering algorithms that
formalizes those expectations in a natural way. k-means violates in particular
the consistency axiom. We hypothesise that this clash is due to the not
explicated expectation that the data themselves should have the property of
being clusterable in order to expect the algorithm clustering hem to fit a
clustering axiomatic system. To demonstrate this, we introduce two new
clusterability properties, variational k-separability and residual
k-separability and show that then the Kleinberg's consistency axiom holds for
k-means operating in the Euclidean or non-Euclidean space. Furthermore, we
propose extensions of k-means algorithm that fit approximately the Kleinberg's
richness axiom that does not hold for k-means. In this way, we reconcile
k-means with Kleinberg's axiomatic framework in Euclidean and non-Euclidean
settings. Besides contribution to the theory of axiomatic frameworks of
clustering and for clusterability theory, practical contribution is the
possibility to construct {datasets for testing purposes of algorithms
optimizing k-means cost function. This includes a method of construction of
{clusterable data with known in advance global optimum.
Related papers
- Fuzzy K-Means Clustering without Cluster Centroids [79.19713746387337]
Fuzzy K-Means clustering is a critical computation technique in unsupervised data analysis.
This paper proposes a novel Fuzzy K-Means clustering algorithm that entirely eliminates the reliance on cluster centroids.
arXiv Detail & Related papers (2024-04-07T12:25:03Z) - Explaining Kernel Clustering via Decision Trees [10.504801686625129]
We investigate interpretable kernel clustering, and propose algorithms that construct decision trees to approximate partitions induced by kernel k-means.
We build on previous work on explainable k-means and demonstrate how a suitable choice of features allows preserving interpretability without sacrificing approximation guarantees on the interpretable model.
arXiv Detail & Related papers (2024-02-15T11:08:23Z) - Perfect Spectral Clustering with Discrete Covariates [68.8204255655161]
We propose a spectral algorithm that achieves perfect clustering with high probability on a class of large, sparse networks.
Our method is the first to offer a guarantee of consistent latent structure recovery using spectral clustering.
arXiv Detail & Related papers (2022-05-17T01:41:06Z) - A Greedy and Optimistic Approach to Clustering with a Specified
Uncertainty of Covariates [6.231304401179968]
We propose a greedy and optimistic clustering (GOC) algorithm that finds better feature candidates over empirical uncertainty sets.
As an important application, we apply the GOC algorithm to synthetic datasets of the orbital properties of stars generated through our numerical simulation mimicking the formation process of the Milky Way.
arXiv Detail & Related papers (2022-04-18T07:54:24Z) - Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model textitdoes not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z) - HAWKS: Evolving Challenging Benchmark Sets for Cluster Analysis [2.5329716878122404]
Comprehensive benchmarking of clustering algorithms is difficult.
There is no consensus regarding the best practice for rigorous benchmarking.
We demonstrate the important role evolutionary algorithms play to support flexible generation of such benchmarks.
arXiv Detail & Related papers (2021-02-13T15:01:34Z) - A Multi-disciplinary Ensemble Algorithm for Clustering Heterogeneous
Datasets [0.76146285961466]
We propose a new evolutionary clustering algorithm (ECAStar) based on social class ranking and meta-heuristic algorithms.
ECAStar is integrated with recombinational evolutionary operators, Levy flight optimisation, and some statistical techniques.
Experiments are conducted to evaluate the ECAStar against five conventional approaches.
arXiv Detail & Related papers (2021-01-01T07:20:50Z) - Efficient Methods for Structured Nonconvex-Nonconcave Min-Max
Optimization [98.0595480384208]
We propose a generalization extraient spaces which converges to a stationary point.
The algorithm applies not only to general $p$-normed spaces, but also to general $p$-dimensional vector spaces.
arXiv Detail & Related papers (2020-10-31T21:35:42Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Structured Graph Learning for Clustering and Semi-supervised
Classification [74.35376212789132]
We propose a graph learning framework to preserve both the local and global structure of data.
Our method uses the self-expressiveness of samples to capture the global structure and adaptive neighbor approach to respect the local structure.
Our model is equivalent to a combination of kernel k-means and k-means methods under certain condition.
arXiv Detail & Related papers (2020-08-31T08:41:20Z) - Real Elliptically Skewed Distributions and Their Application to Robust
Cluster Analysis [5.137336092866906]
This article proposes a new class of Really Skewed (RESK) distributions and associated clustering algorithms.
Non-symmetrically distributed and heavy-tailed data clusters have been reported in a variety of real-world applications.
arXiv Detail & Related papers (2020-06-30T10:44:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.