Related papers: Wide Gaps and Clustering Axioms

Wide Gaps and Clustering Axioms

URL: http://arxiv.org/abs/2308.03464v1
Date: Mon, 7 Aug 2023 10:43:48 GMT
Title: Wide Gaps and Clustering Axioms
Authors: Mieczys{\l}aw A. K{\l}opotek
Abstract summary: k-means is in conflict with Kleinberg's axiomatic system for distance based clustering algorithms. We introduce two new clusterability properties, variational k-separability and residual k-separability. We reconcile k-means with Kleinberg's axiomatic framework in Euclidean and non-Euclidean settings.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The widely applied k-means algorithm produces clusterings that violate our expectations with respect to high/low similarity/density and is in conflict with Kleinberg's axiomatic system for distance based clustering algorithms that formalizes those expectations in a natural way. k-means violates in particular the consistency axiom. We hypothesise that this clash is due to the not explicated expectation that the data themselves should have the property of being clusterable in order to expect the algorithm clustering hem to fit a clustering axiomatic system. To demonstrate this, we introduce two new clusterability properties, variational k-separability and residual k-separability and show that then the Kleinberg's consistency axiom holds for k-means operating in the Euclidean or non-Euclidean space. Furthermore, we propose extensions of k-means algorithm that fit approximately the Kleinberg's richness axiom that does not hold for k-means. In this way, we reconcile k-means with Kleinberg's axiomatic framework in Euclidean and non-Euclidean settings. Besides contribution to the theory of axiomatic frameworks of clustering and for clusterability theory, practical contribution is the possibility to construct {datasets for testing purposes of algorithms optimizing k-means cost function. This includes a method of construction of {clusterable data with known in advance global optimum.

Related papers

Random Normed k-Means: A Paradigm-Shift in Clustering within Probabilistic Metric Spaces [0.7864304771129751]
We introduce the first k-means variant in the literature that operates within a probabilistic metric space. By adopting a probabilistic perspective, our method not only introduces a fresh paradigm but also establishes a rigorous theoretical framework. Our proposed random normed k-means (RNKM) algorithm exhibits a remarkable ability to identify nonlinearly separable structures.
arXiv Detail & Related papers (2025-04-04T20:48:43Z)
Learning Identifiable Structures Helps Avoid Bias in DNN-based Supervised Causal Learning [56.22841701016295]
Supervised Causal Learning (SCL) is an emerging paradigm in this field. Existing Deep Neural Network (DNN)-based methods commonly adopt the "Node-Edge approach"
arXiv Detail & Related papers (2025-02-15T19:10:35Z)
Counterfactual Explanations for k-means and Gaussian Clustering [1.8561812622368767]
We present a general definition for counterfactuals for model-based clustering that includes plausibility and feasibility constraints. Our approach takes as input the factual, the target cluster, a binary mask indicating actionable or immutable features and a plausibility factor specifying how far from the cluster boundary the counterfactual should be placed.
arXiv Detail & Related papers (2025-01-17T14:56:20Z)
Hierarchical and Density-based Causal Clustering [6.082022112101251]
We propose plug-in estimators that are simple and readily implementable using off-the-shelf algorithms. We go on to study their rate of convergence, and show that the additional cost of causal clustering is essentially the estimation error of the outcome regression functions.
arXiv Detail & Related papers (2024-11-02T14:01:04Z)
A Fresh Look at Generalized Category Discovery through Non-negative Matrix Factorization [83.12938977698988]
Generalized Category Discovery (GCD) aims to classify both base and novel images using labeled base data. Current approaches inadequately address the intrinsic optimization of the co-occurrence matrix $barA$ based on cosine similarity. We propose a Non-Negative Generalized Category Discovery (NN-GCD) framework to address these deficiencies.
arXiv Detail & Related papers (2024-10-29T07:24:11Z)
Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks. We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
Explaining Kernel Clustering via Decision Trees [10.504801686625129]
We investigate interpretable kernel clustering, and propose algorithms that construct decision trees to approximate partitions induced by kernel k-means. We build on previous work on explainable k-means and demonstrate how a suitable choice of features allows preserving interpretability without sacrificing approximation guarantees on the interpretable model.
arXiv Detail & Related papers (2024-02-15T11:08:23Z)
Perfect Spectral Clustering with Discrete Covariates [68.8204255655161]
We propose a spectral algorithm that achieves perfect clustering with high probability on a class of large, sparse networks. Our method is the first to offer a guarantee of consistent latent structure recovery using spectral clustering.
arXiv Detail & Related papers (2022-05-17T01:41:06Z)
A Greedy and Optimistic Approach to Clustering with a Specified Uncertainty of Covariates [6.231304401179968]
We propose a greedy and optimistic clustering (GOC) algorithm that finds better feature candidates over empirical uncertainty sets. As an important application, we apply the GOC algorithm to synthetic datasets of the orbital properties of stars generated through our numerical simulation mimicking the formation process of the Milky Way.
arXiv Detail & Related papers (2022-04-18T07:54:24Z)
Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning. Recent work has established lower bounds against the class of low-degree methods. We show that, perhaps surprisingly, this particular clustering model textitdoes not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z)
A Multi-disciplinary Ensemble Algorithm for Clustering Heterogeneous Datasets [0.76146285961466]
We propose a new evolutionary clustering algorithm (ECAStar) based on social class ranking and meta-heuristic algorithms. ECAStar is integrated with recombinational evolutionary operators, Levy flight optimisation, and some statistical techniques. Experiments are conducted to evaluate the ECAStar against five conventional approaches.
arXiv Detail & Related papers (2021-01-01T07:20:50Z)
Efficient Methods for Structured Nonconvex-Nonconcave Min-Max Optimization [98.0595480384208]
We propose a generalization extraient spaces which converges to a stationary point. The algorithm applies not only to general $p$-normed spaces, but also to general $p$-dimensional vector spaces.
arXiv Detail & Related papers (2020-10-31T21:35:42Z)
Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed. We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.