Selecting the Number of Clusters $K$ with a Stability Trade-off: an
Internal Validation Criterion
- URL: http://arxiv.org/abs/2006.08530v3
- Date: Tue, 16 May 2023 20:28:01 GMT
- Title: Selecting the Number of Clusters $K$ with a Stability Trade-off: an
Internal Validation Criterion
- Authors: Alex Mourer, Florent Forest, Mustapha Lebbah, Hanane Azzag and Jérôme Lacaille
- Abstract summary: Clustering stability has emerged as a natural and model-agnostic principle.
We propose a new principle: a good clustering should be stable, and within each cluster, there should exist no stable partition.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model selection is a major challenge in non-parametric clustering. There is
no universally admitted way to evaluate clustering results for the obvious
reason that no ground truth is available. The difficulty of finding a universal
evaluation criterion is a consequence of the ill-defined objective of
clustering. From this perspective, clustering stability has emerged as a natural
and model-agnostic principle: an algorithm should find stable structures in the
data. If data sets are repeatedly sampled from the same underlying
distribution, an algorithm should find similar partitions. However, stability
alone is not well suited to determining the number of clusters. For instance, it
is unable to detect whether the number of clusters is too small. We propose a new
principle: a good clustering should be stable, and within each cluster, there
should exist no stable partition. This principle leads to a novel clustering
validation criterion based on between-cluster and within-cluster stability,
overcoming limitations of previous stability-based methods. We empirically
demonstrate the effectiveness of our criterion to select the number of clusters
and compare it with existing methods. Code is available at
https://github.com/FlorentF9/skstab.
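The released skstab package implements the full criterion. As a minimal sketch of the stability principle alone, assuming scikit-learn with k-means as the base algorithm (this is not the paper's between/within-cluster criterion, only the agreement-under-resampling idea it builds on):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def stability_score(X, k, n_pairs=20, frac=0.8, seed=0):
    """Mean ARI between k-means labelings of overlapping random subsamples."""
    rng = np.random.default_rng(seed)
    n, scores = len(X), []
    for _ in range(n_pairs):
        a = rng.choice(n, size=int(frac * n), replace=False)
        b = rng.choice(n, size=int(frac * n), replace=False)
        shared = np.intersect1d(a, b)  # points clustered in both runs
        la = KMeans(n_clusters=k, n_init=10).fit(X[a]).predict(X[shared])
        lb = KMeans(n_clusters=k, n_init=10).fit(X[b]).predict(X[shared])
        scores.append(adjusted_rand_score(la, lb))
    return float(np.mean(scores))

# best_k = max(range(2, 11), key=lambda k: stability_score(X, k))
```

The paper's criterion goes further: besides high between-cluster stability, applying the same resampling test inside each cluster should reveal no stable sub-partition, which is what rules out choosing $K$ too small.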
Related papers
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In the proposed method, cluster number determination and unsupervised representation learning are unified in a single framework.
To provide feedback actions, a clustering-oriented reward function is proposed that increases cohesion within clusters and separation between them.
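The digest describes the reward only qualitatively; a toy cohesion-minus-separation reward over node embeddings might look as follows (purely illustrative names and form, not the paper's definition):

```python
import numpy as np

def clustering_reward(Z, labels):
    """Toy reward: mean intra-cluster cosine similarity minus mean
    inter-cluster cosine similarity of node embeddings Z (n x d)."""
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    S = Zn @ Zn.T                        # pairwise cosine similarities
    same = labels[:, None] == labels[None, :]
    off = ~np.eye(len(Z), dtype=bool)    # mask out self-pairs
    return S[same & off].mean() - S[~same & off].mean()
```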
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches the instance-specific lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
Normalised clustering accuracy: An asymmetric external cluster validity measure [2.900810893770134]
Clustering algorithms are traditionally evaluated using either internal or external validity measures.
In this paper, we argue that the commonly used classical partition similarity scores miss some desirable properties.
We propose and analyse a new measure: a version of the optimal set-matching accuracy.
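For reference, the classical optimal set-matching accuracy that the paper critiques can be computed with the Hungarian algorithm; the sketch below shows this standard construction, not the paper's new normalised, asymmetric variant:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def set_matching_accuracy(y_true, y_pred):
    """Best one-to-one relabeling of predicted clusters (Hungarian method),
    then the plain fraction of correctly matched points."""
    t = np.unique(y_true, return_inverse=True)[1]
    p = np.unique(y_pred, return_inverse=True)[1]
    k = int(max(t.max(), p.max())) + 1
    C = np.zeros((k, k), dtype=int)         # confusion matrix
    np.add.at(C, (t, p), 1)
    rows, cols = linear_sum_assignment(-C)  # maximize matched counts
    return C[rows, cols].sum() / len(t)
```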
arXiv Detail & Related papers (2022-09-07T05:08:34Z)
Individual Preference Stability for Clustering [24.301650646957246]
We propose a natural notion of individual preference (IP) stability for clustering.
Our notion asks that every data point, on average, is closer to the points in its own cluster than to the points in any other cluster.
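This definition is easy to check directly. A minimal sketch assuming Euclidean distances (function name illustrative):

```python
import numpy as np
from scipy.spatial.distance import cdist

def is_ip_stable(X, labels):
    """True iff every point's mean distance to its own cluster (self
    excluded) is at most its mean distance to every other cluster."""
    D = cdist(X, X)
    for i in range(len(X)):
        own = labels == labels[i]
        own[i] = False                     # exclude the point itself
        if not own.any():
            continue                       # singletons are trivially stable
        d_own = D[i, own].mean()
        for c in np.unique(labels):
            if c != labels[i] and d_own > D[i, labels == c].mean():
                return False
    return True
```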
arXiv Detail & Related papers (2022-07-07T22:01:01Z)
Anomaly Clustering: Grouping Images into Coherent Clusters of Anomaly Types [60.45942774425782]
We introduce anomaly clustering, whose goal is to group data into coherent clusters of anomaly types.
This is different from anomaly detection, whose goal is to divide anomalies from normal data.
We present a simple yet effective clustering framework using patch-based pretrained deep embeddings and off-the-shelf clustering methods.
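A rough sketch of such a pipeline, assuming a torchvision backbone and k-means (the paper's exact embedding and patch-weighting choices are not reproduced here):

```python
import torch
import torchvision.models as models
from sklearn.cluster import KMeans

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
features = torch.nn.Sequential(*list(backbone.children())[:-2]).eval()

@torch.no_grad()
def embed(images):                   # images: (N, 3, 224, 224), ImageNet-normalized
    fmap = features(images)          # (N, C, H, W): one C-dim vector per patch
    return fmap.flatten(2).mean(-1)  # average-pool patch embeddings -> (N, C)

# labels = KMeans(n_clusters=5, n_init=10).fit_predict(embed(batch).numpy())
```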
arXiv Detail & Related papers (2021-12-21T23:11:33Z)
Distribution free optimality intervals for clustering [1.7513645771137178]
Given data $\mathcal{D}$ and a partition $\mathcal{C}$ of these data into $K$ clusters, when can we say that the clusters obtained are correct or meaningful for the data?
This paper introduces a paradigm in which a clustering $\mathcal{C}$ is considered meaningful if it is good with respect to a loss function such as the K-means distortion, and stable, i.e. the only good clustering up to small perturbations.
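For concreteness, the K-means distortion mentioned above is simply the within-cluster sum of squared distances to the centroids:

```python
import numpy as np

def kmeans_distortion(X, labels):
    """Total squared distance of each point to its cluster centroid
    (the 'good with respect to a loss function' part of the paradigm)."""
    return sum(
        float(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum())
        for c in np.unique(labels)
    )
```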
arXiv Detail & Related papers (2021-07-30T06:13:56Z)
Sum-of-norms clustering does not separate nearby balls [49.1574468325115]
We study a continuous version of sum-of-norms clustering, in which the dataset is replaced by a general measure.
We state and prove a local-global characterization of the clustering that seems to be new even in the case of discrete datapoints.
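For context, the sum-of-norms objective (not restated in the digest) is commonly written as below; these are standard formulations from the sum-of-norms literature, and the paper's exact normalisation may differ:

```latex
% Discrete sum-of-norms clustering of x_1, ..., x_n (regularization lambda > 0);
% points i, j end up in the same cluster when u_i = u_j at the optimum:
\min_{u_1,\dots,u_n}\; \frac{1}{2}\sum_{i=1}^{n}\lVert x_i - u_i\rVert^2
  \;+\; \lambda \sum_{i<j} \lVert u_i - u_j\rVert

% Continuous analogue over a general measure \mu, with clusters given by
% the level sets of the minimizing map f:
\min_{f}\; \frac{1}{2}\int \lVert x - f(x)\rVert^2 \, d\mu(x)
  \;+\; \frac{\lambda}{2}\iint \lVert f(x) - f(y)\rVert \, d\mu(x)\, d\mu(y)
```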
arXiv Detail & Related papers (2021-04-28T13:35:17Z)
K-expectiles clustering [0.0]
We propose a novel partitioning clustering algorithm based on expectiles.
We suggest two schemes: fixed $\tau$ clustering and adaptive $\tau$ clustering.
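The digest does not define expectiles. For an asymmetry level $\tau \in (0,1)$, the sample expectile (the statistic that would replace the centroid mean in such a scheme) can be sketched by fixed-point iteration:

```python
import numpy as np

def expectile(x, tau, tol=1e-10, max_iter=1000):
    """tau-expectile of a 1-D sample: minimizer of the asymmetrically
    weighted squared loss; tau = 0.5 recovers the ordinary mean."""
    e = float(np.mean(x))
    for _ in range(max_iter):
        w = np.where(x > e, tau, 1.0 - tau)       # asymmetric weights
        e_new = float(np.sum(w * x) / np.sum(w))  # weighted-mean update
        if abs(e_new - e) < tol:
            break
        e = e_new
    return e
```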
arXiv Detail & Related papers (2021-03-16T21:14:56Z)
Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
reval: a Python package to determine best clustering solutions with stability-based relative clustering validation [1.8129328638036126]
reval is a Python package that leverages stability-based relative clustering validation methods to determine the best clustering solution.
This work aims at developing a stability-based method that selects the best clustering solution as the one that replicates, via supervised learning, on unseen subsets of data.
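A minimal sketch of that replication idea, assuming k-means and a k-NN classifier; this illustrates the principle only and is not reval's actual API (reval normalises a misclassification distance, whereas ARI is used here to sidestep label matching):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def replication_score(X, k, seed=0):
    """Cluster two halves independently, teach a classifier the first
    labeling, and measure how well it reproduces the second."""
    Xtr, Xte = train_test_split(X, test_size=0.5, random_state=seed)
    ytr = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(Xtr)
    yte = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(Xte)
    pred = KNeighborsClassifier().fit(Xtr, ytr).predict(Xte)
    return adjusted_rand_score(yte, pred)  # high = the solution replicates
```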
arXiv Detail & Related papers (2020-08-27T10:36:56Z)
Selective Inference for Latent Block Models [50.83356836818667]
This study provides a selective inference method for latent block models.
We construct a statistical test on a set of row and column cluster memberships of a latent block model.
The proposed exact and approximate tests work effectively compared to a naive test that does not take the selection bias into account.
arXiv Detail & Related papers (2020-05-27T10:44:19Z)