Are Cluster Validity Measures (In)valid?
- URL: http://arxiv.org/abs/2208.01261v1
- Date: Tue, 2 Aug 2022 06:08:48 GMT
- Title: Are Cluster Validity Measures (In)valid?
- Authors: Marek Gagolewski and Maciej Bartoszuk and Anna Cena
- Abstract summary: In this paper, we consider what happens if we treat internal cluster validity indices as objective functions in unsupervised learning activities.
It turns out that many cluster (in)validity indices promote clusterings that match expert knowledge quite poorly.
We introduce a new, well-performing variant of the Dunn index that is built upon OWA operators and the near-neighbour graph.
- Score: 3.7491936479803054
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Internal cluster validity measures (such as the Calinski-Harabasz, Dunn, or
Davies-Bouldin indices) are frequently used for selecting the appropriate
number of partitions a dataset should be split into. In this paper we consider
what happens if we treat such indices as objective functions in unsupervised
learning activities. Is the optimal grouping with regard to, say, the
Silhouette index really meaningful? It turns out that many cluster (in)validity
indices promote clusterings that match expert knowledge quite poorly. We also
introduce a new, well-performing variant of the Dunn index that is built upon
OWA operators and the near-neighbour graph so that subspaces of higher density,
regardless of their shapes, can be separated from each other better.
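To make the practice under scrutiny concrete, here is a minimal sketch, assuming scikit-learn and a synthetic dataset, of selecting the number of clusters k by optimising internal validity indices. It illustrates the standard workflow the paper questions, not the paper's proposed OWA-based Dunn variant:

```python
# Minimal sketch of index-driven selection of k, assuming scikit-learn.
# This shows the common workflow the paper scrutinises, not the
# paper's own OWA-based Dunn variant.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

scores = {}
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = {
        "silhouette": silhouette_score(X, labels),                 # higher is better
        "calinski_harabasz": calinski_harabasz_score(X, labels),   # higher is better
        "davies_bouldin": davies_bouldin_score(X, labels),         # lower is better
    }

best_k = max(scores, key=lambda k: scores[k]["silhouette"])
print(f"k maximising the Silhouette index: {best_k}")
```

On well-separated Gaussian blobs the indices typically agree; the paper's point is that on clusters of less convenient shapes the index-optimal partition can diverge sharply from expert-provided ground truth.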
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z) - A Bayesian cluster validity index [0.0]
Cluster validity indices (CVIs) are designed to identify the optimal number of clusters within a dataset.
We introduce a Bayesian cluster validity index (BCVI) which builds upon existing indices.
Our BCVI offers clear advantages in situations where user expertise is valuable, allowing users to specify their desired range for the final number of clusters.
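The exact definition of the BCVI is not reproduced here; the following is a rough, hypothetical sketch of the general idea, assuming it amounts to reweighting normalised index values by a user-specified prior over the number of clusters (all names below are illustrative):

```python
# Hypothetical sketch only: reweighting an internal index by a user
# prior over k. The actual BCVI of the cited paper is defined differently.
import numpy as np

def posterior_over_k(index_values: dict, prior: dict) -> dict:
    ks = sorted(index_values)
    v = np.array([index_values[k] for k in ks], dtype=float)
    v = (v - v.min()) / (v.max() - v.min() + 1e-12)    # normalise to [0, 1]
    p = np.array([prior.get(k, 0.0) for k in ks], dtype=float)
    w = v * p                                          # combine evidence and prior
    w /= w.sum() + 1e-12
    return dict(zip(ks, w))

# User believes k lies between 3 and 5; index values come from any CVI.
prior = {3: 1.0, 4: 1.0, 5: 1.0}
index_values = {2: 0.41, 3: 0.55, 4: 0.62, 5: 0.48, 6: 0.65}
post = posterior_over_k(index_values, prior)
print(max(post, key=post.get))   # -> 4: the high-index k=6 is ruled out by the prior
```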
arXiv Detail & Related papers (2024-02-03T14:23:36Z) - A correlation-based fuzzy cluster validity index with secondary options detector [0.0]
We introduce a correlation-based fuzzy cluster validity index known as the Wiroonsri-Preedasawakul (WP) index.
We evaluate and compare the performance of our index with several existing indices, including Xie-Beni, Pakhira-Bandyopadhyay-Maulik, Tang, Wu-Li, generalized C, and Kwon2.
arXiv Detail & Related papers (2023-08-28T16:40:34Z) - Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified within a single framework.
To conduct feedback actions, a clustering-oriented reward function is proposed that enhances cohesion within the same cluster and separation between different clusters.
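As a rough illustration only (the cited paper's reward may take a different form), a cohesion-and-separation reward over node embeddings can be sketched as mean intra-cluster similarity minus mean inter-cluster similarity:

```python
# Illustrative sketch of a cohesion/separation reward over node
# embeddings; the cited paper's reward function may differ in form.
# Assumes at least two clusters, each with at least two nodes.
import numpy as np

def clustering_reward(Z: np.ndarray, labels: np.ndarray) -> float:
    Z = Z / (np.linalg.norm(Z, axis=1, keepdims=True) + 1e-12)
    S = Z @ Z.T                                    # cosine similarities
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)                  # ignore self-pairs
    off_diag = ~np.eye(len(Z), dtype=bool)
    cohesion = S[same].mean()                      # intra-cluster similarity
    separation = S[~same & off_diag].mean()        # inter-cluster similarity
    return float(cohesion - separation)
```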
arXiv Detail & Related papers (2023-08-13T18:12:28Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Dink-Net: Neural Clustering on Large Graphs [59.10189693120368]
A deep graph clustering method (Dink-Net) is proposed with the idea of dilation and shrink.
Representations are learned in a self-supervised manner by discriminating whether nodes have been corrupted by augmentations.
The clustering distribution is optimized by minimizing the proposed cluster dilation loss and cluster shrink loss.
Compared to the runner-up, Dink-Net achieves a 9.62% NMI improvement on the ogbn-papers100M dataset with 111 million nodes and 1.6 billion edges.
arXiv Detail & Related papers (2023-05-28T15:33:24Z) - Oracle-guided Contrastive Clustering [28.066047266687058]
Oracle-guided Contrastive Clustering (OCC) is proposed to cluster by interactively making pairwise "same-cluster" queries to oracles with distinctive demands.
To the best of our knowledge, it is the first deep framework to perform personalized clustering.
arXiv Detail & Related papers (2022-11-01T12:05:12Z) - Self-supervised Contrastive Attributed Graph Clustering [110.52694943592974]
We propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC).
In SCAGC, a self-supervised contrastive loss is designed for node representation learning by leveraging inaccurate clustering labels.
For out-of-sample (OOS) nodes, SCAGC can directly calculate their clustering labels.
arXiv Detail & Related papers (2021-10-15T03:25:28Z) - Clustering performance analysis using new correlation-based cluster validity indices [0.0]
We develop two new cluster validity indices based on the correlation between the actual distance between a pair of data points and the distance between the centroids of the clusters in which the two points are located.
Our proposed indices consistently yield several peaks at different numbers of clusters, which overcomes the weakness stated previously.
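The core quantity lends itself to a short sketch: correlate, over all point pairs, the actual pairwise distance with the distance between the centroids of the clusters the two points belong to. This is a simplified reading of the summary; the cited indices include normalisation not reproduced here:

```python
# Simplified sketch of a correlation-based validity score; the cited
# indices add normalisation not reproduced here.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import pearsonr

def correlation_validity(X: np.ndarray, labels: np.ndarray) -> float:
    ks = np.unique(labels)
    centroids = np.array([X[labels == k].mean(axis=0) for k in ks])
    idx = np.searchsorted(ks, labels)        # cluster index of each point
    D = squareform(pdist(X))                 # actual pairwise distances
    C = squareform(pdist(centroids))         # centroid-to-centroid distances
    i, j = np.triu_indices(len(X), k=1)      # all unordered point pairs
    r, _ = pearsonr(D[i, j], C[idx[i], idx[j]])
    return float(r)
```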
arXiv Detail & Related papers (2021-09-23T06:59:41Z) - Selective Pseudo-label Clustering [42.19193184852487]
Deep neural networks (DNNs) offer a means of addressing the challenging task of clustering high-dimensional data.
We propose selective pseudo-label clustering, which uses only the most confident pseudo-labels for training the DNN.
The new approach achieves state-of-the-art performance on three popular image datasets.
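The selection step can be sketched as simple confidence filtering, a generic illustration rather than the paper's exact criterion:

```python
# Generic sketch of confidence-based pseudo-label selection; the cited
# paper's exact selection rule is not reproduced here.
import numpy as np

def select_confident(probs: np.ndarray, threshold: float = 0.95):
    """probs: (n_samples, n_clusters) soft assignments from the DNN."""
    confidence = probs.max(axis=1)
    keep = confidence >= threshold           # train only on confident samples
    pseudo_labels = probs.argmax(axis=1)
    return np.where(keep)[0], pseudo_labels[keep]
```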
arXiv Detail & Related papers (2021-07-22T13:56:53Z) - Learning to Cluster Faces via Confidence and Connectivity Estimation [136.5291151775236]
We propose a fully learnable clustering framework without requiring a large number of overlapped subgraphs.
Our method significantly improves clustering accuracy, and thus the performance of recognition models trained on top of it, yet it is an order of magnitude more efficient than existing supervised methods.
arXiv Detail & Related papers (2020-04-01T13:39:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.