A Bayesian cluster validity index
- URL: http://arxiv.org/abs/2402.02162v2
- Date: Wed, 14 Feb 2024 14:25:10 GMT
- Title: A Bayesian cluster validity index
- Authors: Nathakhun Wiroonsri and Onthada Preedasawakul
- Abstract summary: Cluster validity indices (CVIs) are designed to identify the optimal number of clusters within a dataset.
We introduce a Bayesian cluster validity index (BCVI) which builds upon existing indices.
Our BCVI offers clear advantages in situations where user expertise is valuable, allowing users to specify their desired range for the final number of clusters.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Selecting the appropriate number of clusters is a critical step in applying
clustering algorithms. To assist in this process, various cluster validity
indices (CVIs) have been developed. These indices are designed to identify the
optimal number of clusters within a dataset. However, users may not always seek
the absolute optimal number of clusters but rather a secondary option that
better aligns with their specific applications. This realization has led us to
introduce a Bayesian cluster validity index (BCVI), which builds upon existing
indices. The BCVI utilizes either Dirichlet or generalized Dirichlet priors,
resulting in the same posterior distribution. We evaluate our BCVI using the
Wiroonsri index for hard clustering and the Wiroonsri-Preedasawakul index for
soft clustering as underlying indices. We compare the performance of our
proposed BCVI with that of the original underlying indices and several other
existing CVIs, including Davies-Bouldin, Starczewski, Xie-Beni, and KWON2
indices. Our BCVI offers clear advantages in situations where user expertise is
valuable, allowing users to specify their desired range for the final number of
clusters. To illustrate this, we conduct experiments classified into three
different scenarios. Additionally, we showcase the practical applicability of
our approach through real-world datasets, such as MRI brain tumor images. These
tools will be published as a new R package 'BayesCVI'.
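The abstract states only that a Dirichlet prior over the candidate numbers of clusters is combined with an underlying index; it gives no formulas. The following is a minimal sketch of that general idea, not the authors' definition and not the BayesCVI package API. It assumes a larger-is-better index, rescales the index values to weights that sum to one, and treats them as `n` pseudo-observations in a standard Dirichlet conjugate update. The function name, the rescaling rule, the parameter `n`, and all numbers are hypothetical.

```python
import numpy as np

def dirichlet_posterior_mean(index_values, alpha, n):
    """Posterior mean over candidate cluster counts (illustrative only).

    index_values: underlying CVI value for each candidate k (larger = better, assumed).
    alpha: Dirichlet prior parameters, one per candidate k.
    n: hypothetical weight given to the index evidence relative to the prior.
    """
    values = np.asarray(index_values, dtype=float)
    alpha = np.asarray(alpha, dtype=float)

    # Assumed rescaling: shift so the worst candidate gets weight 0, then normalize.
    shifted = values - values.min()
    total = shifted.sum()
    if total == 0:
        weights = np.full_like(values, 1.0 / len(values))
    else:
        weights = shifted / total

    # Conjugate-style update: prior counts plus n pseudo-observations.
    posterior_alpha = alpha + n * weights
    return posterior_alpha / posterior_alpha.sum()

ks = [2, 3, 4, 5, 6]                            # candidate numbers of clusters
index_values = [0.40, 0.90, 0.85, 0.30, 0.20]   # made-up index values

flat_prior = np.ones(len(ks))                           # no user preference
expert_prior = np.array([1.0, 1.0, 20.0, 20.0, 1.0])    # user expects 4 or 5 clusters

flat_post = dirichlet_posterior_mean(index_values, flat_prior, n=10)
expert_post = dirichlet_posterior_mean(index_values, expert_prior, n=10)

flat_choice = ks[int(np.argmax(flat_post))]
expert_choice = ks[int(np.argmax(expert_post))]
print(flat_choice, expert_choice)
```

With the flat prior the choice follows the index peak; with the informative prior the choice moves to the best-scoring candidate inside the range the user favors, which is the kind of "secondary option" the abstract describes.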
Related papers
- MOKD: Cross-domain Finetuning for Few-shot Classification via Maximizing Optimized Kernel Dependence [97.93517982908007]
In cross-domain few-shot classification, the nearest centroid classifier (NCC) aims to learn representations to construct a metric space where few-shot classification can be performed.
In this paper, we find that there exist high similarities between NCC-learned representations of two samples from different classes.
We propose a bi-level optimization framework, maximizing optimized kernel dependence (MOKD), to learn a set of class-specific representations that match the cluster structures indicated by labeled data.
arXiv Detail & Related papers (2024-05-29T05:59:52Z)
- Generalized Category Discovery with Clustering Assignment Consistency [56.92546133591019]
Generalized category discovery (GCD) is a recently proposed open-world task.
We propose a co-training-based framework that encourages clustering consistency.
Our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets.
arXiv Detail & Related papers (2023-10-30T00:32:47Z)
- A correlation-based fuzzy cluster validity index with secondary options detector [0.0]
We introduce a correlation-based fuzzy cluster validity index known as the Wiroonsri-Preedasawakul (WP) index.
We evaluate and compare the performance of our index with several existing indexes, including Xie-Beni, Pakhira-Bandyopadhyay-Maulik, Tang, Wu-Li, generalized C, and Kwon2.
arXiv Detail & Related papers (2023-08-28T16:40:34Z)
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework.
In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
- A novel cluster internal evaluation index based on hyper-balls [11.048887848164268]
It is crucial to evaluate the quality and determine the optimal number of clusters in cluster analysis.
In this paper, the multi-granularity characterization of the data set is carried out to obtain the hyper-balls.
A cluster internal evaluation index based on hyper-balls (HCVI) is defined.
arXiv Detail & Related papers (2022-12-30T02:56:40Z)
- Are Cluster Validity Measures (In)valid? [3.7491936479803054]
In this paper we consider what happens if we treat such indices as objective functions in unsupervised learning activities.
It turns out that many cluster (in)validity indices promote clusterings that match expert knowledge quite poorly.
We introduce a new, well-performing variant of the Dunn index that is built upon OWA operators and the near-neighbour graph.
arXiv Detail & Related papers (2022-08-02T06:08:48Z)
- Implicit Sample Extension for Unsupervised Person Re-Identification [97.46045935897608]
Clustering sometimes mixes different true identities together or splits the same identity into two or more sub clusters.
We propose an Implicit Sample Extension (ISE) method to generate what we call support samples around the cluster boundaries.
Experiments demonstrate that the proposed method is effective and achieves state-of-the-art performance for unsupervised person Re-ID.
arXiv Detail & Related papers (2022-04-14T11:41:48Z)
- Self-supervised Contrastive Attributed Graph Clustering [110.52694943592974]
We propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC).
In SCAGC, a self-supervised contrastive loss that leverages inaccurate clustering labels is designed for node representation learning.
For out-of-sample (OOS) nodes, SCAGC can directly calculate their clustering labels.
arXiv Detail & Related papers (2021-10-15T03:25:28Z)
- Clustering performance analysis using new correlation based cluster validity indices [0.0]
We develop two new cluster validity indices based on the correlation between the actual distance between a pair of data points and the distance between the centroids of the clusters in which the two points are located.
Our proposed indices consistently yield several peaks at different numbers of clusters, which overcomes the previously stated weakness.
arXiv Detail & Related papers (2021-09-23T06:59:41Z)
- An Internal Cluster Validity Index Using a Distance-based Separability Measure [0.0]
There are no true class labels for clustering in typical unsupervised learning.
There is no universal CVI that can be used to measure all datasets.
We propose a novel CVI called the Distance-based Separability Index (DSI).
arXiv Detail & Related papers (2020-09-02T20:20:29Z)
- Clustering Binary Data by Application of Combinatorial Optimization Heuristics [52.77024349608834]
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics.
From a set of 16 data tables generated by a quasi-Monte Carlo experiment, a comparison is performed for one of the aggregations using L1 dissimilarity, with hierarchical clustering, and a version of k-means: partitioning around medoids or PAM.
arXiv Detail & Related papers (2020-01-06T23:33:31Z)