A correlation-based fuzzy cluster validity index with secondary options detector
- URL: http://arxiv.org/abs/2308.14785v3
- Date: Thu, 7 Mar 2024 06:20:39 GMT
- Title: A correlation-based fuzzy cluster validity index with secondary options detector
- Authors: Nathakhun Wiroonsri and Onthada Preedasawakul
- Abstract summary: We introduce a correlation-based fuzzy cluster validity index known as the Wiroonsri-Preedasawakul (WP) index.
We evaluate and compare the performance of our index with several existing indexes, including Xie-Beni, Pakhira-Bandyopadhyay-Maulik, Tang, Wu-Li, generalized C, and Kwon2.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The optimal number of clusters is one of the main concerns when applying
cluster analysis. Several cluster validity indexes have been introduced to
address this problem. However, in some situations, there is more than one
option that can be chosen as the final number of clusters. This aspect has been
overlooked by most of the existing works in this area. In this study, we
introduce a correlation-based fuzzy cluster validity index known as the
Wiroonsri-Preedasawakul (WP) index. This index is defined based on the
correlation between the actual distance between a pair of data points and the
distance between adjusted centroids with respect to that pair. We evaluate and
compare the performance of our index with several existing indexes, including
Xie-Beni, Pakhira-Bandyopadhyay-Maulik, Tang, Wu-Li, generalized C, and Kwon2.
We conduct this evaluation on four types of datasets: artificial datasets,
real-world datasets, simulated datasets with ranks, and image datasets, using
the fuzzy c-means algorithm. Overall, the WP index outperforms most, if not
all, of these indexes in terms of accurately detecting the optimal number of
clusters and providing accurate secondary options. Moreover, our index remains
effective even when the fuzziness parameter $m$ is set to a large value. Our R
package called UniversalCVI used in this work is available at
https://CRAN.R-project.org/package=UniversalCVI.
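The abstract defines the WP index only in words: a correlation between the distance for each pair of data points and the distance between the "adjusted centroids" for that pair, computed on a fuzzy c-means partition. The block below is a minimal Python/NumPy sketch of that idea, not the authors' method. The fuzzy c-means part uses the standard membership and centroid updates with fuzziness parameter $m$. Several choices are our own assumptions: the farthest-point initialization, the use of Pearson correlation, and the rule that a point's adjusted centroid is the $u^m$-weighted average of the centroids. The name `correlation_index_sketch` is hypothetical. The authors' own implementation is the R package UniversalCVI.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, max_iter=300, tol=1e-6):
    """Standard fuzzy c-means. Returns centroids V (c x d) and memberships U (c x n)."""
    # Farthest-point initialization of the centroids (an assumption, chosen for determinism).
    idx = [0]
    for _ in range(1, c):
        dist = np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2)
        idx.append(int(np.argmax(dist.min(axis=1))))
    V = X[idx].astype(float)
    for _ in range(max_iter):
        # D[i, j] = distance from centroid i to point j, floored to avoid division by zero.
        D = np.fmax(np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2), 1e-12)
        # u_ij = 1 / sum_k (d_ij / d_kj)^(2 / (m - 1)); each column of U sums to 1.
        ratio = (D[:, None, :] / D[None, :, :]) ** (2.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=1)
        # Centroid update: v_i = sum_j u_ij^m x_j / sum_j u_ij^m.
        W = U ** m
        V_new = (W @ X) / W.sum(axis=1, keepdims=True)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V, U


def correlation_index_sketch(X, V, U, m=2.0):
    """HYPOTHETICAL simplification, not the published WP formula.

    Each point's 'adjusted centroid' is assumed to be the u^m-weighted average
    of the centroids; the score is the Pearson correlation between pairwise
    data distances and pairwise adjusted-centroid distances.
    """
    W = U ** m                                    # (c, n) weights
    A = (W.T @ V) / W.sum(axis=0)[:, None]        # (n, d) adjusted centroid per point
    iu = np.triu_indices(X.shape[0], k=1)         # each unordered pair once
    d_data = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)[iu]
    d_cent = np.linalg.norm(A[:, None, :] - A[None, :, :], axis=2)[iu]
    return float(np.corrcoef(d_data, d_cent)[0, 1])


# Usage on three well-separated synthetic Gaussian blobs (made-up data for illustration).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([ctr + 0.5 * rng.standard_normal((100, 2)) for ctr in centers])
scores = {}
for c in range(2, 6):
    V, U = fuzzy_c_means(X, c)
    scores[c] = correlation_index_sketch(X, V, U)
    print(c, round(scores[c], 3))
```

This sketch omits whatever further adjustments the published WP index makes. Its raw correlation is therefore not guaranteed to peak at the true number of clusters, and it should not be used to pick an optimal or secondary number of clusters. For that, use the UniversalCVI package.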
Related papers
- ABCDE: Application-Based Cluster Diff Evals [49.1574468325115]
It aims to be practical: it allows items to have associated importance values that are application-specific, it is frugal in its use of human judgements when determining which clustering is better, and it can report metrics for arbitrary slices of items.
The approach to measuring the delta in the clustering quality is novel: instead of trying to construct an expensive ground truth up front and evaluating each clustering with respect to that, ABCDE samples questions for judgement on the basis of the actual diffs between the clusterings.
(2024-07-31T08:29:35Z)
- MOKD: Cross-domain Finetuning for Few-shot Classification via Maximizing Optimized Kernel Dependence [97.93517982908007]
In cross-domain few-shot classification, NCC aims to learn representations to construct a metric space where few-shot classification can be performed.
In this paper, we find that there exist high similarities between NCC-learned representations of two samples from different classes.
We propose a bi-level optimization framework, maximizing optimized kernel dependence (MOKD), to learn a set of class-specific representations that match the cluster structures indicated by labeled data.
(2024-05-29T05:59:52Z)
- A Bayesian cluster validity index [0.0]
Cluster validity indices (CVIs) are designed to identify the optimal number of clusters within a dataset.
We introduce a Bayesian cluster validity index (BCVI) which builds upon existing indices.
Our BCVI offers clear advantages in situations where user expertise is valuable, allowing users to specify their desired range for the final number of clusters.
(2024-02-03T14:23:36Z)
- Superclustering by finding statistically significant separable groups of optimal gaussian clusters [0.0]
The paper presents an algorithm for clustering a dataset by grouping Gaussian clusters that are optimal from the point of view of the BIC criterion into statistically significant separable groups.
An essential advantage of the algorithm is its ability to predict the correct supercluster for new data based on an already trained clusterer.
(2023-09-05T23:49:46Z)
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework.
In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
(2023-08-13T18:12:28Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
(2023-06-18T08:46:06Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To fully exploit the complementary information embedded in different views, we leverage tensor Schatten p-norm regularization.
(2023-05-12T03:01:41Z)
- A novel cluster internal evaluation index based on hyper-balls [11.048887848164268]
It is crucial to evaluate the quality and determine the optimal number of clusters in cluster analysis.
In this paper, the multi-granularity characterization of the data set is carried out to obtain the hyper-balls.
The cluster internal evaluation index based on hyper-balls (HCVI) is defined.
(2022-12-30T02:56:40Z)
- K-Splits: Improved K-Means Clustering Algorithm to Automatically Detect the Number of Clusters [0.12313056815753944]
This paper introduces k-splits, an improved hierarchical algorithm based on k-means to cluster data without prior knowledge of the number of clusters.
Accuracy and speed are two main advantages of the proposed method.
(2021-10-09T23:02:57Z)
- Determinantal consensus clustering [77.34726150561087]
We propose the use of determinantal point processes or DPP for the random restart of clustering algorithms.
DPPs favor diversity of the center points within subsets.
We show through simulations that, contrary to DPP, this technique fails both to ensure diversity, and to obtain a good coverage of all data facets.
(2021-02-07T23:48:24Z)
- A New Validity Index for Fuzzy-Possibilistic C-Means Clustering [6.174448419090291]
The Fuzzy-Possibilistic (FP) index works well in the presence of clusters that vary in shape and density.
FPCM requires a priori selection of the degree of fuzziness and the degree of typicality.
(2020-05-19T01:48:13Z)