Related papers: A correlation-based fuzzy cluster validity index with secondary options detector

URL: http://arxiv.org/abs/2308.14785v3
Date: Thu, 7 Mar 2024 06:20:39 GMT
Title: A correlation-based fuzzy cluster validity index with secondary options detector
Authors: Nathakhun Wiroonsri and Onthada Preedasawakul
Abstract summary: We introduce a correlation-based fuzzy cluster validity index known as the Wiroonsri-Preedasawakul (WP) index. We evaluate and compare the performance of our index with several existing indexes, including Xie-Beni, Pakhira-Bandyopadhyay-Maulik, Tang, Wu-Li, generalized C, and Kwon2.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The optimal number of clusters is one of the main concerns when applying cluster analysis. Several cluster validity indexes have been introduced to address this problem. However, in some situations, there is more than one option that can be chosen as the final number of clusters. This aspect has been overlooked by most existing works in this area. In this study, we introduce a correlation-based fuzzy cluster validity index known as the Wiroonsri-Preedasawakul (WP) index. This index is defined based on the correlation between the actual distance between a pair of data points and the distance between adjusted centroids with respect to that pair. We evaluate and compare the performance of our index with several existing indexes, including Xie-Beni, Pakhira-Bandyopadhyay-Maulik, Tang, Wu-Li, generalized C, and Kwon2. We conduct this evaluation on four types of datasets (artificial datasets, real-world datasets, simulated datasets with ranks, and image datasets) using the fuzzy c-means algorithm. Overall, the WP index outperforms most, if not all, of these indexes in terms of accurately detecting the optimal number of clusters and providing accurate secondary options. Moreover, our index remains effective even when the fuzziness parameter $m$ is set to a large value. Our R package, UniversalCVI, used in this work is available at https://CRAN.R-project.org/package=UniversalCVI.
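The abstract defines the WP index only as "the correlation between the actual distance between a pair of data points and the distance between adjusted centroids with respect to that pair." The Python sketch below, which uses only the standard library, illustrates that general idea under loud assumptions. It runs a plain fuzzy c-means loop, then correlates pairwise point distances with pairwise distances between hypothetical "adjusted centroids". Here the adjusted centroid of each point is assumed to be the membership-weighted average of the cluster centroids. That adjustment, the farthest-point initialisation, and the helper names fcm and raw_correlation are illustrative choices, not the paper's definition. The authors' actual WP index, together with its rule for choosing the number of clusters and secondary options, is in their UniversalCVI R package.

```python
import math


def fcm(points, c, m=2.0, iters=100):
    """Plain fuzzy c-means. Returns (centroids, memberships), where
    memberships[k][i] is the degree to which point k belongs to cluster i."""
    # Deterministic farthest-point initialisation (illustrative choice, for reproducibility).
    v = [list(points[0])]
    while len(v) < c:
        far = max(points, key=lambda p: min(math.dist(p, q) for q in v))
        v.append(list(far))
    n, dim = len(points), len(points[0])
    u = [[0.0] * c for _ in range(n)]
    for _ in range(iters):
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj)^(2 / (m - 1)).
        for k in range(n):
            d = [max(math.dist(points[k], v[i]), 1e-12) for i in range(c)]
            for i in range(c):
                u[k][i] = 1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
        # Centroid update: v_i = sum_k u_ki^m x_k / sum_k u_ki^m.
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            total = sum(w)
            v[i] = [sum(w[k] * points[k][t] for k in range(n)) / total for t in range(dim)]
    return v, u


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def raw_correlation(points, v, u):
    """Correlate pairwise point distances with distances between hypothetical
    'adjusted centroids' (membership-weighted averages of the centroids)."""
    c, dim = len(v), len(v[0])
    # Each point's memberships sum to 1, so this is a weighted average of centroids.
    adj = [[sum(u[k][i] * v[i][t] for i in range(c)) for t in range(dim)]
           for k in range(len(points))]
    actual, adjusted = [], []
    for a in range(len(points)):
        for b in range(a + 1, len(points)):
            actual.append(math.dist(points[a], points[b]))
            adjusted.append(math.dist(adj[a], adj[b]))
    return pearson(actual, adjusted)


# Toy data: three well-separated groups of five points each.
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
offsets = [(0.0, 0.0), (0.5, 0.0), (-0.5, 0.0), (0.0, 0.5), (0.0, -0.5)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

for c in range(2, 6):  # c = 1 would make every adjusted centroid identical
    v, u = fcm(points, c)
    print(c, round(raw_correlation(points, v, u), 4))
```

A raw correlation of this kind need not peak at the true number of clusters, because finer partitions can track pairwise distances more closely. Treat it as a building block rather than something to maximize directly when choosing c.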

Related papers

ABCDE: Application-Based Cluster Diff Evals [49.1574468325115]
ABCDE aims to be practical: it allows items to have associated importance values that are application-specific, it is frugal in its use of human judgements when determining which clustering is better, and it can report metrics for arbitrary slices of items. Its approach to measuring the delta in clustering quality is novel: instead of constructing an expensive ground truth up front and evaluating each clustering against it, ABCDE samples questions for judgement based on the actual diffs between the clusterings.
arXiv Detail & Related papers (2024-07-31T08:29:35Z)
MOKD: Cross-domain Finetuning for Few-shot Classification via Maximizing Optimized Kernel Dependence [97.93517982908007]
In cross-domain few-shot classification, the nearest centroid classifier (NCC) aims to learn representations that construct a metric space in which few-shot classification can be performed. In this paper, we find that NCC-learned representations of two samples from different classes can be highly similar. We propose a bi-level optimization framework, maximizing optimized kernel dependence (MOKD), to learn a set of class-specific representations that match the cluster structures indicated by labeled data.
arXiv Detail & Related papers (2024-05-29T05:59:52Z)
A Bayesian cluster validity index [0.0]
Cluster validity indices (CVIs) are designed to identify the optimal number of clusters within a dataset. We introduce a Bayesian cluster validity index (BCVI) which builds upon existing indices. Our BCVI offers clear advantages in situations where user expertise is valuable, allowing users to specify their desired range for the final number of clusters.
arXiv Detail & Related papers (2024-02-03T14:23:36Z)
Superclustering by finding statistically significant separable groups of optimal gaussian clusters [0.0]
The paper presents an algorithm that clusters a dataset by grouping Gaussian clusters, whose number is optimal with respect to the Bayesian information criterion (BIC), into statistically separable superclusters. An essential advantage of the algorithm is its ability to predict the correct supercluster for new data based on an already trained clusterer.
arXiv Detail & Related papers (2023-09-05T23:49:46Z)
Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering. In our proposed method, cluster number determination and unsupervised representation learning are unified into a single framework. To provide feedback, a clustering-oriented reward function is proposed that enhances cohesion within each cluster and separation between different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels. We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches instance-specific lower bounds on the expected number of misclassified items, both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
Dink-Net: Neural Clustering on Large Graphs [59.10189693120368]
A deep graph clustering method (Dink-Net) is proposed with the idea of dilation and shrink. Representations are learned in a self-supervised manner by discriminating whether nodes have been corrupted by augmentations. The clustering distribution is optimized by minimizing the proposed cluster dilation loss and cluster shrink loss. Dink-Net achieves a 9.62% NMI improvement over the runner-up on the ogbn-papers100M dataset, which has 111 million nodes and 1.6 billion edges.
arXiv Detail & Related papers (2023-05-28T15:33:24Z)
Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm that directly detects clusters in data without mean estimation. Specifically, we construct the distance matrix between data points using a Butterworth filter. To better exploit the complementary information embedded in different views, we leverage tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
A novel cluster internal evaluation index based on hyper-balls [11.048887848164268]
In cluster analysis, it is crucial to evaluate clustering quality and determine the optimal number of clusters. In this paper, a multi-granularity characterization of the dataset is carried out to obtain hyper-balls, and a cluster internal evaluation index based on hyper-balls (HCVI) is defined.
arXiv Detail & Related papers (2022-12-30T02:56:40Z)
K-Splits: Improved K-Means Clustering Algorithm to Automatically Detect the Number of Clusters [0.12313056815753944]
This paper introduces k-splits, an improved hierarchical algorithm based on k-means to cluster data without prior knowledge of the number of clusters. Accuracy and speed are two main advantages of the proposed method.
arXiv Detail & Related papers (2021-10-09T23:02:57Z)
Determinantal consensus clustering [77.34726150561087]
We propose the use of determinantal point processes (DPPs) for the random restart of clustering algorithms. DPPs favor diversity of the center points within subsets. We show through simulations that, contrary to DPPs, the common technique of sampling center points uniformly at random fails both to ensure diversity and to obtain good coverage of all data facets.
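The diversity property stated above ("DPPs favor diversity of the center points within subsets") can be made concrete with a brute-force k-DPP. In the Python sketch below, which uses only the standard library, the probability of choosing a set S of k candidate centers is proportional to det(L_S), the determinant of the kernel submatrix on S. An RBF kernel with a hypothetical bandwidth parameter gamma is assumed. Near-duplicate points make the submatrix nearly singular, so subsets containing both are rarely drawn. Enumerating every k-subset is only feasible for tiny inputs. This illustrates the definition and is not the sampler used in the paper.

```python
import itertools
import math
import random


def rbf_kernel(points, gamma=1.0):
    """L[i][j] = exp(-gamma * ||x_i - x_j||^2); gamma is an assumed bandwidth."""
    return [[math.exp(-gamma * math.dist(p, q) ** 2) for q in points] for p in points]


def det(mat):
    """Determinant by Laplace expansion along the first row; fine for small k."""
    if len(mat) == 1:
        return mat[0][0]
    total = 0.0
    for j in range(len(mat)):
        minor = [row[:j] + row[j + 1:] for row in mat[1:]]
        total += (-1) ** j * mat[0][j] * det(minor)
    return total


def subset_weight(L, subset):
    """Unnormalised k-DPP weight det(L_S); tiny negative round-off is clamped to 0."""
    return max(det([[L[i][j] for j in subset] for i in subset]), 0.0)


def sample_k_dpp(points, k, gamma=1.0, seed=0):
    """Exact k-DPP draw by enumerating every k-subset (exponential cost)."""
    L = rbf_kernel(points, gamma)
    subsets = list(itertools.combinations(range(len(points)), k))
    weights = [subset_weight(L, s) for s in subsets]
    return random.Random(seed).choices(subsets, weights=weights, k=1)[0]


# Points 0 and 1 are near-duplicates; points 2 and 3 are far from everything.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (0.0, 5.0)]
L = rbf_kernel(points)
print(round(subset_weight(L, (0, 1)), 4))  # near-duplicate pair: weight close to 0
print(round(subset_weight(L, (0, 2)), 4))  # well-separated pair: weight close to 1
print(sample_k_dpp(points, k=2))
```

Under uniform random sampling, the near-duplicate pair (0, 1) would be drawn with probability 1/6. Under this k-DPP its weight is roughly 0.02 against about 1 for each of the other five pairs.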
arXiv Detail & Related papers (2021-02-07T23:48:24Z)
A New Validity Index for Fuzzy-Possibilistic C-Means Clustering [6.174448419090291]
The Fuzzy-Possibilistic (FP) index works well in the presence of clusters that vary in shape and density. Fuzzy-possibilistic c-means (FPCM) requires a priori selection of the degree of fuzziness and the degree of typicality.
arXiv Detail & Related papers (2020-05-19T01:48:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.