A Distance-based Separability Measure for Internal Cluster Validation
- URL: http://arxiv.org/abs/2106.09794v1
- Date: Thu, 17 Jun 2021 20:19:50 GMT
- Title: A Distance-based Separability Measure for Internal Cluster Validation
- Authors: Shuyue Guan, Murray Loew
- Abstract summary: Internal cluster validity indices (CVIs) are used to evaluate clustering results in unsupervised learning.
We propose the Distance-based Separability Index (DSI), based on a data separability measure.
Results show that DSI is an effective, unique CVI that is competitive with the other CVIs compared.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Evaluating clustering results is a significant part of cluster analysis.
Because there are no true class labels for clustering in typical unsupervised
learning, many internal cluster validity indices (CVIs), which use predicted
labels and the data, have been created. Without true labels, designing an
effective CVI is as difficult as creating a clustering method. More CVIs are
also needed because no universal CVI can measure all datasets and there is no
established method for selecting a proper CVI for clusters without true labels.
Applying a variety of CVIs to evaluate clustering results is therefore
necessary. In this paper, we propose a novel internal CVI, the Distance-based
Separability Index (DSI), based on a data separability measure. We compared the
DSI with eight internal CVIs, ranging from the early Dunn index (1974) to the
recent CVDD (2019), and with an external CVI as ground truth, using the
clustering results of five clustering algorithms on 12 real and 97 synthetic
datasets. The results show that DSI is an effective, unique CVI that is
competitive with the other CVIs compared. We also summarize the general process
for evaluating CVIs and introduce the rank-difference metric for comparing
CVIs' results.
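As a rough illustration of the idea, the sketch below computes a DSI-style score by comparing each cluster's intra-cluster distance (ICD) set with its between-cluster distance (BCD) set via the two-sample Kolmogorov-Smirnov statistic and averaging over clusters. The exact normalization and aggregation in the paper may differ; treat this as an assumption-laden sketch, not the authors' reference implementation.

```python
# Minimal DSI-style sketch (assumed formulation, not the authors' code).
import numpy as np
from scipy.spatial.distance import pdist, cdist
from scipy.stats import ks_2samp

def dsi(X, labels):
    """Average KS statistic between each cluster's intra-cluster distance
    set (ICD) and its between-cluster distance set (BCD).
    Higher values suggest better-separated clusters."""
    scores = []
    for k in np.unique(labels):
        inside, outside = X[labels == k], X[labels != k]
        if len(inside) < 2 or len(outside) == 0:
            continue  # ICD/BCD undefined for singleton or lone clusters
        icd = pdist(inside)                   # pairwise distances within cluster k
        bcd = cdist(inside, outside).ravel()  # distances from cluster k to the rest
        scores.append(ks_2samp(icd, bcd).statistic)
    return float(np.mean(scores)) if scores else 0.0

# Usage on a toy problem (hypothetical example, not from the paper):
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
print(dsi(X, KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)))
```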
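The rank-difference metric is only named in the abstract. One plausible reading, sketched below under that assumption, ranks a set of clustering results by an internal CVI and by the external ground-truth CVI, then sums the absolute rank differences; lower totals mean closer agreement with the ground truth.

```python
# Hedged sketch of a rank-difference comparison (assumed definition).
import numpy as np
from scipy.stats import rankdata

def rank_difference(internal_scores, external_scores):
    """Sum of absolute differences between the rankings that an internal
    CVI and an external (ground-truth) CVI assign to the same clustering
    results. 0 means the internal CVI ranks the results identically."""
    r_int = rankdata(internal_scores)
    r_ext = rankdata(external_scores)
    return float(np.sum(np.abs(r_int - r_ext)))
```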
Related papers
- MOKD: Cross-domain Finetuning for Few-shot Classification via Maximizing Optimized Kernel Dependence [97.93517982908007]
In cross-domain few-shot classification, the nearest centroid classifier (NCC) aims to learn representations that construct a metric space in which few-shot classification can be performed.
In this paper, we find that NCC-learned representations of samples from different classes can be highly similar.
We propose a bi-level optimization framework, maximizing optimized kernel dependence (MOKD), to learn a set of class-specific representations that match the cluster structures indicated by labeled data.
arXiv Detail & Related papers (2024-05-29T05:59:52Z)
- A Bayesian cluster validity index [0.0]
Cluster validity indices (CVIs) are designed to identify the optimal number of clusters within a dataset.
We introduce a Bayesian cluster validity index (BCVI) which builds upon existing indices.
Our BCVI offers clear advantages in situations where user expertise is valuable, allowing users to specify their desired range for the final number of clusters.
arXiv Detail & Related papers (2024-02-03T14:23:36Z)
- CLC: Cluster Assignment via Contrastive Representation Learning [9.631532215759256]
We propose Contrastive Learning-based Clustering (CLC), which uses contrastive learning to directly learn cluster assignment.
We achieve 53.4% accuracy on the full ImageNet dataset and outperform existing methods by large margins.
arXiv Detail & Related papers (2023-06-08T07:15:13Z)
- Dynamic Conceptional Contrastive Learning for Generalized Category Discovery [76.82327473338734]
Generalized category discovery (GCD) aims to automatically cluster partially labeled data.
Unlabeled data contain instances that are not only from known categories of the labeled data but also from novel categories.
One effective approach to GCD is applying self-supervised learning to learn discriminative representations for the unlabeled data.
We propose a Dynamic Conceptional Contrastive Learning framework, which can effectively improve clustering accuracy.
arXiv Detail & Related papers (2023-03-30T14:04:39Z)
- Self-supervised Contrastive Attributed Graph Clustering [110.52694943592974]
We propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC).
In SCAGC, a self-supervised contrastive loss that leverages inaccurate clustering labels is designed for node representation learning.
For out-of-sample (OOS) nodes, SCAGC can directly calculate their clustering labels.
arXiv Detail & Related papers (2021-10-15T03:25:28Z)
- No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z)
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data assigned to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, the resulting model, TCC, is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- Contrastive Clustering [57.71729650297379]
We propose Contrastive Clustering (CC) which explicitly performs the instance- and cluster-level contrastive learning.
In particular, CC achieves an NMI of 0.705 (0.431) on the CIFAR-10 (CIFAR-100) dataset, an improvement of up to 19% (39%) over the best baseline.
arXiv Detail & Related papers (2020-09-21T08:54:40Z)
- An Internal Cluster Validity Index Using a Distance-based Separability Measure [0.0]
There are no true class labels for clustering in typical unsupervised learning.
There is no universal CVI that can be used to measure all datasets.
We propose a novel CVI called the Distance-based Separability Index (DSI).
arXiv Detail & Related papers (2020-09-02T20:20:29Z)
- iCVI-ARTMAP: Accelerating and improving clustering using adaptive resonance theory predictive mapping and incremental cluster validity indices [1.160208922584163]
iCVI-ARTMAP uses incremental cluster validity indices (iCVIs) to perform unsupervised learning.
It can achieve running times up to two orders of magnitude shorter than when using batch CVI computations.
arXiv Detail & Related papers (2020-08-22T19:37:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.