A Distance-based Separability Measure for Internal Cluster Validation
- URL: http://arxiv.org/abs/2106.09794v1
- Date: Thu, 17 Jun 2021 20:19:50 GMT
- Title: A Distance-based Separability Measure for Internal Cluster Validation
- Authors: Shuyue Guan, Murray Loew
- Abstract summary: Internal cluster validity indices (CVIs) are used to evaluate clustering results in unsupervised learning.
We propose the Distance-based Separability Index (DSI), based on a data separability measure.
Results show that DSI is an effective, unique CVI that is competitive with the other CVIs compared.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Evaluating clustering results is a significant part of cluster analysis.
Because there are no true class labels for clustering in typical unsupervised
learning, many internal cluster validity indices (CVIs), which use predicted
labels and the data, have been created. Without true labels, designing an
effective CVI is as difficult as creating a clustering method. More CVIs are
also needed because no universal CVI can measure all datasets and there is no
established method for selecting a proper CVI for clusters without true labels.
Applying a variety of CVIs to evaluate clustering results is therefore
necessary. In this paper, we propose a novel internal CVI, the Distance-based
Separability Index (DSI), based on a data separability measure. We compared the
DSI with eight internal CVIs, ranging from the early Dunn index (1974) to the
recent CVDD (2019), and with an external CVI as ground truth, using the
clustering results of five clustering algorithms on 12 real and 97 synthetic
datasets. The results show that DSI is an effective, unique CVI that is
competitive with the other CVIs compared. We also summarize the general process
for evaluating CVIs and introduce the rank-difference metric for comparing
CVIs' results.
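As a rough illustration of the idea, the sketch below computes a DSI-style score by comparing each cluster's intra-cluster distance (ICD) set with its between-cluster distance (BCD) set via the two-sample Kolmogorov-Smirnov statistic and averaging over clusters. The exact normalization and aggregation in the paper may differ; treat this as an assumption-laden sketch, not the authors' reference implementation.

```python
# Minimal DSI-style sketch (assumed formulation, not the authors' code).
import numpy as np
from scipy.spatial.distance import pdist, cdist
from scipy.stats import ks_2samp

def dsi(X, labels):
    """Average KS statistic between each cluster's intra-cluster distance
    set (ICD) and its between-cluster distance set (BCD).
    Higher values suggest better-separated clusters."""
    scores = []
    for k in np.unique(labels):
        inside, outside = X[labels == k], X[labels != k]
        if len(inside) < 2 or len(outside) == 0:
            continue  # ICD/BCD undefined for singleton or lone clusters
        icd = pdist(inside)                   # pairwise distances within cluster k
        bcd = cdist(inside, outside).ravel()  # distances from cluster k to the rest
        scores.append(ks_2samp(icd, bcd).statistic)
    return float(np.mean(scores)) if scores else 0.0

# Usage on a toy problem (hypothetical example, not from the paper):
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
print(dsi(X, KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)))
```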
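The rank-difference metric is only named in the abstract. One plausible reading, sketched below under that assumption, ranks a set of clustering results by an internal CVI and by the external ground-truth CVI, then sums the absolute rank differences; lower totals mean closer agreement with the ground truth.

```python
# Hedged sketch of a rank-difference comparison (assumed definition).
import numpy as np
from scipy.stats import rankdata

def rank_difference(internal_scores, external_scores):
    """Sum of absolute differences between the rankings that an internal
    CVI and an external (ground-truth) CVI assign to the same clustering
    results. 0 means the internal CVI ranks the results identically."""
    r_int = rankdata(internal_scores)
    r_ext = rankdata(external_scores)
    return float(np.sum(np.abs(r_int - r_ext)))
```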
Related papers
- MOKD: Cross-domain Finetuning for Few-shot Classification via Maximizing Optimized Kernel Dependence [97.93517982908007]
In cross-domain few-shot classification, the nearest centroid classifier (NCC) aims to learn representations that construct a metric space in which few-shot classification can be performed.
In this paper, we find that NCC-learned representations of samples from different classes can be highly similar.
We propose a bi-level optimization framework, maximizing optimized kernel dependence (MOKD), to learn a set of class-specific representations that match the cluster structures indicated by labeled data.
arXiv Detail & Related papers (2024-05-29T05:59:52Z)
- A Bayesian cluster validity index [0.0]
Cluster validity indices (CVIs) are designed to identify the optimal number of clusters within a dataset.
We introduce a Bayesian cluster validity index (BCVI) which builds upon existing indices.
Our BCVI offers clear advantages in situations where user expertise is valuable, allowing users to specify their desired range for the final number of clusters.
arXiv Detail & Related papers (2024-02-03T14:23:36Z)
- CLC: Cluster Assignment via Contrastive Representation Learning [9.631532215759256]
We propose Contrastive Learning-based Clustering (CLC), which uses contrastive learning to directly learn cluster assignment.
We achieve 53.4% accuracy on the full ImageNet dataset and outperform existing methods by large margins.
arXiv Detail & Related papers (2023-06-08T07:15:13Z)
- Dynamic Conceptional Contrastive Learning for Generalized Category Discovery [76.82327473338734]
Generalized category discovery (GCD) aims to automatically cluster partially labeled data.
Unlabeled data contain instances that are not only from known categories of the labeled data but also from novel categories.
One effective approach to GCD is applying self-supervised learning to learn discriminative representations for the unlabeled data.
We propose a Dynamic Conceptional Contrastive Learning framework, which can effectively improve clustering accuracy.
arXiv Detail & Related papers (2023-03-30T14:04:39Z)
- Self-supervised Contrastive Attributed Graph Clustering [110.52694943592974]
We propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC).
In SCAGC, a self-supervised contrastive loss that leverages inaccurate clustering labels is designed for node representation learning.
For out-of-sample (OOS) nodes, SCAGC can directly calculate their clustering labels.
arXiv Detail & Related papers (2021-10-15T03:25:28Z)
- No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z)
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data assigned to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, the resulting model, TCC, is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- Contrastive Clustering [57.71729650297379]
We propose Contrastive Clustering (CC) which explicitly performs the instance- and cluster-level contrastive learning.
In particular, CC achieves an NMI of 0.705 (0.431) on the CIFAR-10 (CIFAR-100) dataset, an improvement of up to 19% (39%) over the best baseline.
arXiv Detail & Related papers (2020-09-21T08:54:40Z)
- An Internal Cluster Validity Index Using a Distance-based Separability Measure [0.0]
There are no true class labels for clustering in typical unsupervised learning.
There is no universal CVI that can be used to measure all datasets.
We propose a novel CVI called the Distance-based Separability Index (DSI).
arXiv Detail & Related papers (2020-09-02T20:20:29Z)
- iCVI-ARTMAP: Accelerating and improving clustering using adaptive resonance theory predictive mapping and incremental cluster validity indices [1.160208922584163]
iCVI-ARTMAP uses incremental cluster validity indices (iCVIs) to perform unsupervised learning.
It can achieve running times up to two orders of magnitude shorter than when using batch CVI computations.
arXiv Detail & Related papers (2020-08-22T19:37:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.