An Internal Cluster Validity Index Using a Distance-based Separability Measure
- URL: http://arxiv.org/abs/2009.01328v2
- Date: Mon, 4 Jan 2021 21:22:03 GMT
- Title: An Internal Cluster Validity Index Using a Distance-based Separability Measure
- Authors: Shuyue Guan, Murray Loew
- Abstract summary: In typical unsupervised learning, there are no true class labels for clustering.
There is no universal CVI that can be used to measure all datasets.
We propose a novel CVI called the Distance-based Separability Index (DSI).
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Evaluating clustering results is a significant part of cluster analysis.
Because no true class labels are available in typical unsupervised learning, a
number of internal evaluation measures, which use only predicted labels and data,
have been created; they are also called internal cluster validity indices (CVIs).
Without true labels, designing an effective CVI is as difficult as creating a
clustering method. And having more CVIs is crucial because there is no universal
CVI that can measure all datasets, and no specific method for selecting a proper
CVI for clusters without true labels. Therefore, applying more CVIs to evaluate
clustering results is necessary. In this paper, we propose a novel CVI, called
the Distance-based Separability Index (DSI), based on a data separability
measure. For comparison, we applied the DSI and eight other internal CVIs,
ranging from early studies such as Dunn (1974) to the most recent, such as CVDD
(2019). We used an external CVI as ground truth for the clustering results of
five clustering algorithms on 12 real and 97 synthetic datasets. Results show
that the DSI is an effective, unique, and competitive CVI compared with the
other CVIs. In addition, we summarize the general process for evaluating CVIs
and create a new method, rank difference, to compare the results of CVIs.
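Since the abstract only names the index and the comparison method, a minimal sketch may help; it assumes Euclidean distances and the two-sample Kolmogorov-Smirnov (KS) statistic as the underlying separability measure, so the exact construction and normalization in the paper may differ.

```python
# Minimal sketch of a DSI-style index and the rank-difference comparison.
# Assumptions: Euclidean distance; two-sample KS statistic as the
# separability measure. The paper's exact construction may differ.
import numpy as np
from scipy.spatial.distance import cdist, pdist
from scipy.stats import ks_2samp, rankdata

def dsi(X, labels):
    """Mean KS distance between each cluster's intra-cluster distance (ICD)
    set and its between-cluster distance (BCD) set; higher = more separable."""
    scores = []
    for c in np.unique(labels):
        in_c, out_c = X[labels == c], X[labels != c]
        if len(in_c) < 2 or len(out_c) == 0:
            continue  # singleton clusters have no intra-cluster distances
        icd = pdist(in_c)                 # pairwise distances within cluster c
        bcd = cdist(in_c, out_c).ravel()  # distances from cluster c to the rest
        scores.append(ks_2samp(icd, bcd).statistic)
    return float(np.mean(scores))

def rank_difference(internal_scores, external_scores):
    """Sum of absolute rank gaps between an internal CVI and the external
    ground-truth CVI over the same clustering results (0 = full agreement)."""
    return float(np.abs(rankdata(internal_scores) - rankdata(external_scores)).sum())
```

For instance, `dsi(X, km.labels_)` for a fitted k-means model `km` gives a score that can be ranked across candidate clusterings and compared, via `rank_difference`, against the ranking induced by an external CVI.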
Related papers
- MOKD: Cross-domain Finetuning for Few-shot Classification via Maximizing Optimized Kernel Dependence [97.93517982908007]
In cross-domain few-shot classification, the nearest centroid classifier (NCC) aims to learn representations that construct a metric space in which few-shot classification can be performed.
In this paper, we find that there exist high similarities between the NCC-learned representations of samples from different classes.
We propose a bi-level optimization framework, maximizing optimized kernel dependence (MOKD), to learn a set of class-specific representations that match the cluster structures indicated by labeled data.
arXiv Detail & Related papers (2024-05-29T05:59:52Z)
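For the MOKD entry above, the "optimized kernel dependence" suggests a kernel dependence measure in the spirit of the Hilbert-Schmidt Independence Criterion (HSIC). The following is a hypothetical sketch of a biased HSIC estimator between representations and one-hot labels, not MOKD's actual objective:

```python
# Hypothetical sketch of a kernel dependence measure (biased HSIC estimator)
# between representations X and one-hot labels Y; MOKD's actual objective,
# kernels, and optimization are not specified by the summary above.
import numpy as np

def hsic(X, Y, gamma=1.0):
    n = len(X)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)               # RBF kernel on representations
    L = Y @ Y.T                           # linear kernel on one-hot labels
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return float(np.trace(K @ H @ L @ H) / (n - 1) ** 2)
```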
- A Bayesian cluster validity index [0.0]
Cluster validity indices (CVIs) are designed to identify the optimal number of clusters within a dataset.
We introduce a Bayesian cluster validity index (BCVI) which builds upon existing indices.
Our BCVI offers clear advantages in situations where user expertise is valuable, allowing users to specify their desired range for the final number of clusters.
arXiv Detail & Related papers (2024-02-03T14:23:36Z)
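For the BCVI entry above, a toy reading of the Bayesian idea is to combine rescaled index values over candidate cluster counts with a user-specified prior; the actual BCVI construction is not given in the summary and may differ:

```python
# Toy sketch: combine rescaled CVI values over candidate cluster counts k
# with a user prior over k via Bayes' rule. Hypothetical; the actual BCVI
# construction may differ from this.
import numpy as np

def bcvi_posterior(cvi_values, prior):
    """cvi_values[i] and prior[i] refer to the same candidate cluster count."""
    v = np.asarray(cvi_values, dtype=float)
    likelihood = (v - v.min()) / (v.max() - v.min() + 1e-12)  # rescale to [0, 1]
    post = likelihood * np.asarray(prior, dtype=float)
    return post / post.sum()
```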
- CARL-G: Clustering-Accelerated Representation Learning on Graphs [18.763104937800215]
We propose a novel clustering-based framework for graph representation learning that uses a loss inspired by Cluster Validity Indices (CVIs).
CARL-G is adaptable to different clustering methods and CVIs, and we show that with the right choice of clustering method and CVI, CARL-G outperforms node classification baselines on 4/5 datasets with up to a 79x training speedup compared to the best-performing baseline.
arXiv Detail & Related papers (2023-06-12T08:14:42Z)
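For the CARL-G entry above, one hypothetical CVI-inspired loss is a simplified-silhouette objective over node embeddings and cluster centroids; this is an illustrative instance, not CARL-G's exact formulation:

```python
# Hypothetical CVI-inspired loss: a simplified-silhouette objective that
# pulls embeddings toward their own centroid (a) and away from the nearest
# other centroid (b). Not CARL-G's exact formulation.
import numpy as np

def simplified_silhouette_loss(Z, labels, centroids):
    d = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=-1)
    idx = np.arange(len(Z))
    a = d[idx, labels]                # distance to assigned centroid
    d[idx, labels] = np.inf
    b = d.min(axis=1)                 # distance to nearest other centroid
    sil = (b - a) / np.maximum(a, b)
    return -float(sil.mean())         # minimizing the loss raises the silhouette
```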
- CLC: Cluster Assignment via Contrastive Representation Learning [9.631532215759256]
We propose Contrastive Learning-based Clustering (CLC), which uses contrastive learning to directly learn cluster assignment.
We achieve 53.4% accuracy on the full ImageNet dataset and outperform existing methods by large margins.
arXiv Detail & Related papers (2023-06-08T07:15:13Z)
- Dynamic Conceptional Contrastive Learning for Generalized Category Discovery [76.82327473338734]
Generalized category discovery (GCD) aims to automatically cluster partially labeled data.
Unlabeled data contain instances that are not only from known categories of the labeled data but also from novel categories.
One effective way for GCD is applying self-supervised learning to learn discriminative representations for unlabeled data.
We propose a Dynamic Conceptional Contrastive Learning framework, which can effectively improve clustering accuracy.
arXiv Detail & Related papers (2023-03-30T14:04:39Z)
- Self-supervised Contrastive Attributed Graph Clustering [110.52694943592974]
We propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC).
In SCAGC, a self-supervised contrastive loss that leverages inaccurate clustering labels is designed for node representation learning.
For out-of-sample (OOS) nodes, SCAGC can directly calculate their clustering labels.
arXiv Detail & Related papers (2021-10-15T03:25:28Z)
- A Distance-based Separability Measure for Internal Cluster Validation [0.0]
Internal cluster validity indices (CVIs) are used to evaluate clustering results in unsupervised learning.
We propose the Distance-based Separability Index (DSI), based on a data separability measure.
Results show that the DSI is an effective, unique, and competitive CVI compared with the other CVIs.
arXiv Detail & Related papers (2021-06-17T20:19:50Z)
- No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in real-world federated systems is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks, including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z)
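For the CCVR entry above, a minimal centralized sketch of the calibration idea: fit one Gaussian per class on feature vectors, sample virtual representations, and refit only the classifier head. The federated aggregation of per-client statistics that CCVR performs is omitted:

```python
# Sketch of classifier calibration with virtual representations: one Gaussian
# per class is fitted on features, virtual features are sampled from it, and
# only the classifier head is retrained. Federated statistic aggregation and
# the exact CCVR recipe are omitted/simplified.
import numpy as np
from sklearn.linear_model import LogisticRegression

def calibrate_head(feats, labels, per_class=200, seed=0):
    rng = np.random.default_rng(seed)
    vx, vy = [], []
    for c in np.unique(labels):
        f = feats[labels == c]
        mu, cov = f.mean(axis=0), np.cov(f, rowvar=False)
        vx.append(rng.multivariate_normal(mu, cov, size=per_class))
        vy.append(np.full(per_class, c))
    return LogisticRegression(max_iter=1000).fit(np.vstack(vx), np.concatenate(vy))
```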
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data assigned to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
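For the TCC entry above, "reparametrizing the assignment variables" is commonly realized with a Gumbel-softmax relaxation, which keeps sampling of categorical cluster assignments differentiable; the generic sketch below is an assumption, since the summary does not name the estimator:

```python
# Generic Gumbel-softmax relaxation: differentiable "samples" of categorical
# cluster assignments from logits, enabling end-to-end training without
# alternating steps. TCC's exact estimator may differ.
import numpy as np

def gumbel_softmax(logits, tau=0.5, seed=None):
    rng = np.random.default_rng(seed)
    g = -np.log(-np.log(rng.uniform(size=logits.shape) + 1e-20) + 1e-20)
    y = (logits + g) / tau
    y -= y.max(axis=-1, keepdims=True)           # numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)     # soft one-hot assignment
```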
- Contrastive Clustering [57.71729650297379]
We propose Contrastive Clustering (CC), which explicitly performs instance- and cluster-level contrastive learning.
In particular, CC achieves an NMI of 0.705 (0.431) on the CIFAR-10 (CIFAR-100) dataset, which is up to a 19% (39%) performance improvement over the best baseline.
arXiv Detail & Related papers (2020-09-21T08:54:40Z)
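For the Contrastive Clustering entry above, the cluster-level term contrasts the columns of the soft assignment matrices from two augmented views; below is a simplified sketch of that term (the paper's entropy regularizer, among other details, is omitted):

```python
# Simplified sketch of CC's cluster-level contrastive term: P1 and P2 are
# (n x k) soft assignment matrices from two augmented views; column i of one
# view is the positive pair of column i of the other.
import numpy as np

def cluster_level_loss(P1, P2, tau=1.0):
    C = np.concatenate([P1.T, P2.T], axis=0)            # 2k cluster vectors
    C /= np.linalg.norm(C, axis=1, keepdims=True)
    sim = np.exp(C @ C.T / tau)
    np.fill_diagonal(sim, 0.0)                          # drop self-similarity
    k = P1.shape[1]
    pos = np.exp((C[:k] * C[k:]).sum(axis=1) / tau)     # matching-column pairs
    pos = np.concatenate([pos, pos])
    return float(-np.log(pos / sim.sum(axis=1)).mean())
```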
- iCVI-ARTMAP: Accelerating and improving clustering using adaptive resonance theory predictive mapping and incremental cluster validity indices [1.160208922584163]
iCVI-ARTMAP uses incremental cluster validity indices (iCVIs) to perform unsupervised learning.
It can achieve running times up to two orders of magnitude shorter than when using batch CVI computations.
arXiv Detail & Related papers (2020-08-22T19:37:01Z)
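For the iCVI-ARTMAP entry above, the reported speedup comes from updating cluster statistics recursively as samples arrive instead of recomputing a CVI in batch; the sketch below shows generic O(1)-per-sample updates (a simplification; the paper covers several specific iCVIs):

```python
# Generic sketch of incremental cluster statistics: a Welford-style update
# of the mean and within-cluster sum of squares in O(1) per sample, which
# lets compactness-based CVIs be refreshed without batch recomputation.
import numpy as np

class IncrementalClusterStats:
    def __init__(self, dim):
        self.n, self.mean, self.ss = 0, np.zeros(dim), 0.0

    def update(self, x):
        """Fold a new sample x assigned to this cluster into the statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.ss += float(delta @ (x - self.mean))  # within-cluster sum of squares
```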
This list is automatically generated from the titles and abstracts of the papers on this site.