The Area Under the ROC Curve as a Measure of Clustering Quality
- URL: http://arxiv.org/abs/2009.02400v2
- Date: Wed, 22 Dec 2021 21:02:42 GMT
- Title: The Area Under the ROC Curve as a Measure of Clustering Quality
- Authors: Pablo Andretta Jaskowiak, Ivan Gesteira Costa, Ricardo José
Gabrielli Barreto Campello
- Abstract summary: Area Under the Curve for Clustering (AUCC) is an internal/relative measure of clustering quality.
AUCC is a linear transformation of the Gamma criterion from Baker and Hubert (1975).
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Area Under the Receiver Operating Characteristics (ROC) Curve,
referred to as AUC, is a well-known performance measure in the supervised
learning domain. Due to its compelling features, it has been employed in a
number of studies to evaluate and compare the performance of different
classifiers. In this work, we explore AUC as a performance measure in the
unsupervised learning domain, more specifically, in the context of cluster
analysis. In particular, we elaborate on the use of AUC as an internal/relative
measure of clustering quality, which we refer to as Area Under the Curve for
Clustering (AUCC). We show that the AUCC of a given candidate clustering
solution has an expected value under a null model of random clustering
solutions, regardless of the size of the dataset and, more importantly,
regardless of the number or the (im)balance of clusters under evaluation. In
addition, we elaborate on the fact that, in the context of internal/relative
clustering validation as we consider, AUCC is actually a linear transformation
of the Gamma criterion from Baker and Hubert (1975), for which we also formally
derive a theoretical expected value for chance clusterings. We also discuss the
computational complexity of these criteria and show that, while an ordinary
implementation of Gamma can be computationally prohibitive and impractical for
most real applications of cluster analysis, its equivalence with AUCC actually
unveils a much more efficient algorithmic procedure. Our theoretical findings
are supported by experimental results. These results show that, in addition to
an effective and robust quantitative evaluation provided by AUCC, visual
inspection of the ROC curves themselves can be useful to further assess a
candidate clustering solution from a broader, qualitative perspective as well.
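As a rough illustration of the idea (a minimal sketch, not the authors' implementation; the function name `aucc` is mine): treat every pair of objects as a binary instance, "same cluster" versus "different cluster", use the pairwise distance as the score, and compute the AUC of that ranking. The naive all-pairs loop below is quadratic in the number of pairs; the sort-based formulation the paper derives from the AUCC/Gamma equivalence is considerably more efficient.

```python
from itertools import combinations

def aucc(X, labels):
    """Naive AUCC: AUC of 'same cluster?' predicted by pairwise distance."""
    pos, neg = [], []  # distances of same-cluster / different-cluster pairs
    for i, j in combinations(range(len(X)), 2):
        d = sum((a - b) ** 2 for a, b in zip(X[i], X[j])) ** 0.5
        (pos if labels[i] == labels[j] else neg).append(d)
    # Mann-Whitney form of the AUC: the probability that a random
    # same-cluster pair is closer than a random different-cluster pair
    # (ties count half). Smaller distance = stronger "same cluster" evidence.
    wins = sum((p < q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

With this convention, a perfect clustering of well-separated groups yields AUCC = 1 and, as I read the abstract, the stated linearity with Baker and Hubert's Gamma takes the form Gamma = 2·AUCC − 1 (modulo tie handling).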
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality-reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in a self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- Adaptive Self-supervised Robust Clustering for Unstructured Data with Unknown Cluster Number [12.926206811876174]
We introduce a novel self-supervised deep clustering approach tailored for unstructured data, termed Adaptive Self-supervised Robust Clustering (ASRC).
ASRC adaptively learns the graph structure and edge weights to capture both local and global structural information.
ASRC even outperforms methods that rely on prior knowledge of the number of clusters, highlighting its effectiveness in addressing the challenges of clustering unstructured data.
arXiv Detail & Related papers (2024-07-29T15:51:09Z)
- From A-to-Z Review of Clustering Validation Indices [4.08908337437878]
We review and evaluate the performance of internal and external clustering validation indices on the most common clustering algorithms.
We suggest a classification framework for examining the functionality of both internal and external clustering validation measures.
arXiv Detail & Related papers (2024-07-18T13:52:02Z)
- Interpretable Clustering with the Distinguishability Criterion [0.4419843514606336]
We present a global criterion called the Distinguishability criterion to quantify the separability of identified clusters and validate inferred cluster configurations.
We propose a combined loss function-based computational framework that integrates the Distinguishability criterion with many commonly used clustering procedures.
We present these new algorithms as well as the results from comprehensive data analysis based on simulation studies and real data applications.
arXiv Detail & Related papers (2024-04-24T16:38:15Z)
- GCC: Generative Calibration Clustering [55.44944397168619]
We propose a novel Generative Calibration Clustering (GCC) method to incorporate feature learning and augmentation into the clustering procedure.
First, we develop a discriminative feature alignment mechanism to discover the intrinsic relationship across real and generated samples.
Second, we design a self-supervised metric learning scheme to generate more reliable cluster assignments.
arXiv Detail & Related papers (2024-04-14T01:51:11Z) - Clustering Validation with The Area Under Precision-Recall Curves [0.0]
Clustering Validation Indices (CVIs) allow for clustering validation in real application scenarios.
We show that areas under precision-recall curves are not only appropriate as CVIs, but should also be preferred in the presence of cluster imbalance.
We perform a comprehensive evaluation of proposed and state-of-the-art CVIs on real and simulated data sets.
arXiv Detail & Related papers (2023-04-04T01:49:57Z) - Oracle-guided Contrastive Clustering [28.066047266687058]
Oracle-guided Contrastive Clustering (OCC) is proposed to cluster by interactively making pairwise "same-cluster" queries to oracles with distinctive demands.
To the best of our knowledge, it is the first deep framework to perform personalized clustering.
arXiv Detail & Related papers (2022-11-01T12:05:12Z) - Using Representation Expressiveness and Learnability to Evaluate
Self-Supervised Learning Methods [61.49061000562676]
We introduce Cluster Learnability (CL) to assess learnability.
CL is measured in terms of the performance of a KNN trained to predict labels obtained by clustering the representations with K-means.
We find that CL better correlates with in-distribution model performance than other competing recent evaluation schemes.
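The CL procedure described above can be sketched as follows. This is a toy, pure-Python version under my own assumptions (`kmeans` and `cluster_learnability` are illustrative names; the paper clusters learned representations and uses a proper KNN, not this leave-one-out 1-NN on raw points):

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def kmeans(X, k, iters=20, seed=0):
    """Toy Lloyd's algorithm returning one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(X, k)
    labels = [0] * len(X)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(x, centers[c])) for x in X]
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def cluster_learnability(X, k):
    """Leave-one-out 1-NN accuracy at predicting the K-means pseudo-labels."""
    labels = kmeans(X, k)
    hits = 0
    for i, x in enumerate(X):
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: dist2(x, X[j]))
        hits += labels[nearest] == labels[i]
    return hits / len(X)
```

On well-separated groups the pseudo-labels are trivially learnable and the score is 1.0; representations whose K-means labels a KNN cannot recover score lower.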
arXiv Detail & Related papers (2022-06-02T19:05:13Z) - You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data assigned to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- Towards Uncovering the Intrinsic Data Structures for Unsupervised Domain Adaptation using Structurally Regularized Deep Clustering [119.88565565454378]
Unsupervised domain adaptation (UDA) aims to learn classification models that make predictions for unlabeled data on a target domain.
We propose a hybrid model of Structurally Regularized Deep Clustering, which integrates the regularized discriminative clustering of target data with a generative one.
Our proposed H-SRDC outperforms all the existing methods under both the inductive and transductive settings.
arXiv Detail & Related papers (2020-12-08T08:52:00Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.