Clustering Validation with The Area Under Precision-Recall Curves
- URL: http://arxiv.org/abs/2304.01450v1
- Date: Tue, 4 Apr 2023 01:49:57 GMT
- Title: Clustering Validation with The Area Under Precision-Recall Curves
- Authors: Pablo Andretta Jaskowiak and Ivan Gesteira Costa
- Abstract summary: The Area Under the ROC Curve for Clustering (AUCC) is a Clustering Validation Index (CVI) that allows for clustering validation in real application scenarios.
We show that the Area Under the Precision-Recall Curve and related metrics are not only appropriate as CVIs, but should also be preferred in the presence of cluster imbalance.
We perform a comprehensive evaluation of proposed and state-of-the-art CVIs on real and simulated data sets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Confusion matrices and derived metrics provide a comprehensive framework for
the evaluation of model performance in machine learning. These are well-known
and extensively employed in the supervised learning domain, particularly
classification. Surprisingly, such a framework has not been fully explored in
the context of clustering validation. Indeed, just recently such a gap has been
bridged with the introduction of the Area Under the ROC Curve for Clustering
(AUCC), an internal/relative Clustering Validation Index (CVI) that allows for
clustering validation in real application scenarios. In this work we explore
the Area Under Precision-Recall Curve (and related metrics) in the context of
clustering validation. We show that these are not only appropriate as CVIs, but
should also be preferred in the presence of cluster imbalance. We perform a
comprehensive evaluation of proposed and state-of-the-art CVIs on real and
simulated data sets. Our observations point towards a unified validation
framework for supervised and unsupervised learning, given that they are
consistent with existing guidelines established for the evaluation of
supervised learning models.
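The idea in the abstract can be illustrated with a minimal sketch, assuming the pairwise construction commonly described for AUCC: every unordered pair of objects becomes one example, its class is whether the partition puts both objects in the same cluster, and its score is their similarity. The function names, the toy data, and the use of negative Euclidean distance as the similarity are assumptions made for illustration. Average precision is used here as one common estimate of the area under the precision-recall curve; the paper's exact procedure may differ.

```python
from itertools import combinations
from math import dist  # Python 3.8+


def pairwise_scores(points, labels):
    """Return (similarity, same_cluster) for every unordered pair of objects."""
    pairs = []
    for i, j in combinations(range(len(points)), 2):
        similarity = -dist(points[i], points[j])  # assumption: closer pairs score higher
        pairs.append((similarity, labels[i] == labels[j]))
    return pairs


def auc_roc(pairs):
    """Chance that a same-cluster pair outscores a different-cluster pair (ties count 0.5)."""
    pos = [s for s, same in pairs if same]
    neg = [s for s, same in pairs if not same]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(pairs):
    """Step-wise area under the precision-recall curve, one step per distinct score."""
    ranked = sorted(pairs, key=lambda t: -t[0])
    n_pos = sum(same for _, same in ranked)
    tp, ap, i = 0, 0.0, 0
    while i < len(ranked):
        j, group_tp = i, 0
        # Pairs with identical scores share a threshold, so process them together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            group_tp += ranked[j][1]
            j += 1
        tp += group_tp
        ap += (group_tp / n_pos) * (tp / j)  # recall increment times precision
        i = j
    return ap


# Hypothetical toy data: one cluster of four points and one of two (imbalanced).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11)]
good = pairwise_scores(points, [0, 0, 0, 0, 1, 1])  # matches the visible structure
bad = pairwise_scores(points, [0, 1, 0, 1, 0, 1])   # ignores the visible structure

print("good partition:", auc_roc(good), average_precision(good))
print("bad partition: ", auc_roc(bad), average_precision(bad))
```

Both measures rank the partition that follows the data's structure above the one that does not. Because the number of same-cluster pairs depends on cluster sizes, the positive class in this pairwise problem can be heavily skewed, which is the setting in which the abstract argues for precision-recall based indices.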
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- Adaptive Self-supervised Robust Clustering for Unstructured Data with Unknown Cluster Number [12.926206811876174]
We introduce a novel self-supervised deep clustering approach tailored for unstructured data, termed Adaptive Self-supervised Robust Clustering (ASRC).
ASRC adaptively learns the graph structure and edge weights to capture both local and global structural information.
ASRC even outperforms methods that rely on prior knowledge of the number of clusters, highlighting its effectiveness in addressing the challenges of clustering unstructured data.
arXiv Detail & Related papers (2024-07-29T15:51:09Z)
- From A-to-Z Review of Clustering Validation Indices [4.08908337437878]
We review and evaluate the performance of internal and external clustering validation indices on the most common clustering algorithms.
We suggest a classification framework for examining the functionality of both internal and external clustering validation measures.
arXiv Detail & Related papers (2024-07-18T13:52:02Z)
- GCC: Generative Calibration Clustering [55.44944397168619]
We propose a novel Generative Calibration Clustering (GCC) method to incorporate feature learning and augmentation into the clustering procedure.
First, we develop a discriminative feature alignment mechanism to discover the intrinsic relationship across real and generated samples.
Second, we design a self-supervised metric learning to generate more reliable cluster assignment.
arXiv Detail & Related papers (2024-04-14T01:51:11Z)
- Sanity Check for External Clustering Validation Benchmarks using Internal Validation Measures [8.808021343665319]
We address the lack of reliability in benchmarking clustering techniques based on labeled datasets.
We propose a principled way to generate between-dataset internal measures that enable the comparison of CLM across datasets.
arXiv Detail & Related papers (2022-09-20T23:32:18Z)
- Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods [61.49061000562676]
We introduce Cluster Learnability (CL) to assess learnability.
CL is measured in terms of the performance of a KNN trained to predict labels obtained by clustering the representations with K-means.
We find that CL better correlates with in-distribution model performance than other competing recent evaluation schemes.
arXiv Detail & Related papers (2022-06-02T19:05:13Z)
- Deep Conditional Gaussian Mixture Model for Constrained Clustering [7.070883800886882]
Constrained clustering can leverage prior information on a growing amount of only partially labeled data.
We propose a novel framework for constrained clustering that is intuitive, interpretable, and can be trained efficiently in the framework of gradient variational inference.
arXiv Detail & Related papers (2021-06-11T13:38:09Z)
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- Towards Uncovering the Intrinsic Data Structures for Unsupervised Domain Adaptation using Structurally Regularized Deep Clustering [119.88565565454378]
Unsupervised domain adaptation (UDA) aims to learn classification models that make predictions for unlabeled data on a target domain.
We propose a hybrid model of Structurally Regularized Deep Clustering, which integrates the regularized discriminative clustering of target data with a generative one.
Our proposed H-SRDC outperforms all the existing methods under both the inductive and transductive settings.
arXiv Detail & Related papers (2020-12-08T08:52:00Z)
- The Area Under the ROC Curve as a Measure of Clustering Quality [0.0]
Area Under the Curve for Clustering (AUCC) is an internal/relative measure of clustering quality.
AUCC is a linear transformation of the Gamma criterion from Baker and Hubert (1975).
arXiv Detail & Related papers (2020-09-04T21:34:51Z)
- Structured Graph Learning for Clustering and Semi-supervised Classification [74.35376212789132]
We propose a graph learning framework to preserve both the local and global structure of data.
Our method uses the self-expressiveness of samples to capture the global structure and an adaptive neighbor approach to respect the local structure.
Our model is equivalent to a combination of kernel k-means and k-means methods under certain conditions.
arXiv Detail & Related papers (2020-08-31T08:41:20Z)