Clustering Validation with The Area Under Precision-Recall Curves
- URL: http://arxiv.org/abs/2304.01450v1
- Date: Tue, 4 Apr 2023 01:49:57 GMT
- Title: Clustering Validation with The Area Under Precision-Recall Curves
- Authors: Pablo Andretta Jaskowiak and Ivan Gesteira Costa
- Abstract summary: An internal/relative Clustering Validation Index (CVI) allows for clustering validation in real application scenarios.
We show that these are not only appropriate as CVIs, but should also be preferred in the presence of cluster imbalance.
We perform a comprehensive evaluation of the proposed and state-of-the-art CVIs on real and simulated data sets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Confusion matrices and derived metrics provide a comprehensive framework for
the evaluation of model performance in machine learning. These are well-known
and extensively employed in the supervised learning domain, particularly
classification. Surprisingly, such a framework has not been fully explored in
the context of clustering validation. Indeed, just recently such a gap has been
bridged with the introduction of the Area Under the ROC Curve for Clustering
(AUCC), an internal/relative Clustering Validation Index (CVI) that allows for
clustering validation in real application scenarios. In this work we explore
the Area Under Precision-Recall Curve (and related metrics) in the context of
clustering validation. We show that these are not only appropriate as CVIs, but
should also be preferred in the presence of cluster imbalance. We perform a
comprehensive evaluation of the proposed and state-of-the-art CVIs on real and
simulated data sets. Our observations support a unified validation framework
for supervised and unsupervised learning, as they are consistent with existing
guidelines established for the evaluation of supervised learning models.
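To make the core idea concrete, here is a minimal, hypothetical sketch (not the authors' implementation) of using the area under the precision-recall curve as a CVI: every pair of points is scored by similarity, labelled positive when both points fall in the same cluster, and the curve is integrated over that ranking. All function and variable names are illustrative.

```python
import math
from itertools import combinations

def pairwise_pr_auc(points, labels):
    """Area under the precision-recall curve computed over point pairs.

    Each pair is scored by similarity (negative Euclidean distance) and
    labelled positive if both points share a cluster. A good clustering
    ranks same-cluster pairs ahead of cross-cluster ones.
    """
    pairs = []
    for i, j in combinations(range(len(points)), 2):
        score = -math.dist(points[i], points[j])
        pairs.append((score, labels[i] == labels[j]))
    # Rank pairs from most to least similar.
    pairs.sort(key=lambda t: t[0], reverse=True)

    n_pos = sum(1 for _, same in pairs if same)
    if n_pos == 0:  # no same-cluster pair to rank
        return 0.0
    tp, auprc, prev_recall = 0, 0.0, 0.0
    for k, (_, same) in enumerate(pairs, start=1):
        if same:
            tp += 1
        precision, recall = tp / k, tp / n_pos
        auprc += precision * (recall - prev_recall)  # step-wise integration
        prev_recall = recall
    return auprc
```

A well-separated clustering ranks every same-cluster pair ahead of every cross-cluster pair and reaches an area of 1.0; the imbalanced-cluster regime is where the paper argues precision-recall behaves better than its ROC counterpart.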
Related papers
- GCC: Generative Calibration Clustering [55.44944397168619]
We propose a novel Generative Calibration Clustering (GCC) method to incorporate feature learning and augmentation into the clustering procedure.
First, we develop a discriminative feature alignment mechanism to discover the intrinsic relationship across real and generated samples.
Second, we design a self-supervised metric learning scheme to generate more reliable cluster assignments.
arXiv Detail & Related papers (2024-04-14T01:51:11Z)
- Sanity Check for External Clustering Validation Benchmarks using Internal Validation Measures [8.808021343665319]
We address the lack of reliability in benchmarking clustering techniques based on labeled datasets.
We propose a principled way to generate between-dataset internal measures that enable the comparison of CLM across datasets.
arXiv Detail & Related papers (2022-09-20T23:32:18Z)
- Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods [61.49061000562676]
We introduce Cluster Learnability (CL) to assess learnability.
CL is measured in terms of the performance of a KNN trained to predict labels obtained by clustering the representations with K-means.
We find that CL better correlates with in-distribution model performance than other competing recent evaluation schemes.
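The CL measure can be sketched in a few lines. The version below is a hypothetical, dependency-free reduction (plain Lloyd's k-means plus leave-one-out 1-NN) rather than the authors' exact protocol; all names are illustrative.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(x) / len(members)
                                   for x in zip(*members))
    return labels

def cluster_learnability(points, k=2):
    """CL sketch: cluster the representations with k-means, then score a
    leave-one-out 1-NN classifier on predicting those cluster labels."""
    labels = kmeans(points, k)
    correct = 0
    for i, p in enumerate(points):
        nearest = min((j for j in range(len(points)) if j != i),
                      key=lambda j: math.dist(p, points[j]))
        correct += labels[nearest] == labels[i]
    return correct / len(points)
```

The intuition: if the clusters found by k-means are easy for a simple non-parametric classifier to reproduce, the representation has learnable structure.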
arXiv Detail & Related papers (2022-06-02T19:05:13Z)
- ExpertNet: A Symbiosis of Classification and Clustering [22.324813752423044]
ExpertNet uses novel training strategies to learn clustered latent representations and leverage them by effectively combining cluster-specific classifiers.
We demonstrate the superiority of ExpertNet over state-of-the-art methods on 6 large clinical datasets.
arXiv Detail & Related papers (2022-01-17T11:00:30Z)
- Deep Conditional Gaussian Mixture Model for Constrained Clustering [7.070883800886882]
Constrained clustering can leverage prior information on a growing amount of only partially labeled data.
We propose a novel framework for constrained clustering that is intuitive, interpretable, and can be trained efficiently in the framework of gradient variational inference.
arXiv Detail & Related papers (2021-06-11T13:38:09Z)
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- Learning to Generate Fair Clusters from Demonstrations [27.423983748614198]
We show how to identify the intended fairness constraint for a problem based on limited demonstrations from an expert.
We present an algorithm to identify the fairness metric from demonstrations and generate clusters using existing off-the-shelf clustering techniques.
We investigate how to generate interpretable solutions using our approach.
arXiv Detail & Related papers (2021-02-08T03:09:33Z)
- Towards Uncovering the Intrinsic Data Structures for Unsupervised Domain Adaptation using Structurally Regularized Deep Clustering [119.88565565454378]
Unsupervised domain adaptation (UDA) aims to learn classification models that make predictions for unlabeled data on a target domain.
We propose a hybrid model of Structurally Regularized Deep Clustering (H-SRDC), which integrates the regularized discriminative clustering of target data with a generative one.
Our proposed H-SRDC outperforms all the existing methods under both the inductive and transductive settings.
arXiv Detail & Related papers (2020-12-08T08:52:00Z)
- The Area Under the ROC Curve as a Measure of Clustering Quality [0.0]
Area Under the Curve for Clustering (AUCC) is an internal/relative measure of clustering quality.
AUCC is a linear transformation of the Gamma criterion from Baker and Hubert (1975).
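A minimal sketch of the AUCC idea, assuming the Mann-Whitney formulation of ROC AUC: AUCC is the probability that a randomly chosen same-cluster pair of points is more similar than a randomly chosen cross-cluster pair. Names are illustrative, not the authors' code.

```python
import math
from itertools import combinations

def aucc(points, labels):
    """AUCC sketch via the Mann-Whitney statistic: the fraction of
    (same-cluster, cross-cluster) pair comparisons in which the
    same-cluster pair is the more similar one (ties count half)."""
    pos, neg = [], []
    for i, j in combinations(range(len(points)), 2):
        sim = -math.dist(points[i], points[j])  # similarity score
        (pos if labels[i] == labels[j] else neg).append(sim)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A clustering that keeps every same-cluster pair closer than every cross-cluster pair scores exactly 1.0, matching the ROC-AUC interpretation used for supervised classifiers.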
arXiv Detail & Related papers (2020-09-04T21:34:51Z)
- Structured Graph Learning for Clustering and Semi-supervised Classification [74.35376212789132]
We propose a graph learning framework to preserve both the local and global structure of data.
Our method uses the self-expressiveness of samples to capture the global structure and an adaptive-neighbor approach to respect the local structure.
Our model is equivalent to a combination of kernel k-means and k-means methods under certain conditions.
arXiv Detail & Related papers (2020-08-31T08:41:20Z)
- Exploring Category-Agnostic Clusters for Open-Set Domain Adaptation [138.29273453811945]
We present Self-Ensembling with Category-agnostic Clusters (SE-CC) -- a novel architecture that steers domain adaptation with category-agnostic clusters in target domain.
Clustering is performed over all the unlabeled target samples to obtain the category-agnostic clusters, which reveal the underlying data space structure peculiar to the target domain.
arXiv Detail & Related papers (2020-06-11T16:19:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.