ABCDE: Application-Based Cluster Diff Evals
- URL: http://arxiv.org/abs/2407.21430v1
- Date: Wed, 31 Jul 2024 08:29:35 GMT
- Title: ABCDE: Application-Based Cluster Diff Evals
- Authors: Stephan van Staden, Alexander Grubb,
- Abstract summary: ABCDE aims to be practical: it allows items to have application-specific importance values, it is frugal in its use of human judgements when determining which clustering is better, and it can report metrics for arbitrary slices of items.
Its approach to measuring the delta in clustering quality is novel: instead of constructing an expensive ground truth up front and evaluating each clustering against it, ABCDE samples questions for judgement on the basis of the actual diffs between the clusterings.
- Score: 49.1574468325115
- License:
- Abstract: This paper considers the problem of evaluating clusterings of very large populations of items. Given two clusterings, namely a Baseline clustering and an Experiment clustering, the tasks are twofold: 1) characterize their differences, and 2) determine which clustering is better. ABCDE is a novel evaluation technique for accomplishing that. It aims to be practical: it allows items to have associated importance values that are application-specific, it is frugal in its use of human judgements when determining which clustering is better, and it can report metrics for arbitrary slices of items, thereby facilitating understanding and debugging. The approach to measuring the delta in the clustering quality is novel: instead of trying to construct an expensive ground truth up front and evaluating each clustering with respect to that, where the ground truth must effectively pre-anticipate clustering changes, ABCDE samples questions for judgement on the basis of the actual diffs between the clusterings. ABCDE builds upon the pointwise metrics for clustering evaluation, which make the ABCDE metrics intuitive and simple to understand. The mathematical elegance of the pointwise metrics equips ABCDE with rigorous yet practical ways to explore the clustering diffs and to estimate the quality delta.
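The diff-driven sampling described above can be illustrated with a small sketch. The following Python snippet is a hypothetical illustration, not the authors' implementation: it represents each clustering as an item-to-cluster-id mapping, restricts attention to items whose cluster membership actually changed between the Baseline and the Experiment, and draws judgement questions from those diffs with probability proportional to application-specific importance weights. All function and parameter names are assumptions made for this sketch.

```python
import random
from collections import defaultdict

def sample_judgement_questions(baseline, experiment, importance, budget, seed=0):
    """Sample up to `budget` (item, peer) pairs for human judgement.

    `baseline` and `experiment` map item -> cluster id (both clusterings are
    assumed to cover the same items); `importance` maps item -> non-negative
    application-specific weight. Only items whose cluster membership changed
    are eligible, so the judgement budget is spent on the actual diff.
    """
    # Group items by cluster id in each clustering.
    b_clusters, e_clusters = defaultdict(set), defaultdict(set)
    for item, cid in baseline.items():
        b_clusters[cid].add(item)
    for item, cid in experiment.items():
        e_clusters[cid].add(item)

    # An item is affected if the set of items it is clustered with changed.
    affected = [item for item in baseline
                if b_clusters[baseline[item]] != e_clusters[experiment[item]]]
    if not affected:
        return []  # identical clusterings: nothing to judge

    # Importance-weighted sampling of affected items.
    rng = random.Random(seed)
    weights = [importance.get(item, 1.0) for item in affected]
    chosen = rng.choices(affected, weights=weights, k=min(budget, len(affected)))

    # For each sampled item, pick one peer it gained or lost and ask a human
    # rater whether the two items really belong in the same cluster.
    questions = []
    for item in chosen:
        gained = e_clusters[experiment[item]] - b_clusters[baseline[item]]
        lost = b_clusters[baseline[item]] - e_clusters[experiment[item]]
        peers = [p for p in (gained | lost) if p != item]
        if peers:
            questions.append((item, rng.choice(peers)))
    return questions
```

In this sketch, rating each sampled pair as "same concept" or "different concept" yields an estimate of how much of the diff is an improvement versus a regression, without ever constructing a full ground-truth clustering.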
Related papers
- Evaluation of Cluster Id Assignment Schemes with ABCDE [0.0]
A cluster id assignment scheme labels each cluster of a clustering with a distinct id.
Semantic id stability allows the users of a clustering to refer to a concept's cluster with an id that is stable across clusterings/time.
This paper treats the problem of evaluating the relative merits of id assignment schemes.
arXiv Detail & Related papers (2024-09-26T19:56:56Z)
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in a self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- More Clustering Quality Metrics for ABCDE [0.0]
ABCDE is a technique for evaluating clusterings of very large populations of items.
This paper introduces a new metric, called IQ, to characterize the degree to which the clustering diff translates into an improvement in quality.
arXiv Detail & Related papers (2024-09-20T10:24:39Z)
- Pointwise Metrics for Clustering Evaluation [0.0]
This paper defines pointwise clustering metrics, a collection of metrics for characterizing the similarity of two clusterings.
The metric definitions are based on standard set-theoretic notions and are simple to understand.
It is possible to assign metrics to individual items, clusters, arbitrary slices of items, and the overall clustering (a small illustrative sketch appears after this list).
arXiv Detail & Related papers (2024-05-16T19:49:35Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches the instance-specific lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Self-supervised Contrastive Attributed Graph Clustering [110.52694943592974]
We propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC).
In SCAGC, by leveraging inaccurate clustering labels, a self-supervised contrastive loss is designed for node representation learning.
For out-of-sample (OOS) nodes, SCAGC can directly calculate their clustering labels.
arXiv Detail & Related papers (2021-10-15T03:25:28Z)
- J-Score: A Robust Measure of Clustering Accuracy [8.33909555155795]
Clustering analysis discovers hidden structures in a data set by partitioning its items into disjoint clusters.
Current clustering accuracy measures suffer from issues such as overlooking unmatched clusters, bias towards excessive clusters, unstable baselines, and difficult interpretation.
We present a novel accuracy measure, J-score, that addresses these issues.
arXiv Detail & Related papers (2021-09-03T04:43:52Z)
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data assigned to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
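As noted in the "Pointwise Metrics for Clustering Evaluation" entry above, the pointwise metrics that ABCDE builds on are defined with standard set-theoretic notions and can be assigned to individual items. The sketch below is a hypothetical, importance-weighted, BCubed-style rendering of that idea, not the paper's exact definitions; the function name and the precision/recall framing are assumptions.

```python
def pointwise_precision_recall(item, baseline, experiment, importance):
    """Per-item agreement between two clusterings for a single item.

    `baseline` and `experiment` map item -> cluster id; `importance` maps
    item -> non-negative weight. Precision asks how much of the item's
    Experiment cluster it already shared a Baseline cluster with; recall
    asks how much of its Baseline cluster survived into the Experiment.
    (Hypothetical sketch; see the pointwise-metrics paper for the real
    definitions.)
    """
    b_cluster = {x for x, c in baseline.items() if c == baseline[item]}
    e_cluster = {x for x, c in experiment.items() if c == experiment[item]}
    overlap = b_cluster & e_cluster  # always contains `item` itself

    def weight(items):
        return sum(importance.get(x, 1.0) for x in items)

    return weight(overlap) / weight(e_cluster), weight(overlap) / weight(b_cluster)
```

Because the values live on individual items, they can be averaged with importance weights over any slice of items, which is what makes per-slice and overall metrics straightforward to report.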