More Clustering Quality Metrics for ABCDE
- URL: http://arxiv.org/abs/2409.13376v1
- Date: Fri, 20 Sep 2024 10:24:39 GMT
- Title: More Clustering Quality Metrics for ABCDE
- Authors: Stephan van Staden
- Abstract summary: ABCDE is a technique for evaluating clusterings of very large populations of items.
This paper introduces a new metric, called IQ, to characterize the degree to which the clustering diff translates into an improvement in the quality.
- Abstract: ABCDE is a technique for evaluating clusterings of very large populations of items. Given two clusterings, namely a Baseline clustering and an Experiment clustering, ABCDE can characterize their differences with impact and quality metrics, and thus help to determine which clustering to prefer. We previously described the basic quality metrics of ABCDE, namely the GoodSplitRate, BadSplitRate, GoodMergeRate, BadMergeRate and DeltaPrecision, and how to estimate them on the basis of human judgements. This paper extends that treatment with more quality metrics. It describes a technique that aims to characterize the DeltaRecall of the clustering change. It introduces a new metric, called IQ, to characterize the degree to which the clustering diff translates into an improvement in the quality. Ideally, a large diff would improve the quality by a large amount. Finally, this paper mentions ways to characterize the absolute Precision and Recall of a single clustering with ABCDE.
Related papers
- KULCQ: An Unsupervised Keyword-based Utterance Level Clustering Quality Metric [0.5671051073036456]
Keyword-based Utterance Level Clustering Quality (KULCQ) is an unsupervised metric that leverages keyword analysis to evaluate clustering quality.
Our results show that KULCQ better captures semantic relationships in conversational data while maintaining consistency with geometric clustering principles.
arXiv Detail & Related papers (2024-11-15T00:21:02Z)
- Decomposing the Jaccard Distance and the Jaccard Index in ABCDE [0.0]
This paper decomposes the JaccardDistance and the JaccardIndex further.
In each case, the decomposition yields Impact and Quality metrics.
The new metrics are mathematically well-behaved and they are interrelated via simple equations.
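The JaccardIndex and JaccardDistance underlying this decomposition are the standard set-overlap measures applied to an item's baseline and experiment clusters. A minimal sketch of those two base quantities (the paper's actual weighting and decomposition into Impact and Quality metrics is not reproduced here):

```python
# Per-item Jaccard index/distance between an item's baseline cluster
# and its experiment cluster, with clusters as plain sets of item ids.

def jaccard_index(a: set, b: set) -> float:
    """|A intersect B| / |A union B| for two non-empty clusters."""
    return len(a & b) / len(a | b)

def jaccard_distance(a: set, b: set) -> float:
    """1 minus the Jaccard index; 0 when identical, 1 when disjoint."""
    return 1.0 - jaccard_index(a, b)

baseline_cluster = {"x", "y", "z"}
experiment_cluster = {"x", "y", "w"}

print(jaccard_index(baseline_cluster, experiment_cluster))     # 2/4 = 0.5
print(jaccard_distance(baseline_cluster, experiment_cluster))  # 0.5
```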
arXiv Detail & Related papers (2024-09-27T08:00:32Z)
- Evaluation of Cluster Id Assignment Schemes with ABCDE [0.0]
A cluster id assignment scheme labels each cluster of a clustering with a distinct id.
Semantic id stability allows the users of a clustering to refer to a concept's cluster with an id that is stable across clusterings/time.
This paper treats the problem of evaluating the relative merits of id assignment schemes.
arXiv Detail & Related papers (2024-09-26T19:56:56Z)
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- ABCDE: Application-Based Cluster Diff Evals [49.1574468325115]
It aims to be practical: it allows items to have associated importance values that are application-specific, it is frugal in its use of human judgements when determining which clustering is better, and it can report metrics for arbitrary slices of items.
The approach to measuring the delta in the clustering quality is novel: instead of trying to construct an expensive ground truth up front and evaluating each clustering with respect to it, ABCDE samples questions for judgement on the basis of the actual diffs between the clusterings.
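The diff-based sampling idea can be illustrated with a small sketch. All names here are hypothetical, and ABCDE's actual question-sampling and weighting scheme is more involved; this only shows the core idea of restricting judgement questions to items whose cluster changed, weighted by application-specific importance:

```python
import random

def sample_questions(baseline, experiment, importance, k, seed=0):
    """Sample k items for human judgement, drawn only from items whose
    cluster membership differs between the two clusterings.

    baseline, experiment: dicts mapping item -> cluster id.
    importance: dict mapping item -> application-specific weight.
    """
    diff_items = [i for i in baseline if baseline[i] != experiment[i]]
    weights = [importance[i] for i in diff_items]
    rng = random.Random(seed)
    return rng.choices(diff_items, weights=weights, k=k)

baseline = {"a": 1, "b": 1, "c": 2}
experiment = {"a": 1, "b": 2, "c": 2}
importance = {"a": 1.0, "b": 5.0, "c": 2.0}
# Only "b" moved between clusters, so every sampled question concerns "b".
print(sample_questions(baseline, experiment, importance, k=3))
```

Frugality with human judgements falls out of the restriction to `diff_items`: unchanged items never generate questions.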
arXiv Detail & Related papers (2024-07-31T08:29:35Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To fully exploit the complementary information embedded in different views, we leverage tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data assigned to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
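For contrast with the scalable method described in that paper (whose algorithm is not reproduced here), the classic single-linkage agglomerative procedure it improves upon can be sketched as follows; the cost of rescanning all cluster pairs at every merge is exactly what keeps this naive form from scaling:

```python
# Naive single-linkage agglomerative clustering over 1-D points.
# Roughly O(n^3) as written: each merge rescans all cluster pairs.

def single_linkage(points, num_clusters):
    clusters = [[p] for p in points]
    while len(clusters) > num_clusters:
        # Find the pair of clusters with the smallest minimum gap.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the closest pair
    return [sorted(c) for c in clusters]

print(single_linkage([1.0, 1.1, 5.0, 5.2, 9.0], num_clusters=3))
# -> [[1.0, 1.1], [5.0, 5.2], [9.0]]
```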
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.