Decomposing the Jaccard Distance and the Jaccard Index in ABCDE
- URL: http://arxiv.org/abs/2409.18522v1
- Date: Fri, 27 Sep 2024 08:00:32 GMT
- Title: Decomposing the Jaccard Distance and the Jaccard Index in ABCDE
- Authors: Stephan van Staden
- Abstract summary: This paper decomposes the JaccardDistance and the JaccardIndex further.
In each case, the decomposition yields Impact and Quality metrics.
The new metrics are mathematically well-behaved and they are interrelated via simple equations.
- Score: 0.0
- Abstract: ABCDE is a sophisticated technique for evaluating differences between very large clusterings. Its main metric that characterizes the magnitude of the difference between two clusterings is the JaccardDistance, which is a true distance metric in the space of all clusterings of a fixed set of (weighted) items. The JaccardIndex is the complementary metric that characterizes the similarity of two clusterings. Its relationship with the JaccardDistance is simple: JaccardDistance + JaccardIndex = 1. This paper decomposes the JaccardDistance and the JaccardIndex further. In each case, the decomposition yields Impact and Quality metrics. The Impact metrics measure aspects of the magnitude of the clustering diff, while Quality metrics use human judgements to measure how much the clustering diff improves the quality of the clustering. The decompositions of this paper offer more and deeper insight into a clustering change. They also unlock new techniques for debugging and exploring the nature of the clustering diff. The new metrics are mathematically well-behaved and they are interrelated via simple equations. While the work can be seen as an alternative formal framework for ABCDE, we prefer to view it as complementary. It certainly offers a different perspective on the magnitude and the quality of a clustering change, and users can use whatever they want from each approach to gain more insight into a change.
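The abstract states the key identity JaccardDistance + JaccardIndex = 1 but not the computation itself. As a rough, non-authoritative illustration, the following Python sketch computes both metrics in the pointwise style of the related paper "Pointwise Metrics for Clustering Evaluation": each item contributes the Jaccard ratio of the two clusters containing it, averaged with the item weights. The function and variable names are ours, and whether this matches ABCDE's exact weighted definition is an assumption on our part.

```python
from collections import defaultdict

def jaccard_metrics(assign_a, assign_b, weights):
    """Weighted JaccardIndex / JaccardDistance between two clusterings.

    assign_a, assign_b: dict mapping item -> cluster id in clustering A / B.
    weights: dict mapping item -> positive importance weight.
    Assumes both clusterings cover exactly the items in `weights`.
    """
    # Invert the assignments so each item's cluster is available as a set.
    clusters_a, clusters_b = defaultdict(set), defaultdict(set)
    for item, cid in assign_a.items():
        clusters_a[cid].add(item)
    for item, cid in assign_b.items():
        clusters_b[cid].add(item)

    def w(items):
        return sum(weights[i] for i in items)

    total = w(weights)  # total weight of all items
    weighted_sum = 0.0
    for item in weights:
        ca = clusters_a[assign_a[item]]  # item's cluster in A
        cb = clusters_b[assign_b[item]]  # item's cluster in B
        # Per-item Jaccard ratio of the two clusters containing the item.
        weighted_sum += weights[item] * w(ca & cb) / w(ca | cb)

    jaccard_index = weighted_sum / total
    return jaccard_index, 1.0 - jaccard_index  # index + distance = 1

# Example: four unit-weight items, one cluster split between A and B.
a = {1: "x", 2: "x", 3: "x", 4: "y"}
b = {1: "x", 2: "x", 3: "z", 4: "y"}
ji, jd = jaccard_metrics(a, b, {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0})
print(ji, jd)  # 0.666..., 0.333...; they sum to 1
```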
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The one-step K-means dimensionality-reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in a self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- More Clustering Quality Metrics for ABCDE [0.0]
ABCDE is a technique for evaluating clusterings of very large populations of items.
This paper introduces a new metric, called IQ, to characterize the degree to which the clustering diff translates into an improvement in the quality of the clustering.
arXiv Detail & Related papers (2024-09-20T10:24:39Z)
- ABCDE: Application-Based Cluster Diff Evals [49.1574468325115]
It aims to be practical: it allows items to have associated importance values that are application-specific, it is frugal in its use of human judgements when determining which clustering is better, and it can report metrics for arbitrary slices of items.
The approach to measuring the delta in the clustering quality is novel: instead of trying to construct an expensive ground truth up front and evaluating each clustering with respect to it, ABCDE samples questions for judgement on the basis of the actual diffs between the clusterings (a hypothetical sketch of such sampling follows this entry).
arXiv Detail & Related papers (2024-07-31T08:29:35Z)
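The summary above says that ABCDE samples judgement questions from the actual diffs, but does not spell out the sampling scheme. The sketch below is therefore only a hypothetical illustration of the general idea, not the paper's algorithm: it restricts attention to items whose cluster-mates changed between the two clusterings and samples them with probability proportional to importance weight. All names are ours.

```python
import random
from collections import defaultdict

def sample_judgement_items(assign_a, assign_b, weights, k, seed=0):
    """Hypothetical diff-based sampling of items for human judgement.

    Looks only at items whose set of cluster-mates differs between the
    two clusterings, and samples k of them (with replacement) with
    probability proportional to importance weight. This illustrates the
    general idea only; it is not the ABCDE paper's actual scheme.
    """
    clusters_a, clusters_b = defaultdict(set), defaultdict(set)
    for item, cid in assign_a.items():
        clusters_a[cid].add(item)
    for item, cid in assign_b.items():
        clusters_b[cid].add(item)

    # Cluster ids are not comparable across clusterings, so the "diff"
    # is defined via peer sets: items whose cluster-mates changed.
    changed = [i for i in weights
               if clusters_a[assign_a[i]] != clusters_b[assign_b[i]]]
    if not changed:
        return []  # identical clusterings: nothing to judge

    rng = random.Random(seed)
    return rng.choices(changed, weights=[weights[i] for i in changed], k=k)
```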
- Pointwise Metrics for Clustering Evaluation [0.0]
This paper defines pointwise clustering metrics, a collection of metrics for characterizing the similarity of two clusterings.
The metric definitions are based on standard set-theoretic notions and are simple to understand.
It is possible to assign metrics to individual items, clusters, arbitrary slices of items, and the overall clustering; a sketch of slice-level aggregation follows this entry.
arXiv Detail & Related papers (2024-05-16T19:49:35Z)
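Because the pointwise metrics are per-item quantities, metrics for clusters, arbitrary slices, and the whole clustering all arise from the same weighted average, restricted to different item sets. A minimal sketch under the same assumptions (and with the same caveats) as the earlier snippet:

```python
from collections import defaultdict

def slice_jaccard_index(assign_a, assign_b, weights, slice_items):
    """Weighted JaccardIndex restricted to an arbitrary slice of items.

    Same pointwise idea as the earlier sketch: per-item Jaccard ratios
    of the two clusters containing each item, averaged over the slice.
    """
    clusters_a, clusters_b = defaultdict(set), defaultdict(set)
    for item, cid in assign_a.items():
        clusters_a[cid].add(item)
    for item, cid in assign_b.items():
        clusters_b[cid].add(item)

    def w(items):
        return sum(weights[i] for i in items)

    weighted_sum = sum(
        weights[i]
        * w(clusters_a[assign_a[i]] & clusters_b[assign_b[i]])
        / w(clusters_a[assign_a[i]] | clusters_b[assign_b[i]])
        for i in slice_items
    )
    return weighted_sum / w(slice_items)

# A cluster's metric is the slice consisting of that cluster's items;
# the overall metric is the slice of all items.
```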
- OMH: Structured Sparsity via Optimally Matched Hierarchy for Unsupervised Semantic Segmentation [69.37484603556307]
Unsupervised semantic segmentation (USS) involves segmenting images without relying on predefined labels.
We introduce a novel approach called Optimally Matched Hierarchy (OMH) to simultaneously address the above issues.
Our OMH yields better unsupervised segmentation performance compared to existing USS methods.
arXiv Detail & Related papers (2024-03-11T09:46:41Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- Normalised clustering accuracy: An asymmetric external cluster validity measure [2.900810893770134]
Clustering algorithms are traditionally evaluated using either internal or external validity measures.
In this paper, we argue that the commonly used classical partition similarity scores miss some desirable properties.
We propose and analyse a new measure: a version of the optimal set-matching accuracy.
arXiv Detail & Related papers (2022-09-07T05:08:34Z)
- J-Score: A Robust Measure of Clustering Accuracy [8.33909555155795]
Cluster analysis discovers hidden structures in a data set by partitioning the data into disjoint clusters.
Current clustering accuracy measures suffer from issues such as overlooking unmatched clusters, bias towards excessive numbers of clusters, unstable baselines, and difficult interpretation.
We present a novel accuracy measure, J-score, that addresses these issues.
arXiv Detail & Related papers (2021-09-03T04:43:52Z)
- Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering.
We overcome the prohibitively large search space by combining A* with a novel trellis data structure.
We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
arXiv Detail & Related papers (2021-04-14T18:15:27Z)
- Graph Contrastive Clustering [131.67881457114316]
We propose a novel graph contrastive learning framework, which is then applied to the clustering task, yielding the Graph Contrastive Clustering (GCC) method.
Specifically, on the one hand, a graph-Laplacian-based contrastive loss is proposed to learn more discriminative and clustering-friendly features.
On the other hand, a novel graph-based contrastive learning strategy is proposed to learn more compact clustering assignments.
arXiv Detail & Related papers (2021-04-03T15:32:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.