J-Score: A Robust Measure of Clustering Accuracy
- URL: http://arxiv.org/abs/2109.01306v1
- Date: Fri, 3 Sep 2021 04:43:52 GMT
- Title: J-Score: A Robust Measure of Clustering Accuracy
- Authors: Navid Ahmadinejad, Li Liu
- Abstract summary: Clustering analysis discovers hidden structures in a data set by partitioning them into disjoint clusters.
Current clustering accuracy measures include overlooking unmatched clusters, biases towards excessive clusters, unstable baselines, and difficult interpretation.
We present a novel accuracy measure, J-score, that addresses these issues.
- Score: 8.33909555155795
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Background. Clustering analysis discovers hidden structures in a data set by
partitioning it into disjoint clusters. Robust accuracy measures that
evaluate the goodness of clustering results are critical for algorithm
development and model diagnosis. Common problems of current clustering accuracy
measures include overlooking unmatched clusters, biases towards excessive
clusters, unstable baselines, and difficult interpretation. In this study, we
presented a novel accuracy measure, J-score, that addresses these issues.
Methods. Given a data set with known class labels, J-score quantifies how
well the hypothetical clusters produced by clustering analysis recover the true
classes. It starts with bidirectional set matching to identify the
correspondence between true classes and hypothetical clusters based on Jaccard
index. It then computes two weighted sums of Jaccard indices measuring the
reconciliation from classes to clusters and vice versa. The final J-score is
the harmonic mean of the two weighted sums.
Results. Via simulation studies, we evaluated the performance of J-score and
compared it with existing measures. Our results show that J-score is effective in
distinguishing partition structures that differ only by unmatched clusters,
rewarding correct inference of class numbers, addressing biases towards
excessive clusters, and having a relatively stable baseline. The simplicity of
its calculation makes the interpretation straightforward. It is a valuable tool
complementary to other accuracy measures. We released an R/jScore package
implementing the algorithm.
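For concreteness, the scoring scheme described in the Methods paragraph can be sketched in a few lines of Python. The abstract does not spell out how the bidirectional matching or the weighting is performed, so this sketch assumes that each class (or cluster) is matched to the set with which it shares the highest Jaccard index, and that the two directional sums weight each set by its relative size; consult the paper and the R/jScore package for the exact definition.

```python
import numpy as np

def jaccard(a, b):
    """Jaccard index: size of the intersection over size of the union of two index sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def j_score(true_labels, cluster_labels):
    """Illustrative J-score: harmonic mean of two size-weighted sums of best-match
    Jaccard indices (classes to clusters, and clusters to classes). The matching and
    weighting rules used here are assumptions, not the paper's exact definition."""
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    n = len(true_labels)

    classes = [np.flatnonzero(true_labels == c) for c in np.unique(true_labels)]
    clusters = [np.flatnonzero(cluster_labels == k) for k in np.unique(cluster_labels)]

    # Reconciliation from true classes to hypothetical clusters.
    s_class = sum(len(c) / n * max(jaccard(c, k) for k in clusters) for c in classes)
    # Reconciliation from hypothetical clusters to true classes.
    s_cluster = sum(len(k) / n * max(jaccard(k, c) for c in classes) for k in clusters)

    # Final score: harmonic mean of the two weighted sums.
    return 2 * s_class * s_cluster / (s_class + s_cluster)

truth = [0, 0, 0, 1, 1, 1]
hypo  = [0, 0, 1, 1, 1, 2]            # one class split across an extra cluster
print(round(j_score(truth, hypo), 3))  # roughly 0.55 under the assumptions above
```

A perfect partition gives a score of 1, while splitting or merging classes lowers both directional sums, so the harmonic mean penalizes either kind of mismatch.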
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- A structured regression approach for evaluating model performance across intersectional subgroups [53.91682617836498]
Disaggregated evaluation is a central task in AI fairness assessment, where the goal is to measure an AI system's performance across different subgroups.
We introduce a structured regression approach to disaggregated evaluation that we demonstrate can yield reliable system performance estimates even for very small subgroups.
arXiv Detail & Related papers (2024-01-26T14:21:45Z)
- Linear time Evidence Accumulation Clustering with KMeans [0.0]
This work describes a trick that mimics the behavior of average-linkage clustering.
We found a way to compute the density of a partitioning efficiently, reducing the cost from quadratic to linear complexity.
The k-means results are comparable to the best state of the art in terms of NMI while keeping the computational cost low.
arXiv Detail & Related papers (2023-11-15T14:12:59Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct the distance matrix between data points using a Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- Algorithm-Agnostic Interpretations for Clustering [0.0]
We propose algorithm-agnostic interpretation methods to explain clustering outcomes in reduced dimensions.
The permutation feature importance for clustering represents a general framework based on shuffling feature values; a generic sketch of this shuffling idea appears after the Related papers list.
All methods can be used with any clustering algorithm able to reassign instances through soft or hard labels.
arXiv Detail & Related papers (2022-09-21T18:08:40Z)
- Normalised clustering accuracy: An asymmetric external cluster validity measure [2.900810893770134]
Clustering algorithms are traditionally evaluated using either internal or external validity measures.
In this paper, we argue that the commonly used classical partition similarity scores miss some desirable properties.
We propose and analyse a new measure: a version of the optimal set-matching accuracy.
arXiv Detail & Related papers (2022-09-07T05:08:34Z)
- SSDBCODI: Semi-Supervised Density-Based Clustering with Outliers Detection Integrated [1.8444322599555096]
Clustering analysis is one of the critical tasks in machine learning.
Because the performance of clustering can be significantly eroded by outliers, many algorithms try to incorporate an outlier-detection step.
We propose SSDBCODI, a semi-supervised density-based clustering algorithm with outlier detection integrated.
arXiv Detail & Related papers (2022-08-10T21:06:38Z)
- Differentially-Private Clustering of Easy Instances [67.04951703461657]
In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information on individual data points.
We provide implementable differentially private clustering algorithms that offer utility when the data is "easy".
We propose a framework that allows us to apply non-private clustering algorithms to the easy instances and privately combine the results.
arXiv Detail & Related papers (2021-12-29T08:13:56Z)
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- Predictive K-means with local models [0.028675177318965035]
Predictive clustering seeks to obtain the best of both worlds.
We present two new algorithms using this technique and show on a variety of data sets that they are competitive for prediction performance.
arXiv Detail & Related papers (2020-12-16T10:49:36Z)
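As a side note on the "Algorithm-Agnostic Interpretations for Clustering" entry above, the shuffling idea it mentions can be illustrated with a generic permutation-importance loop: shuffle one feature at a time, let the fitted model reassign the instances, and record how much the resulting partition diverges from the original one. The sketch below is not that paper's procedure; it is a minimal stand-in using scikit-learn's KMeans and the adjusted Rand index on the Iris data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
X = load_iris().data

# Any clustering model that can reassign instances through hard labels works here.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
baseline = model.labels_

importance = {}
for j, name in enumerate(load_iris().feature_names):
    scores = []
    for _ in range(20):                                # average over repeated shuffles
        X_perm = X.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])   # shuffle a single feature
        scores.append(adjusted_rand_score(baseline, model.predict(X_perm)))
    # A feature is important if shuffling it breaks agreement with the original partition.
    importance[name] = round(1.0 - float(np.mean(scores)), 3)

print(importance)
```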
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.