Characterizing and comparing external measures for the assessment of
cluster analysis and community detection
- URL: http://arxiv.org/abs/2102.00708v1
- Date: Mon, 1 Feb 2021 09:10:25 GMT
- Title: Characterizing and comparing external measures for the assessment of
cluster analysis and community detection
- Authors: Nejat Arinik (LIA), Vincent Labatut, Rosa Figueiredo
- Abstract summary: Many external evaluation measures have been proposed in the literature to compare two partitions of the same set.
This makes the task of selecting the most appropriate measure for a given situation a challenge for the end user.
We propose a new empirical evaluation framework to solve this issue and help the end user select an appropriate measure for their application.
- Score: 1.5543116359698947
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the context of cluster analysis and graph partitioning, many external
evaluation measures have been proposed in the literature to compare two
partitions of the same set. This makes the task of selecting the most
appropriate measure for a given situation a challenge for the end user.
However, this issue is overlooked in the literature. Researchers tend to follow
tradition and use the standard measures of their field, although they often
became standard only because previous researchers started consistently using
them. In this work, we propose a new empirical evaluation framework to solve
this issue and help the end user select an appropriate measure for their
application. Given a collection of candidate measures, the framework first
describes their behavior by computing them on a generated dataset of
partitions, obtained by applying a set of predefined parametric partition
transformations. Second, it performs a regression analysis to characterize the
measures in terms of how they are affected by these parameters and
transformations. This allows both describing and comparing the measures.
Our approach is not tied to any specific measure or application, so it can be
applied to any situation. We illustrate its relevance by applying it to a
selection of standard measures, and show how it can be put into practice through
two concrete use cases.
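To make the two steps of the framework concrete, the sketch below shows, in Python, the general kind of procedure the abstract describes. It is a minimal illustration under assumptions of ours: a single made-up transformation (randomly relocating a fraction of elements to another cluster), two standard measures from scikit-learn (ARI and NMI), and a plain linear regression; the relocate function, the parameter grid, and the regression model are placeholders, not the authors' actual transformations or implementation.

```python
# Minimal sketch: (1) generate partitions with a parametric transformation,
# (2) compute candidate measures, (3) regress each measure on the parameter.
# Illustrative only; not the authors' code or their set of transformations.
import numpy as np
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

def relocate(labels, fraction, n_clusters, rng):
    """Hypothetical parametric transformation: reassign a given fraction
    of elements to a different, uniformly drawn cluster."""
    perturbed = labels.copy()
    moved = rng.choice(len(labels), size=int(fraction * len(labels)), replace=False)
    for i in moved:
        options = [c for c in range(n_clusters) if c != perturbed[i]]
        perturbed[i] = rng.choice(options)
    return perturbed

# 1) Generate a dataset of partition pairs by sweeping the transformation parameter.
n, k = 300, 5
reference = rng.integers(0, k, size=n)  # reference partition as cluster labels
rows = []
for fraction in np.linspace(0.0, 0.9, 19):
    candidate = relocate(reference, fraction, k, rng)
    rows.append((fraction,
                 adjusted_rand_score(reference, candidate),
                 normalized_mutual_info_score(reference, candidate)))
data = np.array(rows)

# 2) Regression step: summarize each measure by how it responds to the parameter.
X = data[:, [0]]  # transformation intensity
for col, name in ((1, "ARI"), (2, "NMI")):
    fit = LinearRegression().fit(X, data[:, col])
    print(f"{name}: slope={fit.coef_[0]:+.3f}, intercept={fit.intercept_:.3f}")
```

In the paper's framework, such regression coefficients, obtained per measure, parameter and transformation, serve as the characterization used to describe and compare the candidate measures.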
Related papers
- Quantifying User Coherence: A Unified Framework for Cross-Domain Recommendation Analysis [69.37718774071793]
This paper introduces novel information-theoretic measures for understanding recommender systems.
We evaluate 7 recommendation algorithms across 9 datasets, revealing the relationships between our measures and standard performance metrics.
arXiv Detail & Related papers (2024-10-03T13:02:07Z)
- CRITERIA: a New Benchmarking Paradigm for Evaluating Trajectory Prediction Models for Autonomous Driving [6.868387710209245]
We propose a new benChmarking paRadIgm for evaluaTing trajEctoRy predIction Approaches (CRITERIA).
We show that the proposed benchmark can produce a more accurate ranking of the models and serve as a means of characterizing their behavior.
arXiv Detail & Related papers (2023-10-11T18:28:15Z)
- Choosing a Proxy Metric from Past Experiments [54.338884612982405]
In many randomized experiments, the treatment effect of the long-term metric is often difficult or infeasible to measure.
A common alternative is to measure several short-term proxy metrics in the hope they closely track the long-term metric.
We introduce a new statistical framework to both define and construct an optimal proxy metric for use in a homogeneous population of randomized experiments.
arXiv Detail & Related papers (2023-09-14T17:43:02Z)
- Challenges to Evaluating the Generalization of Coreference Resolution Models: A Measurement Modeling Perspective [69.50044040291847]
We show how multi-dataset evaluations risk conflating different factors concerning what, precisely, is being measured.
This makes it difficult to draw more generalizable conclusions from these evaluations.
arXiv Detail & Related papers (2023-03-16T05:32:02Z)
- Navigating the Metric Maze: A Taxonomy of Evaluation Metrics for Anomaly Detection in Time Series [0.456877715768796]
This paper provides a comprehensive overview of the metrics used for the evaluation of time series anomaly detection methods.
Twenty metrics are analyzed and discussed in detail, highlighting the unique suitability of each for specific tasks.
arXiv Detail & Related papers (2023-03-02T13:58:06Z)
- In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation [92.51773744318119]
This paper empirically investigates the strengths and weaknesses of different model selection criteria.
We highlight that there is a complex interplay between selection strategies, candidate estimators and the data used for comparing them.
arXiv Detail & Related papers (2023-02-06T16:55:37Z)
- Classification Performance Metric Elicitation and its Applications [5.5637552942511155]
Despite its practical interest, there is limited formal guidance on how to select metrics for machine learning applications.
This thesis outlines metric elicitation as a principled framework for selecting the performance metric that best reflects implicit user preferences.
arXiv Detail & Related papers (2022-08-19T03:57:17Z)
- On the Choice of Fairness: Finding Representative Fairness Metrics for a Given Context [5.667221573173013]
Various notions of fairness have been defined, though choosing an appropriate metric is cumbersome.
Trade-offs and impossibility theorems make such selection even more complicated and controversial.
We propose a framework that automatically discovers the correlations and trade-offs between different pairs of measures for a given context.
arXiv Detail & Related papers (2021-09-13T04:17:38Z)
- A Statistical Analysis of Summarization Evaluation Metrics using Resampling Methods [60.04142561088524]
We find that the confidence intervals are rather wide, demonstrating high uncertainty in how reliable automatic metrics truly are.
Although many metrics fail to show statistical improvements over ROUGE, two recent works, QAEval and BERTScore, do in some evaluation settings.
arXiv Detail & Related papers (2021-03-31T18:28:14Z)
- Evaluation Metrics for Conditional Image Generation [100.69766435176557]
We present two new metrics for evaluating generative models in the class-conditional image generation setting.
A theoretical analysis shows the motivation behind each proposed metric and links the novel metrics to their unconditional counterparts.
We provide an extensive empirical evaluation, comparing the metrics to their unconditional variants and to other metrics, and utilize them to analyze existing generative models.
arXiv Detail & Related papers (2020-04-26T12:15:16Z)