Significativity Indices for Agreement Values
- URL: http://arxiv.org/abs/2504.15325v1
- Date: Mon, 21 Apr 2025 09:47:53 GMT
- Title: Significativity Indices for Agreement Values
- Authors: Alberto Casagrande, Francesco Fabris, Rossano Girometti, Roberto Pagliarini,
- Abstract summary: Agreement measures, such as Cohen's kappa or intraclass correlation, gauge the matching between two or more classifiers.<n>Some quality scales have been proposed in the literature for Cohen's kappa, but they are mainly naive, and their boundaries are arbitrary.<n>This work proposes a general approach to evaluate the significativity of any agreement value between two classifiers.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Agreement measures, such as Cohen's kappa or intraclass correlation, gauge the matching between two or more classifiers. They are used in a wide range of contexts from medicine, where they evaluate the effectiveness of medical treatments and clinical trials, to artificial intelligence, where they can quantify the approximation due to the reduction of a classifier. The consistency of different classifiers to a golden standard can be compared simply by using the order induced by their agreement measure with respect to the golden standard itself. Nevertheless, labelling an approach as good or bad exclusively by using the value of an agreement measure requires a scale or a significativity index. Some quality scales have been proposed in the literature for Cohen's kappa, but they are mainly naive, and their boundaries are arbitrary. This work proposes a general approach to evaluate the significativity of any agreement value between two classifiers and introduces two significativity indices: one dealing with finite data sets, the other one handling classification probability distributions. Moreover, this manuscript considers the computational issues of evaluating such indices and identifies some efficient algorithms to evaluate them.
Related papers
- Quantization of Large Language Models with an Overdetermined Basis [73.79368761182998]
We introduce an algorithm for data quantization based on the principles of Kashin representation.
Our findings demonstrate that Kashin Quantization achieves competitive or superior quality in model performance.
arXiv Detail & Related papers (2024-04-15T12:38:46Z) - Discordance Minimization-based Imputation Algorithms for Missing Values
in Rating Data [4.100928307172084]
When multiple rating lists are combined or considered together, subjects often have missing ratings.
We propose analyses on missing value patterns using six real-world data sets in various applications.
We propose optimization models and algorithms that minimize the total rating discordance across rating providers.
arXiv Detail & Related papers (2023-11-07T14:42:06Z) - Enriching Disentanglement: From Logical Definitions to Quantitative Metrics [59.12308034729482]
Disentangling the explanatory factors in complex data is a promising approach for data-efficient representation learning.
We establish relationships between logical definitions and quantitative metrics to derive theoretically grounded disentanglement metrics.
We empirically demonstrate the effectiveness of the proposed metrics by isolating different aspects of disentangled representations.
arXiv Detail & Related papers (2023-05-19T08:22:23Z) - Does the evaluation stand up to evaluation? A first-principle approach
to the evaluation of classifiers [0.0]
It is shown that popular metrics such as precision, balanced accuracy, Matthews Correlation Coefficient, Fowlkes-Mallows index, F1-measure, and Area Under the Curve are never optimal.
This fraction is even larger than would be caused by the use of a decision-theoretic metric with moderately wrong coefficients.
arXiv Detail & Related papers (2023-02-21T09:55:19Z) - Energy-Based Learning for Cooperative Games, with Applications to
Feature/Data/Model Valuations [91.36803653600667]
We present a novel energy-based treatment for cooperative games, with a theoretical justification by the maximum entropy framework.
Surprisingly, by conducting variational inference of the energy-based model, we recover various game-theoretic valuation criteria, such as Shapley value and Banzhaf index.
We experimentally demonstrate that the proposed Variational Index enjoys intriguing properties on certain synthetic and real-world valuation problems.
arXiv Detail & Related papers (2021-06-05T17:39:04Z) - A Statistical Analysis of Summarization Evaluation Metrics using
Resampling Methods [60.04142561088524]
We find that the confidence intervals are rather wide, demonstrating high uncertainty in how reliable automatic metrics truly are.
Although many metrics fail to show statistical improvements over ROUGE, two recent works, QAEval and BERTScore, do in some evaluation settings.
arXiv Detail & Related papers (2021-03-31T18:28:14Z) - Classification with Rejection Based on Cost-sensitive Classification [83.50402803131412]
We propose a novel method of classification with rejection by ensemble of learning.
Experimental results demonstrate the usefulness of our proposed approach in clean, noisy, and positive-unlabeled classification.
arXiv Detail & Related papers (2020-10-22T14:05:05Z) - Identifying Spurious Correlations for Robust Text Classification [9.457737910527829]
We propose a method to distinguish spurious and genuine correlations in text classification.
We use features derived from treatment effect estimators to distinguish spurious correlations from "genuine" ones.
Experiments on four datasets suggest that using this approach to inform feature selection also leads to more robust classification.
arXiv Detail & Related papers (2020-10-06T03:49:22Z) - Predictive Value Generalization Bounds [27.434419027831044]
We study a bi-criterion framework for assessing scoring functions in the context of binary classification.
We study properties of scoring functions with respect to predictive values by deriving new distribution-free large deviation and uniform convergence bounds.
arXiv Detail & Related papers (2020-07-09T21:23:28Z) - Classifier uncertainty: evidence, potential impact, and probabilistic
treatment [0.0]
We present an approach to quantify the uncertainty of classification performance metrics based on a probability model of the confusion matrix.
We show that uncertainties can be surprisingly large and limit performance evaluation.
arXiv Detail & Related papers (2020-06-19T12:49:19Z) - Pairwise Supervision Can Provably Elicit a Decision Boundary [84.58020117487898]
Similarity learning is a problem to elicit useful representations by predicting the relationship between a pair of patterns.
We show that similarity learning is capable of solving binary classification by directly eliciting a decision boundary.
arXiv Detail & Related papers (2020-06-11T05:35:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.