Evaluation: from precision, recall and F-measure to ROC, informedness,
markedness and correlation
- URL: http://arxiv.org/abs/2010.16061v1
- Date: Sun, 11 Oct 2020 02:15:11 GMT
- Title: Evaluation: from precision, recall and F-measure to ROC, informedness,
markedness and correlation
- Authors: David M. W. Powers
- Abstract summary: Measures such as Recall, Precision, F-Measure and Rand Accuracy are biased and should not be used without clear understanding of the biases.
We discuss several concepts and measures that reflect the probability that prediction is informed versus chance.
- Score: 3.7819322027528113
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Commonly used evaluation measures including Recall, Precision, F-Measure and
Rand Accuracy are biased and should not be used without clear understanding of
the biases, and corresponding identification of chance or base case levels of
the statistic. Using these measures, a system that performs worse in the
objective sense of Informedness can appear to perform better under any of
these commonly used measures. We discuss several concepts and measures that
reflect the probability that prediction is informed versus chance, which we
term Informedness, and introduce Markedness as a dual measure for the probability
that prediction is marked versus chance. Finally, we demonstrate elegant connections between the
concepts of Informedness, Markedness, Correlation and Significance as well as
their intuitive relationships with Recall and Precision, and outline the
extension from the dichotomous case to the general multi-class case.
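As a concrete illustration of the dichotomous case, here is a minimal sketch computing Informedness (Recall + Inverse Recall - 1), Markedness (Precision + Inverse Precision - 1) and the Matthews correlation, their signed geometric mean, from a 2x2 confusion matrix; the counts are made up for the example.

```python
# Minimal sketch: Informedness, Markedness and the Matthews correlation
# computed from a 2x2 confusion matrix. The counts below are illustrative only.

from math import sqrt, copysign

def dichotomous_measures(tp, fp, fn, tn):
    recall = tp / (tp + fn)              # true positive rate
    inverse_recall = tn / (tn + fp)      # true negative rate
    precision = tp / (tp + fp)           # positive predictive value
    inverse_precision = tn / (tn + fn)   # negative predictive value

    informedness = recall + inverse_recall - 1      # probability the prediction is informed
    markedness = precision + inverse_precision - 1  # probability the prediction is marked

    # The Matthews correlation is the geometric mean of the two
    # (signed here by the sign of informedness).
    correlation = copysign(sqrt(abs(informedness * markedness)), informedness)
    return informedness, markedness, correlation

# Example: Recall = 0.9 and Precision ~ 0.69, but Informedness is only 0.5
# once the chance level is taken into account.
print(dichotomous_measures(tp=90, fp=40, fn=10, tn=60))
```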
Related papers
- On Information-Theoretic Measures of Predictive Uncertainty [5.8034373350518775]
Despite its significance, a consensus on the correct measurement of predictive uncertainty remains elusive.
Our proposed framework categorizes predictive uncertainty measures according to two factors: (I) The predicting model (II) The approximation of the true predictive distribution.
We empirically evaluate these measures in typical uncertainty estimation settings, such as misclassification detection, selective prediction, and out-of-distribution detection.
arXiv Detail & Related papers (2024-10-14T17:52:18Z) - Beyond Calibration: Assessing the Probabilistic Fit of Neural Regressors via Conditional Congruence [2.2359781747539396]
Deep networks often suffer from overconfidence and misaligned predictive distributions.
We introduce a metric, Conditional Congruence Error (CCE), that uses conditional kernel mean embeddings to estimate the distance between the learned predictive distribution and the empirical conditional distribution in a dataset.
We show that using CCE to measure congruence (1) accurately quantifies misalignment between distributions when the data-generating process is known, (2) effectively scales to real-world, high-dimensional image regression tasks, and (3) can be used to gauge model reliability on unseen instances.
arXiv Detail & Related papers (2024-05-20T23:30:07Z) - Revisiting Confidence Estimation: Towards Reliable Failure Prediction [53.79160907725975]
We identify a general, widespread, but largely neglected phenomenon: most confidence estimation methods are harmful for detecting misclassification errors.
We propose to enlarge the confidence gap by finding flat minima, which yields state-of-the-art failure prediction performance.
arXiv Detail & Related papers (2024-03-05T11:44:14Z) - Standardized Interpretable Fairness Measures for Continuous Risk Scores [4.192037827105842]
We propose a standardized version of fairness measures for continuous scores with a reasonable interpretation based on the Wasserstein distance.
Our measures are easily computable and well suited for quantifying and interpreting the strength of group disparities as well as for comparing biases across different models, datasets, or time points.
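As a rough sketch of the underlying ingredient (not the paper's standardized measure), the snippet below compares the risk-score distributions of two groups with the Wasserstein-1 distance; the group labels, sample sizes and Beta-distributed scores are synthetic placeholders.

```python
# Minimal sketch (not the paper's standardized measure): comparing the
# continuous risk-score distributions of two groups via the Wasserstein-1
# distance. Scores and group labels below are synthetic placeholders.

import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
scores_group_a = rng.beta(2.0, 5.0, size=1000)  # hypothetical risk scores, group A
scores_group_b = rng.beta(2.5, 5.0, size=1000)  # hypothetical risk scores, group B

# Raw Wasserstein-1 distance between the two score distributions;
# since the scores live in [0, 1], the value is also bounded by 1.
disparity = wasserstein_distance(scores_group_a, scores_group_b)
print(f"Wasserstein-1 disparity between groups: {disparity:.4f}")
```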
arXiv Detail & Related papers (2023-08-22T12:01:49Z) - Model-free generalized fiducial inference [0.0]
I propose and develop ideas for a model-free statistical framework for imprecise probabilistic prediction inference.
This framework facilitates uncertainty quantification in the form of prediction sets that offer finite-sample control of Type 1 errors.
I consider the theoretical and empirical properties of a precise probabilistic approximation to the model-free imprecise framework.
arXiv Detail & Related papers (2023-07-24T01:58:48Z) - Evaluating Probabilistic Classifiers: The Triptych [62.997667081978825]
We propose and study a triptych of diagnostic graphics that focus on distinct and complementary aspects of forecast performance.
The reliability diagram addresses calibration, the receiver operating characteristic (ROC) curve diagnoses discrimination ability, and the Murphy diagram visualizes overall predictive performance and value.
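A minimal sketch of two of the three diagnostics using scikit-learn: roc_curve for discrimination and calibration_curve for a reliability diagram. The Murphy diagram has no standard scikit-learn helper and is omitted; the labels and forecast probabilities are synthetic.

```python
# Minimal sketch of two of the triptych's diagnostics with scikit-learn:
# the ROC curve (discrimination) and the reliability diagram (calibration).
# Labels and predicted probabilities below are synthetic placeholders.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
y_prob = rng.uniform(0, 1, size=2000)                  # forecast probabilities
y_true = rng.binomial(1, np.clip(y_prob * 1.1, 0, 1))  # loosely calibrated outcomes

fpr, tpr, _ = roc_curve(y_true, y_prob)
frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=10)

fig, (ax_roc, ax_rel) = plt.subplots(1, 2, figsize=(9, 4))
ax_roc.plot(fpr, tpr, label=f"AUC = {roc_auc_score(y_true, y_prob):.3f}")
ax_roc.plot([0, 1], [0, 1], "k--")  # chance level
ax_roc.set(title="ROC curve", xlabel="FPR", ylabel="TPR")
ax_roc.legend()
ax_rel.plot(mean_pred, frac_pos, marker="o")
ax_rel.plot([0, 1], [0, 1], "k--")  # perfect calibration
ax_rel.set(title="Reliability diagram", xlabel="Mean predicted probability",
           ylabel="Observed frequency")
plt.tight_layout()
plt.show()
```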
arXiv Detail & Related papers (2023-01-25T19:35:23Z) - Measuring Fairness of Text Classifiers via Prediction Sensitivity [63.56554964580627]
ACCUMULATED PREDICTION SENSITIVITY measures fairness in machine learning models based on the model's prediction sensitivity to perturbations in input features.
We show that the metric can be theoretically linked with a specific notion of group fairness (statistical parity) and individual fairness.
arXiv Detail & Related papers (2022-03-16T15:00:33Z) - Uncertainty Estimation for Heatmap-based Landmark Localization [4.673063715963989]
We propose Quantile Binning, a data-driven method to categorise predictions by uncertainty with estimated error bounds.
We demonstrate this framework by comparing and contrasting three uncertainty measures.
We conclude by illustrating how filtering out gross mispredictions caught in our Quantile Bins significantly improves the proportion of predictions under an acceptable error threshold.
arXiv Detail & Related papers (2022-03-04T14:40:44Z) - DEUP: Direct Epistemic Uncertainty Prediction [56.087230230128185]
Epistemic uncertainty is the part of out-of-sample prediction error that is due to the learner's lack of knowledge.
We propose a principled approach for directly estimating epistemic uncertainty by learning to predict generalization error and subtracting an estimate of aleatoric uncertainty.
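The sketch below illustrates only the general recipe stated in this summary, not the authors' DEUP implementation: an auxiliary regressor is fit to the main model's held-out per-example loss, and a placeholder aleatoric estimate (predictive entropy) is subtracted. All model, data and estimator choices here are assumptions.

```python
# Rough sketch of the general idea in the summary (not the authors' DEUP
# implementation): fit an auxiliary error predictor on the main model's
# out-of-sample losses, then subtract an aleatoric-uncertainty estimate.
# The models, data and aleatoric estimator below are illustrative assumptions.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=3000) > 0).astype(int)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, random_state=0)

main_model = LogisticRegression().fit(X_train, y_train)

# Per-example out-of-sample loss (negative log-likelihood) on held-out data.
p_hold = main_model.predict_proba(X_hold)[:, 1]
per_example_loss = -(y_hold * np.log(p_hold) + (1 - y_hold) * np.log(1 - p_hold))

# Auxiliary model predicting the main model's generalization error from inputs.
error_predictor = GradientBoostingRegressor().fit(X_hold, per_example_loss)

def epistemic_uncertainty(X_new):
    """Predicted total error minus a (placeholder) aleatoric estimate."""
    total = error_predictor.predict(X_new)
    p = main_model.predict_proba(X_new)[:, 1]
    aleatoric = -(p * np.log(p) + (1 - p) * np.log(1 - p))  # predictive entropy proxy
    return np.clip(total - aleatoric, 0.0, None)

print(epistemic_uncertainty(rng.normal(size=(5, 5))))
```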
arXiv Detail & Related papers (2021-02-16T23:50:35Z) - Learning Accurate Dense Correspondences and When to Trust Them [161.76275845530964]
We aim to estimate a dense flow field relating two images, coupled with a robust pixel-wise confidence map.
We develop a flexible probabilistic approach that jointly learns the flow prediction and its uncertainty.
Our approach obtains state-of-the-art results on challenging geometric matching and optical flow datasets.
arXiv Detail & Related papers (2021-01-05T18:54:11Z) - Trust but Verify: Assigning Prediction Credibility by Counterfactual
Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.