Evaluation: from precision, recall and F-measure to ROC, informedness,
markedness and correlation
- URL: http://arxiv.org/abs/2010.16061v1
- Date: Sun, 11 Oct 2020 02:15:11 GMT
- Title: Evaluation: from precision, recall and F-measure to ROC, informedness,
markedness and correlation
- Authors: David M. W. Powers
- Abstract summary: Measures such as Recall, Precision, F-Measure and Rand Accuracy are biased and should not be used without clear understanding of the biases.
We discuss several concepts and measures that reflect the probability that prediction is informed versus chance.
- Score: 3.7819322027528113
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Commonly used evaluation measures including Recall, Precision, F-Measure and
Rand Accuracy are biased and should not be used without clear understanding of
the biases, and corresponding identification of chance or base case levels of
the statistic. Using these measures, a system that performs worse in the
objective sense of Informedness can appear to perform better under any of
these commonly used measures. We discuss several concepts and measures that
reflect the probability that prediction is informed versus chance (Informedness),
and introduce Markedness as a dual measure for the probability that prediction
is marked versus chance. Finally we demonstrate elegant connections between the
concepts of Informedness, Markedness, Correlation and Significance as well as
their intuitive relationships with Recall and Precision, and outline the
extension from the dichotomous case to the general multi-class case.
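As a concrete illustration of these relationships (a minimal sketch, not code from the paper; the contingency counts are hypothetical), the following computes Recall, Precision, Informedness, Markedness and the Matthews correlation for the dichotomous case. The correlation is the geometric mean of Informedness and Markedness, which is one of the connections the paper develops.

```python
from math import sqrt

def dichotomous_measures(tp, fp, fn, tn):
    """Evaluation measures for a 2x2 contingency table (dichotomous case)."""
    recall = tp / (tp + fn)             # true positive rate (sensitivity)
    inverse_recall = tn / (tn + fp)     # true negative rate (specificity)
    precision = tp / (tp + fp)          # positive predictive value
    inverse_precision = tn / (tn + fn)  # negative predictive value

    informedness = recall + inverse_recall - 1       # Youden's J: prediction informed vs. chance
    markedness = precision + inverse_precision - 1   # dual measure: prediction marked vs. chance

    # Matthews correlation; note that mcc**2 == informedness * markedness
    mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"recall": recall, "precision": precision,
            "informedness": informedness, "markedness": markedness, "correlation": mcc}

# Hypothetical counts: 90 true positives, 30 false positives, 10 false negatives, 70 true negatives
print(dichotomous_measures(tp=90, fp=30, fn=10, tn=70))
# informedness = 0.6, markedness = 0.625, correlation ≈ 0.612 = sqrt(0.6 * 0.625)
```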
Related papers
- Confidence Intervals for Evaluation of Data Mining [3.8485822412233452]
We consider statistical inference about general performance measures used in data mining.
We study the finite sample coverage probabilities for confidence intervals.
We also propose a 'blurring correction' on the variance to improve the finite sample performance.
arXiv Detail & Related papers (2025-02-10T20:22:02Z)
- Uncertainty Quantification in Stereo Matching [61.73532883992135]
We propose a new framework for stereo matching and its uncertainty quantification.
We adopt Bayes risk as a measure of uncertainty and estimate data and model uncertainty separately.
We apply our uncertainty method to improve prediction accuracy by selecting data points with small uncertainties.
arXiv Detail & Related papers (2024-12-24T23:28:20Z)
- On Information-Theoretic Measures of Predictive Uncertainty [5.8034373350518775]
Despite its significance, a consensus on the correct measurement of predictive uncertainty remains elusive.
Our proposed framework categorizes predictive uncertainty measures according to two factors: (I) the predicting model, and (II) the approximation of the true predictive distribution.
We empirically evaluate these measures in typical uncertainty estimation settings, such as misclassification detection, selective prediction, and out-of-distribution detection.
arXiv Detail & Related papers (2024-10-14T17:52:18Z)
- Beyond Calibration: Assessing the Probabilistic Fit of Neural Regressors via Conditional Congruence [2.2359781747539396]
Deep networks often suffer from overconfidence and misaligned predictive distributions.
We introduce a metric, Conditional Congruence Error (CCE), that uses conditional kernel mean embeddings to estimate the distance between the learned predictive distribution and the empirical, conditional distribution in a dataset.
We show that using CCE to measure congruence 1) accurately quantifies misalignment between distributions when the data-generating process is known, 2) effectively scales to real-world, high-dimensional image regression tasks, and 3) can be used to gauge model reliability on unseen instances.
arXiv Detail & Related papers (2024-05-20T23:30:07Z)
- Revisiting Confidence Estimation: Towards Reliable Failure Prediction [53.79160907725975]
We find a general, widespread but largely neglected phenomenon: most confidence estimation methods are harmful for detecting misclassification errors.
We propose to enlarge the confidence gap by finding flat minima, which yields state-of-the-art failure prediction performance.
arXiv Detail & Related papers (2024-03-05T11:44:14Z)
- Standardized Interpretable Fairness Measures for Continuous Risk Scores [4.192037827105842]
We propose a standardized version of fairness measures for continuous scores with a reasonable interpretation based on the Wasserstein distance.
Our measures are easily computable and well suited for quantifying and interpreting the strength of group disparities as well as for comparing biases across different models, datasets, or time points.
arXiv Detail & Related papers (2023-08-22T12:01:49Z)
- Evaluating Probabilistic Classifiers: The Triptych [62.997667081978825]
We propose and study a triptych of diagnostic graphics that focus on distinct and complementary aspects of forecast performance.
The reliability diagram addresses calibration, the receiver operating characteristic (ROC) curve diagnoses discrimination ability, and the Murphy diagram visualizes overall predictive performance and value (a minimal sketch of the first two appears after this list).
arXiv Detail & Related papers (2023-01-25T19:35:23Z)
- Measuring Fairness of Text Classifiers via Prediction Sensitivity [63.56554964580627]
ACCUMULATED PREDICTION SENSITIVITY measures fairness in machine learning models based on the model's prediction sensitivity to perturbations in input features.
We show that the metric can be theoretically linked with a specific notion of group fairness (statistical parity) and individual fairness.
arXiv Detail & Related papers (2022-03-16T15:00:33Z)
- Uncertainty Estimation for Heatmap-based Landmark Localization [4.673063715963989]
We propose Quantile Binning, a data-driven method to categorise predictions by uncertainty with estimated error bounds.
We demonstrate this framework by comparing and contrasting three uncertainty measures.
We conclude by illustrating how filtering out gross mispredictions caught in our Quantile Bins significantly improves the proportion of predictions under an acceptable error threshold.
arXiv Detail & Related papers (2022-03-04T14:40:44Z)
- DEUP: Direct Epistemic Uncertainty Prediction [56.087230230128185]
Epistemic uncertainty is the part of out-of-sample prediction error that is due to the learner's lack of knowledge.
We propose a principled approach for directly estimating epistemic uncertainty by learning to predict generalization error and subtracting an estimate of aleatoric uncertainty.
arXiv Detail & Related papers (2021-02-16T23:50:35Z)
- Learning Accurate Dense Correspondences and When to Trust Them [161.76275845530964]
We aim to estimate a dense flow field relating two images, coupled with a robust pixel-wise confidence map.
We develop a flexible probabilistic approach that jointly learns the flow prediction and its uncertainty.
Our approach obtains state-of-the-art results on challenging geometric matching and optical flow datasets.
arXiv Detail & Related papers (2021-01-05T18:54:11Z)
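The triptych entry above separates calibration (reliability diagram) from discrimination (ROC curve). As a minimal, hedged sketch only, not code from that paper, with hypothetical function names and no tie handling, the two diagnostics can be computed from scores and binary labels roughly as follows:

```python
import numpy as np

def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by sweeping a threshold from high to low scores."""
    order = np.argsort(-np.asarray(scores))   # highest score first
    y = np.asarray(labels)[order]
    tp = np.cumsum(y)                         # positives predicted positive at each cut
    fp = np.cumsum(1 - y)                     # negatives predicted positive at each cut
    return fp / (len(y) - y.sum()), tp / y.sum()

def reliability_bins(probs, labels, n_bins=10):
    """(mean predicted probability, empirical frequency) per equal-width probability bin."""
    p, y = np.asarray(probs), np.asarray(labels)
    b = np.minimum((p * n_bins).astype(int), n_bins - 1)
    return [(p[b == k].mean(), y[b == k].mean()) for k in range(n_bins) if np.any(b == k)]
```

Plotting the roc_points output traces the ROC curve; plotting reliability_bins against the diagonal gives the reliability diagram.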