Considerations on the Evaluation of Biometric Quality Assessment
Algorithms
- URL: http://arxiv.org/abs/2303.13294v4
- Date: Tue, 31 Oct 2023 14:22:39 GMT
- Title: Considerations on the Evaluation of Biometric Quality Assessment
Algorithms
- Authors: Torsten Schlett, Christian Rathgeb, Juan Tapia, Christoph Busch
- Abstract summary: Quality assessment algorithms can be used to estimate the utility of a biometric sample for the purpose of biometric recognition.
"Error versus Discard Characteristic" (EDC) plots, and "partial Area Under Curve" (pAUC) values of curves therein, are generally used by researchers to evaluate such quality assessment algorithms.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quality assessment algorithms can be used to estimate the utility of a
biometric sample for the purpose of biometric recognition. "Error versus
Discard Characteristic" (EDC) plots, and "partial Area Under Curve" (pAUC)
values of curves therein, are generally used by researchers to evaluate the
predictive performance of such quality assessment algorithms. An EDC curve
depends on an error type such as the "False Non Match Rate" (FNMR), a quality
assessment algorithm, a biometric recognition system, a set of comparisons each
corresponding to a biometric sample pair, and a comparison score threshold
corresponding to a starting error. To compute an EDC curve, comparisons are
progressively discarded based on the associated samples' lowest quality scores,
and the error is computed for the remaining comparisons. Additionally, a
discard fraction limit or range must be selected to compute pAUC values, which
can then be used to quantitatively rank quality assessment algorithms.
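The discard procedure described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes genuine comparisons only, similarity scores where a comparison score below the threshold counts as a false non-match, and per-comparison quality given as the pairwise minimum of the two samples' quality scores. All names and toy values are illustrative.

```python
def edc_curve(pair_min_quality, comparison_scores, threshold):
    """Stepwise EDC sketch: FNMR among the retained genuine comparisons
    as the lowest-quality comparisons are progressively discarded."""
    # Sort comparisons so the lowest-quality ones are discarded first.
    paired = sorted(zip(pair_min_quality, comparison_scores))
    scores = [s for _, s in paired]
    n = len(scores)
    fractions, errors = [], []
    for discarded in range(n):
        kept = scores[discarded:]  # comparisons remaining after discard
        # FNMR: fraction of retained genuine comparisons below the threshold.
        fnmr = sum(s < threshold for s in kept) / len(kept)
        fractions.append(discarded / n)
        errors.append(fnmr)
    return fractions, errors

# Toy example: six genuine comparisons with pairwise-minimum quality
# scores q and comparison (similarity) scores s.
q = [10, 55, 80, 30, 90, 70]
s = [0.2, 0.6, 0.9, 0.3, 0.95, 0.7]
frac, err = edc_curve(q, s, threshold=0.5)
# For a useful quality measure, err should decrease as frac grows,
# since low-quality (error-prone) comparisons are discarded first.
```

A real evaluation would additionally handle impostor comparisons (for FMR-based curves) and tied quality scores, which this sketch omits.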
This paper discusses and analyses various details for this kind of quality
assessment algorithm evaluation, including general EDC properties,
interpretability improvements for pAUC values based on a hard lower error limit
and a soft upper error limit, the use of relative instead of discrete rankings,
stepwise vs. linear curve interpolation, and normalisation of quality scores to
a [0, 100] integer range. We also analyse the stability of quantitative quality
assessment algorithm rankings based on pAUC values across varying pAUC discard
fraction limits and starting errors, concluding that higher pAUC discard
fraction limits should be preferred. The analyses are conducted both with
synthetic data and with real face image and fingerprint data, with a focus on
general modality-independent conclusions for EDC evaluations. Various EDC
alternatives are discussed as well.
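Two of the evaluation details discussed above can likewise be sketched: a pAUC value computed under a stepwise-interpolated EDC curve up to a discard fraction limit (optionally subtracting a hard lower error limit so that zero means ideal), and normalisation of quality scores to a [0, 100] integer range. These are hedged illustrations under assumed conventions, not the paper's exact definitions.

```python
def stepwise_pauc(fractions, errors, limit, lower_error_limit=0.0):
    """pAUC of a stepwise EDC curve over discard fractions [0, limit).
    Each error value is held constant until the next discard fraction."""
    area = 0.0
    for i in range(len(fractions)):
        left = fractions[i]
        if left >= limit:
            break
        # The step extends to the next fraction, capped at the limit.
        right = fractions[i + 1] if i + 1 < len(fractions) else limit
        right = min(right, limit)
        area += (errors[i] - lower_error_limit) * (right - left)
    return area

def normalise_quality(q, q_min, q_max):
    """Map a raw quality score to the [0, 100] integer range,
    assuming known minimum and maximum raw scores."""
    return round(100 * (q - q_min) / (q_max - q_min))

# Usage with a two-step toy curve: error 0.4 up to discard fraction
# 0.5, then 0.2 up to the limit.
pauc_full = stepwise_pauc([0.0, 0.5], [0.4, 0.2], limit=1.0)
pauc_half = stepwise_pauc([0.0, 0.5], [0.4, 0.2], limit=0.5)
```

Linear interpolation would instead connect consecutive (fraction, error) points with trapezoids; the stepwise form shown here matches the fact that the error only changes when a comparison is actually discarded.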
Related papers
- Beyond correlation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge [51.93909886542317]
We show how a single aggregate correlation score can obscure differences between human behavior and automatic evaluation methods.
We propose stratifying results by human label uncertainty to provide a more robust analysis of automatic evaluation performance.
arXiv Detail & Related papers (2024-10-03T03:08:29Z)
- Quality assurance of organs-at-risk delineation in radiotherapy [7.698565355235687]
The delineation of the tumor target and organs-at-risk is critical in radiotherapy treatment planning.
The quality assurance of the automatic segmentation is still an unmet need in clinical practice.
Our proposed model, which introduces residual network and attention mechanism in the one-class classification framework, was able to detect the various types of OAR contour errors with high accuracy.
arXiv Detail & Related papers (2024-05-20T02:32:46Z)
- Improving Interpretability of Scores in Anomaly Detection Based on Gaussian-Bernoulli Restricted Boltzmann Machine [0.0]
In GBRBM-based anomaly detection, normal and anomalous data are classified based on a score that is identical to an energy function of the marginal GBRBM.
We propose a measure that improves the score's interpretability based on its cumulative distribution.
We also establish a guideline for setting the threshold using the interpretable measure.
arXiv Detail & Related papers (2024-03-19T12:13:52Z)
- Discordance Minimization-based Imputation Algorithms for Missing Values in Rating Data [4.100928307172084]
When multiple rating lists are combined or considered together, subjects often have missing ratings.
We analyse missing value patterns using six real-world data sets from various applications.
We propose optimization models and algorithms that minimize the total rating discordance across rating providers.
arXiv Detail & Related papers (2023-11-07T14:42:06Z)
- C-PMI: Conditional Pointwise Mutual Information for Turn-level Dialogue Evaluation [68.59356746305255]
We propose a novel model-agnostic approach to measure the turn-level interaction between the system and the user.
Our approach significantly improves the correlation with human judgment compared with existing evaluation systems.
arXiv Detail & Related papers (2023-06-27T06:58:03Z)
- Deep Bayesian ICP Covariance Estimation [3.5136071950790737]
The Iterative Closest Point (ICP) point cloud registration algorithm is essential for state estimation and sensor fusion purposes.
We argue that a major source of error for ICP is in the input data itself, from the sensor noise to the scene geometry.
Benefiting from recent developments in deep learning for point clouds, we propose a data-driven approach to learn an error model for ICP.
arXiv Detail & Related papers (2022-02-23T16:42:04Z)
- Reinforcement Learning with Heterogeneous Data: Estimation and Inference [84.72174994749305]
We introduce the K-Heterogeneous Markov Decision Process (K-Hetero MDP) to address sequential decision problems with population heterogeneity.
We propose the Auto-Clustered Policy Evaluation (ACPE) for estimating the value of a given policy, and the Auto-Clustered Policy Iteration (ACPI) for estimating the optimal policy in a given policy class.
We present simulations to support our theoretical findings, and we conduct an empirical study on the standard MIMIC-III dataset.
arXiv Detail & Related papers (2022-01-31T20:58:47Z)
- Doing Great at Estimating CATE? On the Neglected Assumptions in Benchmark Comparisons of Treatment Effect Estimators [91.3755431537592]
We show that even in arguably the simplest setting, estimation under ignorability assumptions can be misleading.
We consider two popular machine learning benchmark datasets for evaluation of heterogeneous treatment effect estimators.
We highlight that the inherent characteristics of the benchmark datasets favor some algorithms over others.
arXiv Detail & Related papers (2021-07-28T13:21:27Z)
- Performance Evaluation of Adversarial Attacks: Discrepancies and Solutions [51.8695223602729]
Adversarial attack methods have been developed to challenge the robustness of machine learning models.
We propose a Piece-wise Sampling Curving (PSC) toolkit to effectively address the discrepancy.
The PSC toolkit offers options for balancing the computational cost and evaluation effectiveness.
arXiv Detail & Related papers (2021-04-22T14:36:51Z)
- A Statistical Analysis of Summarization Evaluation Metrics using Resampling Methods [60.04142561088524]
We find that the confidence intervals are rather wide, demonstrating high uncertainty in how reliable automatic metrics truly are.
Although many metrics fail to show statistical improvements over ROUGE, two recent works, QAEval and BERTScore, do in some evaluation settings.
arXiv Detail & Related papers (2021-03-31T18:28:14Z)
- Strategy for Boosting Pair Comparison and Improving Quality Assessment Accuracy [29.849156371902943]
Pair Comparison (PC) is of significant advantage over Absolute Category Rating (ACR) in terms of discriminability.
In this study, we employ a generic model to bridge pair comparison data and ACR data, where the variance term can be recovered and the obtained information is more complete.
In this way, the proposed methodology achieves the same accuracy as pair comparison but with complexity as low as ACR.
arXiv Detail & Related papers (2020-10-01T13:05:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.