Advanced Acceptance Score: A Holistic Measure for Biometric Quantification
- URL: http://arxiv.org/abs/2602.15535v1
- Date: Tue, 17 Feb 2026 12:33:45 GMT
- Title: Advanced Acceptance Score: A Holistic Measure for Biometric Quantification
- Authors: Aman Verma, Seshan Srirangarajan, Sumantra Dutta Roy
- Abstract summary: Quantifying biometric characteristics within hand gestures involves deriving fitness scores from a gesture- and identity-aware feature space. Existing biometric capacity estimation literature relies upon error rates. We present an exhaustive set of evaluation measures.
- Score: 4.409605045494181
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quantifying biometric characteristics within hand gestures involves deriving fitness scores from a gesture- and identity-aware feature space. However, evaluating the quality of these scores remains an open question. Existing biometric capacity estimation literature relies upon error rates, but these rates do not indicate the goodness of the scores themselves. Thus, in this manuscript we present an exhaustive set of evaluation measures. We first identify the ranking order and relevance of output scores as the primary basis for evaluation. In particular, we consider rank deviation as well as rewards for: (i) higher scores for high-ranked gestures and (ii) lower scores for low-ranked gestures. We also account for the correspondence between the trends of output and ground-truth scores. Finally, we incorporate the disentanglement between identity features of gestures as a discounting factor. Integrating these elements with suitable weighting, we formulate the advanced acceptance score as a holistic evaluation measure. To assess the effectiveness of the proposed measure, we perform in-depth experiments on three datasets with five state-of-the-art (SOTA) models. Results show that the optimal score selected with our measure is more appropriate than those selected by existing measures. Our measure also correlates with existing measures, which further validates its reliability. We have made our \href{https://github.com/AmanVerma2307/MeasureSuite}{code} public.
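The abstract enumerates the ingredients of the advanced acceptance score (a rank-deviation term, rewards for high scores at high ranks and low scores at low ranks, a trend-correspondence term, and an identity-disentanglement discount) but not the formula itself. Below is a minimal Python sketch of how such a composite might be assembled; every component definition, name, and default weight here is an assumption for illustration, not the authors' formulation (see their linked code for the actual one).

```python
import numpy as np
from scipy.stats import spearmanr

def advanced_acceptance_score(pred, gt, disentanglement,
                              w_rank=0.4, w_reward=0.3, w_trend=0.3):
    """Hypothetical composite of the ingredients named in the abstract.

    pred, gt        : fitness scores (predicted / ground truth), in [0, 1]
    disentanglement : scalar in [0, 1]; 1 = identity features fully
                      disentangled (acts as a discounting factor)
    """
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    n = len(pred)
    # Dense ranks, 0 = best-scoring gesture.
    pred_rank = np.argsort(np.argsort(-pred))
    gt_rank = np.argsort(np.argsort(-gt))

    # (1) Rank-deviation term: 1 minus the normalized mean absolute
    #     difference between predicted and ground-truth ranks.
    rank_term = 1.0 - np.abs(pred_rank - gt_rank).mean() / max(n - 1, 1)

    # (2) Reward term: DCG-style position weights emphasize that
    #     high-ranked gestures should carry high scores (and, via the
    #     small weights at the tail, low-ranked ones low scores).
    pos_w = 1.0 / np.log2(gt_rank + 2)
    reward = np.clip((pos_w * pred).sum() / (pos_w * gt).sum(), 0.0, 1.0)

    # (3) Trend correspondence: Spearman correlation between the two
    #     score sequences, mapped from [-1, 1] to [0, 1].
    rho, _ = spearmanr(pred, gt)
    trend = 0.5 * (rho + 1.0)

    # Weighted combination, discounted by identity disentanglement.
    return disentanglement * (w_rank * rank_term
                              + w_reward * reward
                              + w_trend * trend)
```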
Related papers
- Evaluating the Evaluators: Metrics for Compositional Text-to-Image Generation [13.460909458745379]
We present a broad study of widely used metrics for compositional text-to-image evaluation. Our analysis goes beyond simple correlation, examining their behavior across diverse compositional challenges. Results show that no single metric performs consistently across tasks.
arXiv Detail & Related papers (2025-09-25T14:31:09Z)
- Where is this coming from? Making groundedness count in the evaluation of Document VQA models [12.951716701565019]
We argue that common evaluation metrics do not account for the semantic and multimodal groundedness of a model's outputs. We propose a new evaluation methodology that accounts for the groundedness of predictions. Our methodology is parameterized so that users can configure the score according to their preferences; one such configuration is sketched below.
arXiv Detail & Related papers (2025-03-24T20:14:46Z)
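The summary says the groundedness-aware score is user-configurable but does not spell out the parameterization. The following is a purely hypothetical Python sketch of one such configuration: a single weight trades off answer correctness against evidence groundedness, and an optional switch denies credit to ungrounded answers. The names (grounded_vqa_score, alpha) and the blending rule are assumptions, not the paper's method.

```python
from dataclasses import dataclass

@dataclass
class GroundedScoreConfig:
    alpha: float = 0.5             # weight on answer correctness
    require_evidence: bool = True  # no credit for ungrounded answers

def grounded_vqa_score(answer_correct: float,
                       evidence_overlap: float,
                       cfg: GroundedScoreConfig = GroundedScoreConfig()) -> float:
    """Blend answer correctness (e.g., ANLS in [0, 1]) with groundedness
    (e.g., IoU of predicted vs. annotated evidence regions, in [0, 1])."""
    if cfg.require_evidence and evidence_overlap == 0.0:
        return 0.0  # a correct answer grounded in nothing earns nothing
    return cfg.alpha * answer_correct + (1.0 - cfg.alpha) * evidence_overlap
```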
- PerSEval: Assessing Personalization in Text Summarizers [14.231110627461]
We argue that accuracy measures are inadequate for evaluating the degree of personalization of text summaries.
We propose PerSEval, a novel measure that satisfies the required sufficiency condition.
arXiv Detail & Related papers (2024-06-29T14:37:36Z)
- Cobra Effect in Reference-Free Image Captioning Metrics [58.438648377314436]
A proliferation of reference-free methods, leveraging visual-language pre-trained models (VLMs), has emerged.
In this paper, we study whether there are any deficiencies in reference-free metrics.
We employ GPT-4V as an evaluative tool to assess generated sentences; the results reveal that our approach achieves state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2024-02-18T12:36:23Z)
- Goodhart's Law Applies to NLP's Explanation Benchmarks [57.26445915212884]
We critically examine two sets of metrics: the ERASER metrics (comprehensiveness and sufficiency) and the EVAL-X metrics.
We show that we can inflate a model's comprehensiveness and sufficiency scores dramatically without altering its predictions or explanations on in-distribution test inputs.
Our results raise doubts about the ability of current metrics to guide explainability research, underscoring the need for a broader reassessment of what precisely these metrics are intended to capture. (The metric definitions in question are sketched below.)
arXiv Detail & Related papers (2023-08-28T03:03:03Z)
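For context, the ERASER faithfulness metrics that the paper shows can be inflated are commonly defined as probability differences on the predicted class. The sketch below gives the single-example form; ERASER itself also reports binned (AOPC-style) variants over several rationale lengths, which are omitted here.

```python
def eraser_metrics(p_full: float,
                   p_rationale_only: float,
                   p_without_rationale: float) -> dict:
    """Single-example ERASER faithfulness metrics.

    p_full              : p(y_hat | x), model confidence on the full input
    p_rationale_only    : confidence when only rationale tokens are kept
    p_without_rationale : confidence when rationale tokens are removed
    """
    # Comprehensiveness (higher is better): deleting the rationale
    # should hurt the prediction.
    comprehensiveness = p_full - p_without_rationale
    # Sufficiency (lower is better): the rationale alone should be
    # enough to reproduce the prediction.
    sufficiency = p_full - p_rationale_only
    return {"comprehensiveness": comprehensiveness,
            "sufficiency": sufficiency}
```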
- Integrating Rankings into Quantized Scores in Peer Review [61.27794774537103]
In peer review, reviewers are usually asked to provide scores for the papers.
To mitigate this issue, conferences have started to ask reviewers to additionally provide a ranking of the papers they have reviewed.
There is no standard procedure for using this ranking information, and Area Chairs may use it in different ways.
We take a principled approach to integrating the ranking information into the scores; one natural formulation is sketched below.
arXiv Detail & Related papers (2022-04-05T19:39:13Z)
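The summary does not give the combination rule. As a hypothetical illustration of the general idea (not necessarily this paper's algorithm), one can project the raw scores onto the nearest score vector that is consistent with the reviewer's ranking, which for squared error is exactly isotonic regression via pool-adjacent-violators:

```python
import numpy as np

def rank_consistent_scores(scores, ranking):
    """Adjust raw review scores to respect a reviewer's ranking.

    scores  : raw score of each paper, indexed by paper id
    ranking : paper ids ordered best-to-worst by the reviewer

    Returns scores as close as possible (squared error) to the raw ones
    while non-increasing along `ranking` (pool-adjacent-violators).
    """
    y = np.asarray([scores[i] for i in ranking], dtype=float)
    # Running PAVA on -y enforces a non-decreasing fit, i.e. a
    # non-increasing fit on y itself.
    vals, weights = [], []
    for v in -y:
        vals.append(v)
        weights.append(1)
        # Merge adjacent blocks while monotonicity is violated.
        while len(vals) > 1 and vals[-2] > vals[-1]:
            w = weights[-2] + weights[-1]
            vals[-2] = (vals[-2] * weights[-2] + vals[-1] * weights[-1]) / w
            weights[-2] = w
            vals.pop()
            weights.pop()
    fitted = -np.repeat(vals, weights)
    return dict(zip(ranking, fitted))  # paper id -> adjusted score
```

For example, raw scores {A: 5, B: 7} with ranking A above B contradict each other; the projection pools the pair to 6 and 6 rather than leaving the contradiction in place.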
- A Statistical Analysis of Summarization Evaluation Metrics using Resampling Methods [60.04142561088524]
We find that the confidence intervals are rather wide, demonstrating high uncertainty in how reliable automatic metrics truly are.
Although many metrics fail to show statistical improvements over ROUGE, two recent works, QAEval and BERTScore, do in some evaluation settings. (The bootstrap mechanic behind such intervals is sketched below.)
arXiv Detail & Related papers (2021-03-31T18:28:14Z)
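The basic resampling mechanic behind such confidence intervals is easy to sketch. The following computes a percentile-bootstrap interval for the Pearson correlation between an automatic metric and human judgments; the paper's actual analysis considers richer designs (e.g., which units are resampled), which this simplified version glosses over.

```python
import numpy as np

def bootstrap_corr_ci(metric_scores, human_scores,
                      n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the metric-human Pearson correlation."""
    rng = np.random.default_rng(seed)
    m = np.asarray(metric_scores, dtype=float)
    h = np.asarray(human_scores, dtype=float)
    n = len(m)
    corrs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample with replacement
        corrs[b] = np.corrcoef(m[idx], h[idx])[0, 1]
    lo, hi = np.quantile(corrs, [alpha / 2, 1 - alpha / 2])
    return lo, hi                              # wide interval = high uncertainty
```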
- Perception Score, A Learned Metric for Open-ended Text Generation Evaluation [62.7690450616204]
We propose a novel and powerful learning-based evaluation metric: Perception Score.
The method measures the overall quality of the generation and scores it holistically, instead of focusing on a single evaluation criterion such as word overlap.
arXiv Detail & Related papers (2020-08-07T10:48:40Z)
- Uncertainty-aware Score Distribution Learning for Action Quality Assessment [91.05846506274881]
We propose an uncertainty-aware score distribution learning (USDL) approach for action quality assessment (AQA).
Specifically, we regard an action as an instance associated with a score distribution, which describes the probability of different evaluated scores.
Under the circumstance where fine-grained score labels are available, we devise a multi-path uncertainty-aware score distributions learning (MUSDL) method to explore the disentangled components of a score. (The soft-label construction is sketched below.)
arXiv Detail & Related papers (2020-06-13T15:41:29Z)
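The core construction in score-distribution learning is the soft label: a Gaussian over discretized scores centered at the ground-truth score, trained against with a KL objective. The sketch below shows a single-path version in PyTorch; the bin count, sigma, and the multi-path (MUSDL) decomposition are simplifications of the paper's setup.

```python
import torch
import torch.nn.functional as F

def gaussian_soft_label(gt_score, bins, sigma=1.0):
    """Gaussian distribution over score bins, centered at the GT score."""
    logits = -((bins - gt_score) ** 2) / (2.0 * sigma ** 2)
    return F.softmax(logits, dim=0)

def score_distribution_loss(pred_logits, gt_score, bins, sigma=1.0):
    """KL divergence between the predicted score distribution and the
    Gaussian soft label (the heart of USDL-style training)."""
    target = gaussian_soft_label(gt_score, bins, sigma)
    return F.kl_div(F.log_softmax(pred_logits, dim=0), target,
                    reduction="sum")

# Example: scores discretized into 101 bins over [0, 100].
bins = torch.linspace(0.0, 100.0, steps=101)
pred_logits = torch.randn(101, requires_grad=True)
loss = score_distribution_loss(pred_logits, gt_score=83.5, bins=bins)
```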
- On the Ambiguity of Rank-Based Evaluation of Entity Alignment or Link Prediction Methods [27.27230441498167]
We take a closer look at the evaluation of two families of methods for enriching information from knowledge graphs: Link Prediction and Entity Alignment.
In particular, we demonstrate that existing rank-based scores can hardly be used to compare results across different datasets.
We show that this leads to various problems in the interpretation of results, which may support misleading conclusions; a chance-adjusted alternative is sketched after this entry.
arXiv Detail & Related papers (2020-02-17T12:26:14Z)
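The comparability problem has a simple chance-adjusted remedy in the spirit of what this line of work proposes: normalize the mean rank by its expectation under a uniformly random scorer, which depends on the candidate-set size. The exact definition below is an illustrative variant, not necessarily the paper's.

```python
import numpy as np

def adjusted_mean_rank(ranks, num_candidates):
    """Chance-adjusted mean rank for link prediction / entity alignment.

    ranks          : 1-based rank of the true entity per test query
    num_candidates : candidate-set size per query

    A raw mean rank of 500 is excellent with 10^6 candidates and terrible
    with 10^3, so raw values cannot be compared across datasets. Dividing
    by the random-scorer expectation (n + 1) / 2 gives ~1.0 for random
    performance and values near 0 for perfect ranking, on any dataset.
    """
    ranks = np.asarray(ranks, dtype=float)
    n = np.asarray(num_candidates, dtype=float)
    return ranks.mean() / ((n + 1.0) / 2.0).mean()
```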