ASR-FAIRBENCH: Measuring and Benchmarking Equity Across Speech Recognition Systems
- URL: http://arxiv.org/abs/2505.11572v1
- Date: Fri, 16 May 2025 11:31:31 GMT
- Title: ASR-FAIRBENCH: Measuring and Benchmarking Equity Across Speech Recognition Systems
- Authors: Anand Rai, Satyam Rahangdale, Utkarsh Anand, Animesh Mukherjee
- Abstract summary: We introduce the ASR-FAIRBENCH leaderboard which is designed to assess both the accuracy and equity of ASR models in real-time. Our approach reveals significant performance disparities in SOTA ASR models across demographic groups and offers a benchmark to drive the development of more inclusive ASR technologies.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic Speech Recognition (ASR) systems have become ubiquitous in everyday applications, yet significant disparities in performance across diverse demographic groups persist. In this work, we introduce the ASR-FAIRBENCH leaderboard, which is designed to assess both the accuracy and equity of ASR models in real time. Leveraging Meta's Fair-Speech dataset, which captures diverse demographic characteristics, we employ a mixed-effects Poisson regression model to derive an overall fairness score. This score is integrated with traditional metrics like Word Error Rate (WER) to compute the Fairness Adjusted ASR Score (FAAS), providing a comprehensive evaluation framework. Our approach reveals significant performance disparities in SOTA ASR models across demographic groups and offers a benchmark to drive the development of more inclusive ASR technologies.
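The evaluation pipeline the abstract describes (per-group WER, a fairness score, and a combined score) can be sketched in plain Python. This is a minimal illustration only: the paper derives its fairness score from a mixed-effects Poisson regression over error counts, which this sketch replaces with a simple worst-group WER gap, and `fairness_adjusted_score` is a hypothetical stand-in, not the paper's exact FAAS formula.

```python
from collections import defaultdict

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

def group_wer(samples):
    """Aggregate WER per demographic group.
    samples: iterable of (group, reference, hypothesis) triples."""
    errors, words = defaultdict(int), defaultdict(int)
    for group, ref, hyp in samples:
        n = len(ref.split())
        errors[group] += round(wer(ref, hyp) * n)
        words[group] += n
    return {g: errors[g] / words[g] for g in errors}

def fairness_adjusted_score(group_wers):
    """Illustrative accuracy/fairness combination (NOT the paper's
    exact FAAS): overall accuracy (in %) penalised by the largest
    inter-group WER gap."""
    overall_acc = 100 * (1 - sum(group_wers.values()) / len(group_wers))
    gap = max(group_wers.values()) - min(group_wers.values())
    fairness = 100 * (1 - gap)
    return overall_acc * fairness / 100
```

In the paper's actual setup, the fairness component comes from a mixed-effects Poisson regression with demographic fixed effects, which accounts for utterance length and speaker variation more rigorously than the gap heuristic used here.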
Related papers
- RAG-Zeval: Towards Robust and Interpretable Evaluation on RAG Responses through End-to-End Rule-Guided Reasoning [64.46921169261852]
RAG-Zeval is a novel end-to-end framework that formulates faithfulness and correctness evaluation as a rule-guided reasoning task. Our approach trains evaluators with reinforcement learning, facilitating compact models to generate comprehensive and sound assessments. Experiments demonstrate RAG-Zeval's superior performance, achieving the strongest correlation with human judgments.
arXiv Detail & Related papers (2025-05-28T14:55:33Z) - PSRB: A Comprehensive Benchmark for Evaluating Persian ASR Systems [0.0]
This paper introduces the Persian Speech Recognition Benchmark (PSRB), a comprehensive benchmark designed to address this gap by incorporating diverse linguistic and acoustic conditions. We evaluate ten ASR systems, including state-of-the-art commercial and open-source models, to examine performance variations and inherent biases. Our findings indicate that while ASR models generally perform well on standard Persian, they struggle with regional accents, children's speech, and specific linguistic challenges.
arXiv Detail & Related papers (2025-05-27T14:14:55Z) - Transferable Adversarial Attacks against ASR [43.766547483367795]
We study the vulnerability of cutting-edge automatic speech recognition models to practical black-box attacks.
We propose a speech-aware gradient optimization approach (SAGO) for ASR that forces mistranscription while remaining largely imperceptible to human listeners.
Our comprehensive experimental results reveal performance enhancements compared to baseline approaches across five models on two databases.
arXiv Detail & Related papers (2024-11-14T06:32:31Z) - FairLENS: Assessing Fairness in Law Enforcement Speech Recognition [37.75768315119143]
We propose a novel and adaptable evaluation method to examine the fairness disparity between different models.
We conducted fairness assessments on 1 open-source and 11 commercially available state-of-the-art ASR models.
arXiv Detail & Related papers (2024-05-21T19:23:40Z) - Improving Fairness and Robustness in End-to-End Speech Recognition through Unsupervised Clustering [49.069298478971696]
We present a privacy preserving approach to improve fairness and robustness of end-to-end ASR.
We extract utterance level embeddings using a speaker ID model trained on a public dataset.
We use cluster IDs instead of speaker utterance embeddings as extra features during model training.
arXiv Detail & Related papers (2023-06-06T21:13:08Z) - GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models [60.48306899271866]
We present a new framework, called GREAT Score, for global robustness evaluation of adversarial perturbation using generative models.
We show high correlation and significantly reduced cost of GREAT Score when compared to the attack-based model ranking on RobustBench.
GREAT Score can be used for remote auditing of privacy-sensitive black-box models.
arXiv Detail & Related papers (2023-04-19T14:58:27Z) - Cross-utterance ASR Rescoring with Graph-based Label Propagation [14.669201156515891]
We propose a novel approach for ASR N-best hypothesis rescoring with graph-based label propagation.
In contrast to conventional neural language model (LM) based ASR rescoring/reranking models, our approach focuses on acoustic information.
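Graph-based label propagation in general can be sketched as follows. This is a generic illustration under the assumption of a precomputed similarity graph over N-best hypothesis nodes; the function and parameter names (`label_propagation`, `alpha`) are illustrative, and this is not the paper's exact rescoring algorithm.

```python
def label_propagation(scores, neighbors, alpha=0.85, iters=20):
    """Generic graph-based label propagation.
    scores:    {node: initial ASR confidence score}
    neighbors: {node: [(other_node, similarity_weight), ...]}
    alpha:     how much to trust propagated neighbour evidence
               versus the node's own initial score."""
    current = dict(scores)
    for _ in range(iters):
        updated = {}
        for node, init in scores.items():
            nbrs = neighbors.get(node, [])
            total_w = sum(w for _, w in nbrs)
            if total_w == 0:
                # Isolated node: keep its initial score.
                updated[node] = init
                continue
            # Similarity-weighted average of neighbour scores.
            propagated = sum(w * current[n] for n, w in nbrs) / total_w
            # Blend neighbour evidence with the initial score.
            updated[node] = alpha * propagated + (1 - alpha) * init
        current = updated
    return current
```

The intuition is that acoustically similar hypotheses across utterances should receive similar scores, so each node's score is pulled toward a weighted average of its neighbours while being anchored to its original ASR confidence.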
arXiv Detail & Related papers (2023-03-27T12:08:05Z) - End-to-End Speech Recognition: A Survey [68.35707678386949]
The goal of this survey is to provide a taxonomy of E2E ASR models and corresponding improvements.
All relevant aspects of E2E ASR are covered in this work, accompanied by discussions of performance and deployment opportunities.
arXiv Detail & Related papers (2023-03-03T01:46:41Z) - Open-Set Recognition: A Good Closed-Set Classifier is All You Need [146.6814176602689]
We show that the ability of a classifier to make the 'none-of-above' decision is highly correlated with its accuracy on the closed-set classes.
We use this correlation to boost the performance of the cross-entropy OSR 'baseline' by improving its closed-set accuracy.
We also construct new benchmarks which better respect the task of detecting semantic novelty.
arXiv Detail & Related papers (2021-10-12T17:58:59Z) - Fine-tuning of Pre-trained End-to-end Speech Recognition with Generative Adversarial Networks [10.723935272906461]
Adversarial training of end-to-end (E2E) ASR systems using generative adversarial networks (GAN) has recently been explored.
We introduce a novel framework for fine-tuning a pre-trained ASR model using the GAN objective.
Our proposed approach outperforms baselines and conventional GAN-based adversarial models.
arXiv Detail & Related papers (2021-03-10T17:40:48Z) - Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and downstream LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.