How to Evaluate Automatic Speech Recognition: Comparing Different Performance and Bias Measures
- URL: http://arxiv.org/abs/2507.05885v1
- Date: Tue, 08 Jul 2025 11:17:13 GMT
- Title: How to Evaluate Automatic Speech Recognition: Comparing Different Performance and Bias Measures
- Authors: Tanvina Patel, Wiebke Hutiri, Aaron Yi Ding, Odette Scharenborg,
- Abstract summary: We compare different performance and bias measures, from literature and proposed, to evaluate end-to-end ASR systems for Dutch.<n>The findings reveal that averaged error rates, a standard in ASR research, alone is not sufficient and should be supplemented by other measures.<n>The paper ends with recommendations for reporting ASR performance and bias to better represent a system's performance for diverse speaker groups, and overall system bias.
- Score: 15.722009470067974
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is increasingly more evidence that automatic speech recognition (ASR) systems are biased against different speakers and speaker groups, e.g., due to gender, age, or accent. Research on bias in ASR has so far primarily focused on detecting and quantifying bias, and developing mitigation approaches. Despite this progress, the open question is how to measure the performance and bias of a system. In this study, we compare different performance and bias measures, from literature and proposed, to evaluate state-of-the-art end-to-end ASR systems for Dutch. Our experiments use several bias mitigation strategies to address bias against different speaker groups. The findings reveal that averaged error rates, a standard in ASR research, alone is not sufficient and should be supplemented by other measures. The paper ends with recommendations for reporting ASR performance and bias to better represent a system's performance for diverse speaker groups, and overall system bias.
Related papers
- CO-VADA: A Confidence-Oriented Voice Augmentation Debiasing Approach for Fair Speech Emotion Recognition [49.27067541740956]
We present CO-VADA, a Confidence-Oriented Voice Augmentation Debiasing Approach that mitigates bias without modifying model architecture or relying on demographic information.<n>CO-VADA identifies training samples that reflect bias patterns present in the training data and then applies voice conversion to alter irrelevant attributes and generate samples.<n>Our framework is compatible with various SER models and voice conversion tools, making it a scalable and practical solution for improving fairness in SER systems.
arXiv Detail & Related papers (2025-06-06T13:25:56Z) - EMO-Debias: Benchmarking Gender Debiasing Techniques in Multi-Label Speech Emotion Recognition [49.27067541740956]
EMO-Debias is a large-scale comparison of 13 debiasing methods applied to multi-label SER.<n>Our study encompasses techniques from pre-processing, regularization, adversarial learning, biased learners, and distributionally robust optimization.<n>Our analysis quantifies the trade-offs between fairness and accuracy, identifying which approaches consistently reduce gender performance gaps.
arXiv Detail & Related papers (2025-06-05T05:48:31Z) - Everyone deserves their voice to be heard: Analyzing Predictive Gender Bias in ASR Models Applied to Dutch Speech Data [13.91630413828167]
This study focuses on identifying the performance disparities of Whisper models on Dutch speech data.
We analyzed the word error rate, character error rate and a BERT-based semantic similarity across gender groups.
arXiv Detail & Related papers (2024-11-14T13:29:09Z) - Measuring and Addressing Indexical Bias in Information Retrieval [69.7897730778898]
PAIR framework supports automatic bias audits for ranked documents or entire IR systems.
After introducing DUO, we run an extensive evaluation of 8 IR systems on a new corpus of 32k synthetic and 4.7k natural documents.
A human behavioral study validates our approach, showing that our bias metric can help predict when and how indexical bias will shift a reader's opinion.
arXiv Detail & Related papers (2024-06-06T17:42:37Z) - The Balancing Act: Unmasking and Alleviating ASR Biases in Portuguese [5.308321515594125]
This study is dedicated to a comprehensive exploration of the Whisper and MMS systems.
Our investigation encompasses various categories, including gender, age, skin tone color, and geo-location.
We empirically show that oversampling techniques alleviate such stereotypical biases.
arXiv Detail & Related papers (2024-02-12T09:35:13Z) - Debiased Automatic Speech Recognition for Dysarthric Speech via Sample
Reweighting with Sample Affinity Test [11.223191305716071]
We present a novel approach, sample reweighting with sample affinity test (Re-SAT)
Re-SAT measures the debiasing helpfulness of the given data sample and then mitigates the bias by debiasing helpfulness-based sample reweighting.
Experimental results demonstrate that Re-SAT contributes to improved ASR performance on dysarthric speech without performance degradation on healthy speech.
arXiv Detail & Related papers (2023-05-22T15:09:27Z) - Evaluating generative audio systems and their metrics [80.97828572629093]
This paper investigates state-of-the-art approaches side-by-side with (i) a set of previously proposed objective metrics for audio reconstruction, and (ii) a listening study.
Results indicate that currently used objective metrics are insufficient to describe the perceptual quality of current systems.
arXiv Detail & Related papers (2022-08-31T21:48:34Z) - The SAME score: Improved cosine based bias score for word embeddings [49.75878234192369]
We introduce SAME, a novel bias score for semantic bias in embeddings.
We show that SAME is capable of measuring semantic bias and identify potential causes for social bias in downstream tasks.
arXiv Detail & Related papers (2022-03-28T09:28:13Z) - Information-Theoretic Bias Reduction via Causal View of Spurious
Correlation [71.9123886505321]
We propose an information-theoretic bias measurement technique through a causal interpretation of spurious correlation.
We present a novel debiasing framework against the algorithmic bias, which incorporates a bias regularization loss.
The proposed bias measurement and debiasing approaches are validated in diverse realistic scenarios.
arXiv Detail & Related papers (2022-01-10T01:19:31Z) - Model-Based Approach for Measuring the Fairness in ASR [11.076999352942954]
We introduce mixed-effects Poisson regression to better measure and interpret any WER difference among subgroups of interest.
We demonstrate the validity of proposed model-based approach on both synthetic and real-world speech data.
arXiv Detail & Related papers (2021-09-19T05:24:01Z) - Quantifying Bias in Automatic Speech Recognition [28.301997555189462]
This paper quantifies the bias of a Dutch SotA ASR system against gender, age, regional accents and non-native accents.
Based on our findings, we suggest bias mitigation strategies for ASR development.
arXiv Detail & Related papers (2021-03-28T12:52:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.