Quantifying Bias in Automatic Speech Recognition
- URL: http://arxiv.org/abs/2103.15122v2
- Date: Thu, 1 Apr 2021 09:12:22 GMT
- Title: Quantifying Bias in Automatic Speech Recognition
- Authors: Siyuan Feng, Olya Kudina, Bence Mark Halpern and Odette Scharenborg
- Abstract summary: This paper quantifies the bias of a Dutch SotA ASR system against gender, age, regional accents and non-native accents.
Based on our findings, we suggest bias mitigation strategies for ASR development.
- Score: 28.301997555189462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic speech recognition (ASR) systems promise to deliver an objective
interpretation of human speech. Practice and recent evidence, however, suggest that
state-of-the-art (SotA) ASR systems struggle with the large variation in speech due to,
e.g., gender, age, speech impairment, race, and accent. Many factors can cause
an ASR system to be biased. Our overarching goal is to uncover bias in ASR
systems in order to work towards proactive bias mitigation in ASR. This paper is a first
step towards this goal and systematically quantifies the bias of a Dutch SotA
ASR system against gender, age, regional accents, and non-native accents. Word
error rates are compared, and an in-depth phoneme-level error analysis is
conducted to understand where bias is occurring. We primarily focus on bias due
to articulation differences in the dataset. Based on our findings, we suggest
bias mitigation strategies for ASR development.
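The comparison described above rests on the standard word error rate (WER) metric, computed per speaker group rather than over the whole corpus. Below is a minimal sketch of such a per-group comparison; the `wer` function, the group labels, and the Dutch utterance pairs are illustrative placeholders, not data or code from the paper.

```python
# Minimal sketch of a per-group word error rate (WER) comparison, the
# kind of measurement used to quantify bias across gender, age, and
# accent groups. All names and utterances are illustrative placeholders.

def wer(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[-1][-1] / max(len(ref), 1)

# Hypothetical (reference, hypothesis) pairs per demographic group; a
# bias audit compares the per-group WERs rather than one overall score.
groups = {
    "native": [("de kat zit op de mat", "de kat zit op de mat")],
    "non_native": [("de kat zit op de mat", "de kat zat op een mat")],
}
for name, pairs in groups.items():
    scores = [wer(r, h) for r, h in pairs]
    print(f"{name}: mean WER = {sum(scores) / len(scores):.1%}")
```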
Related papers
- Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
- Reexamining Racial Disparities in Automatic Speech Recognition Performance: The Role of Confounding by Provenance [7.882996636086014]
It is important that automatic speech recognition (ASR) models and their use are fair and equitable.
The current study seeks to understand the factors underlying this disparity by examining the performance of a state-of-the-art neural-network-based ASR system.
arXiv Detail & Related papers (2024-07-19T02:14:17Z)
- Error Correction by Paying Attention to Both Acoustic and Confidence References for Automatic Speech Recognition [52.624909026294105]
We propose a non-autoregressive speech error correction method.
A Confidence Module measures the uncertainty of each word in the N-best ASR hypotheses.
The proposed system reduces the error rate by 21% compared with the ASR model.
arXiv Detail & Related papers (2024-06-29T17:56:28Z)
- Towards Unsupervised Speech Recognition Without Pronunciation Models [57.222729245842054]
Most languages lack sufficient paired speech and text data to effectively train automatic speech recognition systems.
We propose removing the reliance on a phoneme lexicon when developing unsupervised ASR systems.
We experimentally demonstrate that an unsupervised speech recognizer can emerge from joint speech-to-speech and text-to-text masked token-infilling.
arXiv Detail & Related papers (2024-06-12T16:30:58Z)
- Lost in Transcription: Identifying and Quantifying the Accuracy Biases of Automatic Speech Recognition Systems Against Disfluent Speech [0.0]
Speech recognition systems fail to accurately interpret speech patterns deviating from typical fluency, leading to critical usability issues and misinterpretations.
This study evaluates six leading ASRs, analyzing their performance on both a real-world dataset of speech samples from individuals who stutter and a synthetic dataset derived from the widely-used LibriSpeech benchmark.
Results reveal a consistent and statistically significant accuracy bias across all ASRs against disfluent speech, manifesting in significant syntactical and semantic inaccuracies in transcriptions.
arXiv Detail & Related papers (2024-05-10T00:16:58Z)
- Investigating the Sensitivity of Automatic Speech Recognition Systems to Phonetic Variation in L2 Englishes [3.198144010381572]
This work demonstrates a method of probing an ASR system to discover how it handles phonetic variation across a number of L2 Englishes.
It is demonstrated that the behaviour of the ASR is systematic and consistent across speakers with similar spoken varieties.
arXiv Detail & Related papers (2023-05-12T11:29:13Z)
- Hey ASR System! Why Aren't You More Inclusive? Automatic Speech Recognition Systems' Bias and Proposed Bias Mitigation Techniques. A Literature Review [0.0]
We present research that addresses ASR biases against gender, race, and speakers who are sick or disabled.
We also discuss techniques for designing a more accessible and inclusive ASR technology.
arXiv Detail & Related papers (2022-11-17T13:15:58Z)
- End-to-end contextual asr based on posterior distribution adaptation for hybrid ctc/attention system [61.148549738631814]
End-to-end (E2E) speech recognition architectures assemble all components of a traditional speech recognition system into a single model.
Although this simplifies the ASR system, it introduces a contextual ASR drawback: the E2E model performs worse on utterances containing infrequent proper nouns.
We propose adding a contextual bias attention (CBA) module to an attention-based encoder-decoder (AED) model to improve its ability to recognize contextual phrases.
arXiv Detail & Related papers (2022-02-18T03:26:02Z)
- Language Dependencies in Adversarial Attacks on Speech Recognition Systems [0.0]
We compare the attackability of a German and an English ASR system.
We investigate whether one of the language models is more susceptible to manipulation than the other.
arXiv Detail & Related papers (2022-02-01T13:27:40Z)
- Sequence-level self-learning with multiple hypotheses [53.04725240411895]
We develop new self-learning techniques with an attention-based sequence-to-sequence (seq2seq) model for automatic speech recognition (ASR).
In contrast to conventional unsupervised learning approaches, we adopt the multi-task learning (MTL) framework.
Our experimental results show that our method reduces the WER on the British speech data from 14.55% to 10.36% compared to the baseline model trained with the US English data only (see the conversion sketch after this list).
arXiv Detail & Related papers (2021-12-10T20:47:58Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and downstream LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
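As a side note on the numbers quoted above: the self-learning entry reports absolute WERs, while the joint contextual modeling entry reports a relative reduction. A minimal sketch of the conversion between the two conventions, using the figures quoted above:

```python
# Convert the absolute WERs quoted above (14.55% -> 10.36%) into a
# relative reduction, the convention behind figures like "14% relative".
baseline_wer, improved_wer = 14.55, 10.36
absolute_drop = baseline_wer - improved_wer            # 4.19 WER points
relative_drop = 100.0 * absolute_drop / baseline_wer   # about 28.8% relative
print(f"{absolute_drop:.2f} points absolute = {relative_drop:.1f}% relative")
```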
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.