Hey ASR System! Why Aren't You More Inclusive? Automatic Speech
Recognition Systems' Bias and Proposed Bias Mitigation Techniques. A
Literature Review
- URL: http://arxiv.org/abs/2211.09511v1
- Date: Thu, 17 Nov 2022 13:15:58 GMT
- Authors: Mikel K. Ngueajio and Gloria Washington
- Abstract summary: We present research that addresses ASR biases against gender, race, and the sick and disabled.
We also discuss techniques for designing a more accessible and inclusive ASR technology.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Speech is the fundamental means of communication between humans. The advent
of AI and sophisticated speech technologies has led to the rapid proliferation
of human-computer interactions, fueled primarily by Automatic Speech
Recognition (ASR) systems. ASR systems normally take human speech in the form
of audio and convert it into words, but for some users they cannot decode the
speech, and any output text is filled with errors that are incomprehensible to
the human reader. These systems do not work equally well for everyone and actually
hinder the productivity of some users. In this paper, we present research that
addresses ASR biases against gender, race, and the sick and disabled, while
exploring studies that propose ASR debiasing techniques for mitigating these
discriminations. We also discuss techniques for designing a more accessible and
inclusive ASR technology. For each approach surveyed, we also provide a summary
of the investigation and methods applied, the ASR systems and corpora used, and
the research findings, and highlight their strengths and/or weaknesses.
Finally, we propose future opportunities for Natural Language Processing
researchers to explore in creating the next generation of ASR technologies.
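The ASR errors and disparities this survey examines are conventionally measured with word error rate (WER): the word-level edit distance between a reference transcript and the ASR hypothesis, divided by the reference length. A minimal sketch of that metric (plain Levenshtein over whitespace-tokenized words; real evaluations also normalize casing and punctuation first):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)

score = wer("the cat sat on the mat", "the cat sat on mat")
# one deleted word over six reference words, i.e. 1/6
```

Bias audits then compare this number across speaker groups: a system with 10% WER for one accent and 25% for another is exactly the kind of inequity the paper surveys.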
Related papers
- SONAR: A Synthetic AI-Audio Detection Framework and Benchmark [59.09338266364506]
SONAR is a synthetic AI-Audio Detection Framework and Benchmark.
It aims to provide a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content.
It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based deepfake detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z)
- Towards Unsupervised Speech Recognition Without Pronunciation Models [57.222729245842054]
Most languages lack sufficient paired speech and text data to effectively train automatic speech recognition systems.
We propose the removal of reliance on a phoneme lexicon to develop unsupervised ASR systems.
We experimentally demonstrate that an unsupervised speech recognizer can emerge from joint speech-to-speech and text-to-text masked token-infilling.
arXiv Detail & Related papers (2024-06-12T16:30:58Z)
- Speech Emotion Recognition with ASR Transcripts: A Comprehensive Study on Word Error Rate and Fusion Techniques [17.166092544686553]
This study benchmarks Speech Emotion Recognition using ASR transcripts with varying Word Error Rates (WERs) from eleven models on three well-known corpora.
We propose a unified ASR error-robust framework integrating ASR error correction and modality-gated fusion, achieving lower WER and higher SER results compared to the best-performing ASR transcript.
arXiv Detail & Related papers (2024-06-12T15:59:25Z)
- A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech [94.64927912924087]
We train TTS systems using real-world speech from YouTube and podcasts.
The recent Text-to-Speech architecture is designed for multiple code generation and monotonic alignment.
We show that this architecture outperforms existing TTS systems in several objective and subjective measures.
arXiv Detail & Related papers (2023-02-08T17:34:32Z)
- Speech Aware Dialog System Technology Challenge (DSTC11) [12.841429336655736]
Most research on task oriented dialog modeling is based on written text input.
We created three spoken versions of the popular written-domain MultiWoz task -- (a) TTS-Verbatim: written user inputs were converted into speech waveforms using a TTS system, (b) Human-Verbatim: humans spoke the user inputs verbatim, and (c) Human-paraphrased: humans paraphrased the user inputs.
arXiv Detail & Related papers (2022-12-16T20:30:33Z)
- Contextual-Utterance Training for Automatic Speech Recognition [65.4571135368178]
We propose a contextual-utterance training technique which makes use of the previous and future contextual utterances.
Also, we propose a dual-mode contextual-utterance training technique for streaming automatic speech recognition (ASR) systems.
The proposed technique reduces the WER by more than 6% relative and the average last-token emission latency by more than 40 ms.
arXiv Detail & Related papers (2022-10-27T08:10:44Z)
- Can Visual Context Improve Automatic Speech Recognition for an Embodied Agent? [3.7311680121118345]
We propose a new decoder biasing technique to incorporate the visual context while ensuring the ASR output does not degrade for incorrect context.
We achieve a 59% relative reduction in WER from an unmodified ASR system.
arXiv Detail & Related papers (2022-10-21T11:16:05Z)
- Neural Approaches to Conversational Information Retrieval [94.77863916314979]
A conversational information retrieval (CIR) system is an information retrieval (IR) system with a conversational interface.
Recent progress in deep learning has brought tremendous improvements in natural language processing (NLP) and conversational AI.
This book surveys recent advances in CIR, focusing on neural approaches that have been developed in the last few years.
arXiv Detail & Related papers (2022-01-13T19:04:59Z)
- Automatic Speech Recognition using limited vocabulary: A survey [0.0]
An approach to design an ASR system targeting under-resourced languages is to start with a limited vocabulary.
This paper aims to provide a comprehensive view of mechanisms behind ASR systems as well as techniques, tools, projects, recent contributions, and possibly future directions in ASR using a limited vocabulary.
arXiv Detail & Related papers (2021-08-23T15:51:41Z)
- Quantifying Bias in Automatic Speech Recognition [28.301997555189462]
This paper quantifies the bias of a Dutch SotA ASR system against gender, age, regional accents and non-native accents.
Based on our findings, we suggest bias mitigation strategies for ASR development.
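Bias quantification of this kind typically compares per-group WERs against the best-served group. A minimal sketch of that comparison, with entirely hypothetical group names and numbers (not figures from the paper):

```python
# Hypothetical per-group WERs from some ASR evaluation (illustrative only).
group_wer = {"native": 0.12, "non_native": 0.22, "regional_accent": 0.17}

# Relative WER gap of each group versus the best-served (lowest-WER) group.
baseline = min(group_wer.values())
relative_gap = {g: (w - baseline) / baseline for g, w in group_wer.items()}
# e.g. "non_native": (0.22 - 0.12) / 0.12, roughly 83% higher WER than baseline
```

Reporting gaps relative to the best-served group, rather than absolute WERs alone, is what makes a disparity claim comparable across systems and corpora.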
arXiv Detail & Related papers (2021-03-28T12:52:03Z)
- Improving Readability for Automatic Speech Recognition Transcription [50.86019112545596]
We propose a novel NLP task called ASR post-processing for readability (APR).
APR aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker.
We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method.
arXiv Detail & Related papers (2020-04-09T09:26:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.