Accented Speech Recognition: A Survey
- URL: http://arxiv.org/abs/2104.10747v1
- Date: Wed, 21 Apr 2021 20:21:06 GMT
- Title: Accented Speech Recognition: A Survey
- Authors: Arthur Hinsvark (1), Natalie Delworth (1), Miguel Del Rio (1), Quinten
McNamara (1), Joshua Dong (1), Ryan Westerman (1), Michelle Huang (1), Joseph
Palakapilly (1), Jennifer Drexler (1), Ilya Pirkin (1), Nishchal Bhandari
(1), Miguel Jette (1) ((1) Rev.com)
- Abstract summary: We present a survey of current promising approaches to accented speech recognition.
The resulting bias in ASR performance across accents comes at a cost to both users and providers of ASR.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic Speech Recognition (ASR) systems generalize poorly on accented
speech. The phonetic and linguistic variability of accents presents hard
challenges for today's ASR systems in both data collection and modeling
strategies. The resulting bias in ASR performance across accents comes at a
cost to both users and providers of ASR.
We present a survey of current promising approaches to accented speech
recognition and highlight the key challenges in the space. Approaches mostly
focus on single model generalization and accent feature engineering. Among the
challenges, lack of a standard benchmark makes research and comparison
especially difficult.
Related papers
- ASR Benchmarking: Need for a More Representative Conversational Dataset [3.017953715883516]
We introduce a multilingual conversational dataset, derived from TalkBank, consisting of unstructured phone conversations between adults.
Our results show a significant performance drop across various state-of-the-art ASR models when tested in conversational settings.
arXiv Detail & Related papers (2024-09-18T15:03:04Z) - Clustering and Mining Accented Speech for Inclusive and Fair Speech Recognition [18.90193320368228]
We present accent clustering and mining schemes for fair speech recognition systems.
For accent recognition, we applied three schemes to overcome the limited size of supervised accent data.
Fine-tuning ASR on the mined Indian-accented speech showed 10.0% and 5.3% relative improvements compared to fine-tuning on randomly sampled speech.
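As a rough illustration of the clustering-and-mining idea, the sketch below clusters accent embeddings from a small seed set and ranks an unlabeled pool by similarity to a chosen cluster centroid; the embedding model, clustering method, and selection size are assumptions, not the paper's exact recipe.
```python
# Hypothetical sketch of accent clustering and mining; embeddings here are
# random placeholders standing in for utterance-level accent features.
import numpy as np
from sklearn.cluster import KMeans

def mine_accented_speech(embeddings, utt_ids, target_centroid, top_k=1000):
    """Rank unlabeled utterances by cosine similarity to a target accent centroid."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    cen = target_centroid / np.linalg.norm(target_centroid)
    scores = emb @ cen
    order = np.argsort(-scores)[:top_k]
    return [(utt_ids[i], float(scores[i])) for i in order]

# Cluster accent embeddings from a seed set to obtain candidate accent centroids.
seed_embeddings = np.random.randn(500, 256)      # placeholder accent features
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(seed_embeddings)

# Mine a large unlabeled pool for utterances closest to one cluster, e.g. a
# cluster identified as Indian-accented by inspecting the seed data.
pool_embeddings = np.random.randn(100000, 256)   # placeholder accent features
pool_ids = [f"utt_{i}" for i in range(len(pool_embeddings))]
mined = mine_accented_speech(pool_embeddings, pool_ids, kmeans.cluster_centers_[0])
```
The mined utterances would then be used for fine-tuning, as in the relative improvements reported above.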
arXiv Detail & Related papers (2024-08-05T16:00:07Z) - Improving Self-supervised Pre-training using Accent-Specific Codebooks [48.409296549372414]
We propose an accent-aware adaptation technique for self-supervised learning.
On the Mozilla Common Voice dataset, our proposed approach outperforms all other accent-adaptation approaches.
arXiv Detail & Related papers (2024-07-04T08:33:52Z) - Towards Unsupervised Speech Recognition Without Pronunciation Models [57.222729245842054]
Most languages lack sufficient paired speech and text data to effectively train automatic speech recognition systems.
We propose removing the reliance on a phoneme lexicon when developing unsupervised ASR systems.
We experimentally demonstrate that an unsupervised speech recognizer can emerge from joint speech-to-speech and text-to-text masked token-infilling.
arXiv Detail & Related papers (2024-06-12T16:30:58Z) - A New Benchmark for Evaluating Automatic Speech Recognition in the Arabic Call Domain [0.0]
This work introduces a comprehensive benchmark for Arabic speech recognition, specifically tailored to the challenges of telephone conversations in Arabic.
Our work aims to establish a robust benchmark that not only encompasses the broad spectrum of Arabic dialects but also emulates the real-world conditions of call-based communications.
arXiv Detail & Related papers (2024-03-07T07:24:32Z) - MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition [62.89464258519723]
We propose a multi-layer cross-attention fusion based AVSR approach that promotes the representation of each modality by fusing them at different levels of the audio and visual encoders.
Our proposed approach surpasses the first-place system, establishing a new SOTA cpCER of 29.13% on this dataset.
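The sketch below illustrates the general idea of cross-attention fusion between audio and visual encoder states applied at several depths; dimensions, layer counts, and fusion points are placeholder assumptions rather than the MLCA-AVSR configuration.
```python
# Minimal sketch of cross-modal fusion: each modality attends to the other and
# the attended context is residual-added back to that modality's states.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.a2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v2a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_a = nn.LayerNorm(dim)
        self.norm_v = nn.LayerNorm(dim)

    def forward(self, audio, visual):
        a_ctx, _ = self.a2v(query=audio, key=visual, value=visual)
        v_ctx, _ = self.v2a(query=visual, key=audio, value=audio)
        return self.norm_a(audio + a_ctx), self.norm_v(visual + v_ctx)

# Fusion can be applied after several encoder blocks rather than once at the end.
audio = torch.randn(2, 120, 256)   # (batch, audio frames, dim)
visual = torch.randn(2, 30, 256)   # (batch, video frames, dim)
fusion_layers = nn.ModuleList(CrossModalFusion() for _ in range(3))
for fuse in fusion_layers:
    audio, visual = fuse(audio, visual)
```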
arXiv Detail & Related papers (2024-01-07T08:59:32Z) - Accented Speech Recognition With Accent-specific Codebooks [53.288874858671576]
Speech accents pose a significant challenge to state-of-the-art automatic speech recognition (ASR) systems.
Degradation in performance across underrepresented accents is a severe deterrent to the inclusive adoption of ASR.
We propose a novel accent adaptation approach for end-to-end ASR systems using cross-attention with a trainable set of codebooks.
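A minimal sketch of the codebook idea, assuming a learnable set of accent code vectors that the ASR encoder states query through cross-attention; the codebook size, insertion point, and training details here are illustrative assumptions, not the paper's exact design.
```python
# Hedged sketch of accent adaptation with a trainable codebook queried by
# cross-attention from the encoder's frame-level states.
import torch
import torch.nn as nn

class AccentCodebookAdapter(nn.Module):
    def __init__(self, dim=256, num_codes=64, heads=4):
        super().__init__()
        self.codebook = nn.Parameter(torch.randn(num_codes, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, encoder_states):
        # Every encoder frame queries the shared accent codebook.
        codes = self.codebook.unsqueeze(0).expand(encoder_states.size(0), -1, -1)
        accent_ctx, _ = self.attn(query=encoder_states, key=codes, value=codes)
        return self.norm(encoder_states + accent_ctx)

adapter = AccentCodebookAdapter()
frames = torch.randn(4, 200, 256)   # (batch, frames, dim) from an ASR encoder
adapted = adapter(frames)           # same shape, accent-informed features
```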
arXiv Detail & Related papers (2023-10-24T16:10:58Z) - On the Impact of Speech Recognition Errors in Passage Retrieval for
Spoken Question Answering [13.013751306590303]
We study the robustness of lexical and dense retrievers against questions with synthetic ASR noise.
We create a new dataset with questions voiced by human users and use their transcriptions to show that the retrieval performance can further degrade when dealing with natural ASR noise instead of synthetic ASR noise.
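As a toy stand-in for synthetic ASR noise, the snippet below corrupts written questions with random word deletions and substitutions; the paper's synthetic noise presumably comes from actual ASR output, so this is only an illustration of the robustness setup.
```python
# Toy word-level corruption used as a proxy for synthetic ASR errors.
import random

def add_synthetic_asr_noise(question, vocab, error_rate=0.15, seed=0):
    rng = random.Random(seed)
    noisy = []
    for word in question.split():
        r = rng.random()
        if r < error_rate / 2:
            continue                         # deletion
        elif r < error_rate:
            noisy.append(rng.choice(vocab))  # substitution with a random word
        else:
            noisy.append(word)
    return " ".join(noisy)

vocab = ["the", "a", "of", "speech", "model", "what", "who"]
print(add_synthetic_asr_noise("what year was the telephone invented", vocab))
```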
arXiv Detail & Related papers (2022-09-26T18:29:36Z) - ASR data augmentation in low-resource settings using cross-lingual
multi-speaker TTS and cross-lingual voice conversion [49.617722668505834]
We show that our approach permits the application of speech synthesis and voice conversion to improve ASR systems using only one target-language speaker during model training.
It is possible to obtain promising ASR training results with our data augmentation method using only a single real speaker in a target language.
arXiv Detail & Related papers (2022-03-29T11:55:30Z) - Contextualized Attention-based Knowledge Transfer for Spoken
Conversational Question Answering [63.72278693825945]
Spoken conversational question answering (SCQA) requires machines to model complex dialogue flow.
We propose CADNet, a novel contextualized attention-based distillation approach.
We conduct extensive experiments on the Spoken-CoQA dataset and demonstrate that our approach achieves remarkable performance.
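A generic sketch of attention-based distillation, combining a softened-logit KL term, an attention-map matching term, and the standard supervised loss; CADNet's contextualized formulation is more involved, and the loss weights here are placeholder assumptions.
```python
# Common skeleton of teacher-student distillation with attention transfer.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_attn, teacher_attn,
                      labels, temperature=2.0, alpha=0.5, beta=0.1):
    # Soft-label transfer on output distributions.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Encourage the student's attention maps to follow the teacher's.
    attn = F.mse_loss(student_attn, teacher_attn)
    # Standard supervised loss on the gold answers.
    ce = F.cross_entropy(student_logits, labels)
    return (1 - alpha) * ce + alpha * kd + beta * attn

student_logits = torch.randn(8, 30)
teacher_logits = torch.randn(8, 30)
student_attn = torch.rand(8, 12, 64, 64)
teacher_attn = torch.rand(8, 12, 64, 64)
labels = torch.randint(0, 30, (8,))
loss = distillation_loss(student_logits, teacher_logits,
                         student_attn, teacher_attn, labels)
```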
arXiv Detail & Related papers (2020-10-21T15:17:18Z) - AccentDB: A Database of Non-Native English Accents to Assist Neural
Speech Recognition [3.028098724882708]
We first spell out the key requirements for creating a well-curated database of speech samples in non-native accents for training and testing robust ASR systems.
We then introduce AccentDB, one such database that contains samples of 4 Indian-English accents collected by us.
We present several accent classification models and evaluate them thoroughly against human-labelled accent classes.
arXiv Detail & Related papers (2020-05-16T12:38:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.