Accented Speech Recognition With Accent-specific Codebooks
- URL: http://arxiv.org/abs/2310.15970v3
- Date: Fri, 27 Oct 2023 02:54:29 GMT
- Title: Accented Speech Recognition With Accent-specific Codebooks
- Authors: Darshan Prabhu, Preethi Jyothi, Sriram Ganapathy, Vinit Unni
- Abstract summary: Speech accents pose a significant challenge to state-of-the-art automatic speech recognition (ASR) systems.
Degradation in performance across underrepresented accents is a severe deterrent to the inclusive adoption of ASR.
We propose a novel accent adaptation approach for end-to-end ASR systems using cross-attention with a trainable set of codebooks.
- Score: 53.288874858671576
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Speech accents pose a significant challenge to state-of-the-art automatic
speech recognition (ASR) systems. Degradation in performance across
underrepresented accents is a severe deterrent to the inclusive adoption of
ASR. In this work, we propose a novel accent adaptation approach for end-to-end
ASR systems using cross-attention with a trainable set of codebooks. These
learnable codebooks capture accent-specific information and are integrated
within the ASR encoder layers. The model is trained on accented English speech,
while the test data also contains accents that were not seen during training.
On the Mozilla Common Voice multi-accented dataset, we show that our proposed
approach yields significant performance gains not only on the seen English
accents (up to $37\%$ relative improvement in word error rate) but also on the
unseen accents (up to $5\%$ relative improvement in WER). Further, we
illustrate benefits for a zero-shot transfer setup on the L2-ARCTIC dataset. We
also compare the performance with other approaches based on accent adversarial
training.
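The abstract describes cross-attention between encoder features and a trainable set of accent-specific codebooks. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a single attention head, no learned query/key/value projections, codebook entries that serve as both keys and values, and a residual addition back onto the encoder frames. The names `codebook_cross_attention`, `make_codebook`, `accent_a`, and `accent_b` are hypothetical, and the codebooks are randomly initialised here where a real model would learn them during training.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def codebook_cross_attention(frames, codebook):
    """Attend from each encoder frame (query) to the codebook entries.

    frames:   list of T vectors, each of dimension d (encoder features)
    codebook: list of K vectors, each of dimension d (keys and values)
    Returns the updated frames and the T x K attention weights.
    """
    dim = len(frames[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, all_weights = [], []
    for frame in frames:
        # Scaled dot-product score between this frame and every entry.
        scores = [scale * sum(f * c for f, c in zip(frame, entry))
                  for entry in codebook]
        weights = softmax(scores)
        # Weighted sum of codebook entries (entries act as values too).
        context = [sum(w * entry[j] for w, entry in zip(weights, codebook))
                   for j in range(dim)]
        # Residual connection: add the accent context to the frame.
        outputs.append([f + c for f, c in zip(frame, context)])
        all_weights.append(weights)
    return outputs, all_weights


def make_codebook(num_entries, dim, rng):
    """Hypothetical random initialisation; a real model would train this."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(num_entries)]


rng = random.Random(0)
dim, num_entries, num_frames = 8, 4, 5
# One codebook per training accent (accent names are illustrative only).
codebooks = {accent: make_codebook(num_entries, dim, rng)
             for accent in ("accent_a", "accent_b")}
frames = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_frames)]

updated, weights = codebook_cross_attention(frames, codebooks["accent_a"])
print(len(updated), len(updated[0]), len(weights[0]))
```

Each frame keeps its dimensionality, and each row of `weights` is a probability distribution over the entries of the selected codebook. How a codebook is chosen for accents unseen in training is not covered by this sketch.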
Related papers
- Accent conversion using discrete units with parallel data synthesized from controllable accented TTS [56.18382038512251]
The goal of accent conversion (AC) is to convert speech accents while preserving content and speaker identity.
Previous methods either required reference utterances during inference, did not preserve speaker identity well, or used one-to-one systems that could only be trained for each non-native accent.
This paper presents a promising AC model that can convert many accents into native to overcome these issues.
arXiv Detail & Related papers (2024-09-30T19:52:10Z)
- Improving Self-supervised Pre-training using Accent-Specific Codebooks [48.409296549372414]
We propose an accent-aware adaptation technique for self-supervised learning.
On the Mozilla Common Voice dataset, our proposed approach outperforms all other accent-adaptation approaches.
arXiv Detail & Related papers (2024-07-04T08:33:52Z)
- AccentFold: A Journey through African Accents for Zero-Shot ASR Adaptation to Target Accents [5.746007214645182]
We propose AccentFold, a method that exploits spatial relationships between learned accent embeddings to improve Automatic Speech Recognition (ASR).
Our exploratory analysis of speech embeddings representing 100+ African accents reveals interesting spatial accent relationships.
Our findings emphasize the potential of leveraging linguistic relationships to improve ASR adaptation to target accents.
arXiv Detail & Related papers (2024-02-02T05:38:59Z)
- Exploring the Role of Audio in Video Captioning [59.679122191706426]
We present an audio-visual framework, which aims to fully exploit the potential of the audio modality for captioning.
We propose new local-global fusion mechanisms to improve information exchange across audio and video.
arXiv Detail & Related papers (2023-06-21T20:54:52Z)
- Synthetic Cross-accent Data Augmentation for Automatic Speech Recognition [18.154258453839066]
We improve an accent-conversion model (ACM) which transforms native US-English speech into accented pronunciation.
We include phonetic knowledge in the ACM training to provide accurate feedback about how well certain pronunciation patterns were recovered in the synthesized waveform.
We evaluate our approach on native and non-native English datasets and find that synthetically accented data helps the ASR system better understand speech from seen accents.
arXiv Detail & Related papers (2023-03-01T20:05:19Z)
- VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion [54.29557210925752]
One-shot voice conversion can be effectively achieved by speech representation disentanglement.
We employ vector quantization (VQ) for content encoding and introduce mutual information (MI) as the correlation metric during training.
Experimental results reflect the superiority of the proposed method in learning effective disentangled speech representations.
arXiv Detail & Related papers (2021-06-18T13:50:38Z)
- Black-box Adaptation of ASR for Accented Speech [52.63060669715216]
We introduce the problem of adapting a black-box, cloud-based ASR system to speech from a target accent.
We propose a novel coupling of an open-source accent-tuned local model with the black-box service.
Our fine-grained merging algorithm is better at fixing accent errors than existing word-level combination strategies.
arXiv Detail & Related papers (2020-06-24T07:07:49Z)
- Improving Accent Conversion with Reference Encoder and End-To-End Text-To-Speech [23.30022534796909]
Accent conversion (AC) transforms a non-native speaker's accent into a native accent while maintaining the speaker's voice timbre.
We propose approaches to improving accent conversion applicability, as well as quality.
arXiv Detail & Related papers (2020-05-19T08:09:58Z)
- AccentDB: A Database of Non-Native English Accents to Assist Neural Speech Recognition [3.028098724882708]
We first spell out the key requirements for creating a well-curated database of speech samples in non-native accents for training and testing robust ASR systems.
We then introduce AccentDB, one such database that contains samples of 4 Indian-English accents collected by us.
We present several accent classification models and evaluate them thoroughly against human-labelled accent classes.
arXiv Detail & Related papers (2020-05-16T12:38:30Z)