Improving Accented Speech Recognition with Multi-Domain Training
- URL: http://arxiv.org/abs/2303.07924v1
- Date: Tue, 14 Mar 2023 14:10:16 GMT
- Title: Improving Accented Speech Recognition with Multi-Domain Training
- Authors: Lucas Maison, Yannick Estève
- Abstract summary: We use speech audio representing four different French accents to create fine-tuning datasets that improve the robustness of pre-trained ASR models.
Our numerical experiments show that we can reduce error rates by up to 25% (relative) on African and Belgian accents.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Thanks to the rise of self-supervised learning, automatic speech recognition
(ASR) systems now achieve near-human performance on a wide variety of datasets.
However, they still lack generalization capability and are not robust to domain
shifts like accent variations. In this work, we use speech audio representing
four different French accents to create fine-tuning datasets that improve the
robustness of pre-trained ASR models. By incorporating various accents in the
training set, we obtain both in-domain and out-of-domain improvements. Our
numerical experiments show that we can reduce error rates by up to 25%
(relative) on African and Belgian accents compared to single-domain training
while keeping a good performance on standard French.
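The multi-domain training idea above, fine-tuning on a pool of utterances drawn from several accents rather than a single one, can be sketched as follows. This is a minimal illustration only: the accent names, pool sizes, and utterance identifiers are hypothetical placeholders, not the paper's actual datasets.

```python
import random

# Hypothetical accent-specific utterance pools (ID, accent label).
# Names and sizes are illustrative, not the paper's real corpora.
accent_pools = {
    "standard_french": [(f"utt_fr_{i}", "standard_french") for i in range(100)],
    "belgian": [(f"utt_be_{i}", "belgian") for i in range(40)],
    "african": [(f"utt_af_{i}", "african") for i in range(40)],
    "canadian": [(f"utt_ca_{i}", "canadian") for i in range(40)],
}

def build_multidomain_set(pools, seed=0):
    """Pool utterances from every accent into one shuffled fine-tuning
    set, instead of fine-tuning on a single accent (single-domain)."""
    rng = random.Random(seed)
    combined = [example for pool in pools.values() for example in pool]
    rng.shuffle(combined)
    return combined

train_set = build_multidomain_set(accent_pools)
```

The fine-tuning loop itself is unchanged; only the composition of the training set differs, which is what yields the in-domain and out-of-domain gains reported in the abstract.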
Related papers
- Improving Self-supervised Pre-training using Accent-Specific Codebooks (2024-07-04)
  We propose an accent-aware adaptation technique for self-supervised learning.
  On the Mozilla Common Voice dataset, our proposed approach outperforms all other accent-adaptation approaches.
- Accented Speech Recognition With Accent-specific Codebooks (2023-10-24)
  Speech accents pose a significant challenge to state-of-the-art automatic speech recognition (ASR) systems.
  Degradation in performance across underrepresented accents is a severe deterrent to the inclusive adoption of ASR.
  We propose a novel accent adaptation approach for end-to-end ASR systems using cross-attention with a trainable set of codebooks.
- Improving Speech Recognition for African American English With Audio Classification (2023-09-16)
  We propose a new way to improve the robustness of a US English short-form speech recognizer using a small amount of out-of-domain data.
  Fine-tuning on this data results in a 38.5% relative word error rate disparity reduction between AAE and MAE without reducing MAE quality.
- A meta learning scheme for fast accent domain expansion in Mandarin speech recognition (2023-07-23)
  Spoken language varies significantly across Mandarin dialects and accents.
  Despite the high performance of Mandarin automatic speech recognition (ASR), accented ASR remains a challenging task.
  We introduce meta-learning techniques for fast accent domain expansion in Mandarin speech recognition.
- Replay to Remember: Continual Layer-Specific Fine-tuning for German Speech Recognition (2023-07-14)
  We study how well the performance of large-scale ASR models can be approximated for smaller domains.
  We apply Experience Replay for continual learning to increase the robustness of the ASR model to vocabulary and speakers outside of the fine-tuned domain.
- OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment (2023-06-10)
  We propose a training system, Open-modality Speech Recognition (OpenSR).
  OpenSR enables modality transfer from one to any in three different settings.
  It achieves highly competitive zero-shot performance compared to existing few-shot and full-shot lip-reading methods.
- From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition (2023-01-19)
  We propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition.
  We design different auxiliary neural architectures focusing on learnable pre-trained feature enhancement.
  Our methods outperform existing ASR tuning architectures and their extension with self-supervised losses.
- Sequence-level self-learning with multiple hypotheses (2021-12-10)
  We develop new self-learning techniques with an attention-based sequence-to-sequence (seq2seq) model for automatic speech recognition (ASR).
  In contrast to conventional unsupervised learning approaches, we adopt the multi-task learning (MTL) framework.
  Our experimental results show that our method reduces the WER on British speech data from 14.55% to 10.36% compared to a baseline model trained only on US English data.
- Improving low-resource ASR performance with untranscribed out-of-domain data (2021-06-02)
  Semi-supervised training (SST) is a common approach to leveraging untranscribed/unlabeled speech data.
  We look to improve performance on conversational/telephony speech (target domain) using web resources.
- Black-box Adaptation of ASR for Accented Speech (2020-06-24)
  We introduce the problem of adapting a black-box, cloud-based ASR system to speech from a target accent.
  We propose a novel coupling of an open-source accent-tuned local model with the black-box service.
  Our fine-grained merging algorithm is better at fixing accent errors than existing word-level combination strategies.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.