A meta learning scheme for fast accent domain expansion in Mandarin
speech recognition
- URL: http://arxiv.org/abs/2307.12262v1
- Date: Sun, 23 Jul 2023 08:23:26 GMT
- Title: A meta learning scheme for fast accent domain expansion in Mandarin
speech recognition
- Authors: Ziwei Zhu, Changhao Shan, Bihong Zhang, Jian Yu
- Abstract summary: Spoken language varies significantly between standard Mandarin and accented Mandarin.
Despite the high performance of Mandarin automatic speech recognition (ASR), accented ASR remains a challenging task.
We introduce meta-learning techniques for fast accent domain expansion in Mandarin speech recognition.
- Score: 22.126817828698563
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spoken language varies significantly between standard Mandarin and
accented Mandarin. Despite the high performance of Mandarin automatic speech
recognition (ASR), accented ASR remains a challenging task. In this paper, we
introduce meta-learning techniques for fast accent domain expansion in Mandarin
speech recognition, which extend coverage to new accents without degrading the
performance of Mandarin ASR. Meta-learning, or learning-to-learn, captures
relations that generalize across multiple domains rather than over-fitting a
single domain, so we adopt it for the domain expansion task. This more general
form of learning leads to improved performance on accent domain extension
tasks. We combine meta-learning with freezing of model parameters, which makes
recognition performance more stable across different conditions and speeds up
training by about 20%. Our approach significantly outperforms other methods, by
about 3% relative, on the accent domain expansion task. Compared to the
baseline model, it achieves a 37% relative improvement while performance on the
Mandarin test set remains unchanged. In addition, the method also proves
effective on a large amount of data, with a 4% relative improvement on the
accent test set.
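To make the combination of meta-learning and parameter freezing concrete, here is a minimal first-order MAML-style sketch in PyTorch. The abstract does not say which meta-learning algorithm, frozen layers, or hyperparameters were used, so those choices (and the assumption that `model(batch)` returns a scalar loss) are illustrative, not the authors' implementation.

```python
# First-order MAML-style meta-update with partial parameter freezing.
# The specific meta-learning algorithm, the frozen layers, the learning
# rates, and the assumption that model(batch) returns a scalar loss are
# illustrative guesses, not the authors' released implementation.
import copy
import torch


def freeze_shared_layers(model, frozen_prefixes=("encoder.embed",)):
    # Freeze selected parameters so the shared Mandarin representation stays
    # stable while the remaining parameters adapt to new accents.
    for name, p in model.named_parameters():
        p.requires_grad = not name.startswith(frozen_prefixes)


def meta_step(model, accent_tasks, outer_opt, inner_lr=1e-3):
    """One meta-update; accent_tasks is a list of (support_batch, query_batch)."""
    meta_params = [p for p in model.parameters() if p.requires_grad]
    outer_opt.zero_grad()

    for support_batch, query_batch in accent_tasks:
        # Inner loop: adapt a copy of the model to one accent domain.
        adapted = copy.deepcopy(model)
        adapted_params = [p for p in adapted.parameters() if p.requires_grad]
        inner_opt = torch.optim.SGD(adapted_params, lr=inner_lr)
        adapted(support_batch).backward()   # support-set loss
        inner_opt.step()
        inner_opt.zero_grad()

        # Outer loop (first-order approximation): the query-set gradient of
        # the adapted copy is accumulated onto the shared initialization.
        adapted(query_batch).backward()     # query-set loss
        for p_meta, p_task in zip(meta_params, adapted_params):
            if p_task.grad is not None:
                p_meta.grad = (p_task.grad.clone() if p_meta.grad is None
                               else p_meta.grad + p_task.grad)

    outer_opt.step()


# Typical usage: freeze once, build the outer optimizer over what is left,
# then run meta_step over mini-batches of accent domains.
# freeze_shared_layers(model)
# outer_opt = torch.optim.Adam(
#     [p for p in model.parameters() if p.requires_grad], lr=1e-4)
# meta_step(model, accent_tasks, outer_opt)
```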
Related papers
- A Cross-Lingual Meta-Learning Method Based on Domain Adaptation for Speech Emotion Recognition [1.8377902806196766]
Best-performing speech models are trained on large amounts of data in the language they are meant to work for.
Most languages have sparse data, making training models challenging.
Our work explores model performance with limited data, specifically for speech emotion recognition.
arXiv Detail & Related papers (2024-10-06T21:33:51Z)
- Rethinking and Improving Multi-task Learning for End-to-end Speech Translation [51.713683037303035]
We investigate the consistency between different tasks, considering different times and modules.
We find that the textual encoder primarily facilitates cross-modal conversion, but the presence of noise in speech impedes the consistency between text and speech representations.
We propose an improved multi-task learning (IMTL) approach for the ST task, which bridges the modal gap by mitigating the difference in length and representation.
arXiv Detail & Related papers (2023-11-07T08:48:46Z)
- Accented Speech Recognition With Accent-specific Codebooks [53.288874858671576]
Speech accents pose a significant challenge to state-of-the-art automatic speech recognition (ASR) systems.
Degradation in performance across underrepresented accents is a severe deterrent to the inclusive adoption of ASR.
We propose a novel accent adaptation approach for end-to-end ASR systems using cross-attention with a trainable set of codebooks.
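For a sense of what cross-attention over a trainable codebook can look like, here is a minimal PyTorch sketch. The codebook size, model dimension, and where the module sits in the encoder are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch of cross-attention over a trainable accent codebook.
# Sizes, module placement, and names are illustrative assumptions.
import torch
import torch.nn as nn


class CodebookCrossAttention(nn.Module):
    def __init__(self, d_model=256, num_codes=64, num_heads=4):
        super().__init__()
        # Learnable codebook entries act as keys/values for every utterance.
        self.codebook = nn.Parameter(torch.randn(num_codes, d_model) * 0.02)
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, encoder_states):  # (batch, time, d_model)
        batch = encoder_states.size(0)
        codes = self.codebook.unsqueeze(0).expand(batch, -1, -1)
        # Encoder frames attend to the codebook; the result is added back
        # as an accent-dependent bias on the acoustic representation.
        attended, _ = self.attn(query=encoder_states, key=codes, value=codes)
        return self.norm(encoder_states + attended)
```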
arXiv Detail & Related papers (2023-10-24T16:10:58Z)
- SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge [58.979490858061745]
We introduce sememe-based semantic knowledge into speech recognition.
Our experiments show that sememe information can improve the effectiveness of speech recognition.
In addition, our further experiments show that sememe knowledge can improve the model's recognition of long-tailed data.
arXiv Detail & Related papers (2023-09-04T08:35:05Z)
- Replay to Remember: Continual Layer-Specific Fine-tuning for German Speech Recognition [19.635428830237842]
We study how well the performance of large-scale ASR models can be approximated for smaller domains.
We apply Experience Replay for continual learning to increase the robustness of the ASR model to vocabulary and speakers outside of the fine-tuned domain.
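A minimal sketch of what Experience Replay can look like in a continual fine-tuning loop is shown below; the buffer size, reservoir-sampling policy, and mixing ratio are assumptions for illustration and are not taken from the paper.

```python
# Sketch of Experience Replay for continual ASR fine-tuning: a small buffer of
# old-domain samples is mixed into every new-domain batch so the model does
# not forget the original domain. All parameters here are illustrative.
import random


class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self.samples = []
        self.seen = 0

    def add(self, sample):
        # Reservoir sampling keeps a uniform subset of everything seen so far.
        self.seen += 1
        if len(self.samples) < self.capacity:
            self.samples.append(sample)
        else:
            idx = random.randrange(self.seen)
            if idx < self.capacity:
                self.samples[idx] = sample

    def mix(self, new_batch, replay_fraction=0.25):
        # Replace a fraction of the new-domain batch with replayed samples.
        k = min(int(len(new_batch) * replay_fraction), len(self.samples))
        return new_batch[:len(new_batch) - k] + random.sample(self.samples, k)
```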
arXiv Detail & Related papers (2023-07-14T11:20:22Z)
- Improving Accented Speech Recognition with Multi-Domain Training [2.28438857884398]
We use speech audio representing four different French accents to create fine-tuning datasets that improve the robustness of pre-trained ASR models.
Our numerical experiments show that we can reduce error rates by up to 25% (relative) on African and Belgian accents.
arXiv Detail & Related papers (2023-03-14T14:10:16Z)
- Persian Natural Language Inference: A Meta-learning approach [6.832341432995628]
This paper proposes a meta-learning approach for natural language inference in Persian.
We evaluate the proposed method using four languages and an auxiliary task.
arXiv Detail & Related papers (2022-05-18T06:51:58Z)
- Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity [81.51206991542242]
Cross-lingual transfer offers a compelling way to help bridge the digital divide between high- and low-resource languages.
Current cross-lingual algorithms have shown success in text-based tasks and speech-related tasks over some low-resource languages.
We propose a language similarity approach that can efficiently identify acoustic cross-lingual transfer pairs across hundreds of languages.
arXiv Detail & Related papers (2021-11-02T01:55:17Z)
- Multilingual Speech Recognition using Knowledge Transfer across Learning Processes [15.927513451432946]
Experimental results reveal the best pre-training strategy resulting in 3.55% relative reduction in overall WER.
A combination of LEAP and SSL yields 3.51% relative reduction in overall WER when using language ID.
arXiv Detail & Related papers (2021-10-15T07:50:27Z)
- XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation [93.80733419450225]
This paper analyzes the current state of cross-lingual transfer learning.
We extend XTREME to XTREME-R, which consists of an improved set of ten natural language understanding tasks.
arXiv Detail & Related papers (2021-04-15T12:26:12Z)
- Meta-Transfer Learning for Code-Switched Speech Recognition [72.84247387728999]
We propose a new learning method, meta-transfer learning, to transfer knowledge to a code-switched speech recognition system in a low-resource setting.
Our model learns to recognize the individual languages and transfers that knowledge to better recognize mixed-language speech by conditioning the optimization on code-switching data.
arXiv Detail & Related papers (2020-04-29T14:27:19Z)