Exploring the Impact of Data Quantity on ASR in Extremely Low-resource Languages
- URL: http://arxiv.org/abs/2409.08872v1
- Date: Fri, 13 Sep 2024 14:35:47 GMT
- Title: Exploring the Impact of Data Quantity on ASR in Extremely Low-resource Languages
- Authors: Yao-Fei Cheng, Li-Wei Chen, Hung-Shin Lee, Hsin-Min Wang,
- Abstract summary: This study focuses on two endangered Austronesian languages, Amis and Seediq.
We propose a novel data-selection scheme leveraging a multilingual corpus to augment the limited target language data.
- Score: 24.856817602140193
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study investigates the efficacy of data augmentation techniques for low-resource automatic speech recognition (ASR), focusing on two endangered Austronesian languages, Amis and Seediq. Recognizing the potential of self-supervised learning (SSL) in low-resource settings, we explore the impact of data volume on the continued pre-training of SSL models. We propose a novel data-selection scheme leveraging a multilingual corpus to augment the limited target language data. This scheme utilizes a language classifier to extract utterance embeddings and employs one-class classifiers to identify utterances phonetically and phonologically proximate to the target languages. Utterances are ranked and selected based on their decision scores, ensuring the inclusion of highly relevant data in the SSL-ASR pipeline. Our experimental results demonstrate the effectiveness of this approach, yielding substantial improvements in ASR performance for both Amis and Seediq. These findings underscore the feasibility and promise of data augmentation through cross-lingual transfer learning for low-resource language ASR.
Related papers
- Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection [49.27067541740956]
Speech Emotion Recognition (SER) is a crucial component in developing general-purpose AI agents capable of natural human-computer interaction.
Building robust multilingual SER systems remains challenging due to the scarcity of labeled data in languages other than English and Chinese.
We propose an approach to enhance SER performance in low SER resource languages by leveraging data from high-resource languages.
arXiv Detail & Related papers (2024-09-17T08:36:45Z) - Meta-Whisper: Speech-Based Meta-ICL for ASR on Low-Resource Languages [51.12146889808824]
Meta-Whisper is a novel approach to improve automatic speech recognition for low-resource languages.
It enhances Whisper's ability to recognize speech in unfamiliar languages without extensive fine-tuning.
arXiv Detail & Related papers (2024-09-16T16:04:16Z) - An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios [76.11409260727459]
This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system.
We demonstrate that the similarity in phonetics between the pre-training and target languages, as well as the language category, affects the target language's adaptation performance.
arXiv Detail & Related papers (2024-06-13T08:16:52Z) - LLMs in the Loop: Leveraging Large Language Model Annotations for Active Learning in Low-Resource Languages [1.149936119867417]
Low-resource languages face significant barriers in AI development due to limited linguistic resources and expertise for data labeling.
We propose leveraging the potential of LLMs in the active learning loop for data annotation.
Empirical evaluations, notably employing GPT-4-Turbo, demonstrate near-state-of-the-art performance with significantly reduced data requirements.
arXiv Detail & Related papers (2024-04-02T19:34:22Z) - Large Language Models are Efficient Learners of Noise-Robust Speech
Recognition [65.95847272465124]
Recent advances in large language models (LLMs) have promoted generative error correction (GER) for automatic speech recognition (ASR)
In this work, we extend the benchmark to noisy conditions and investigate if we can teach LLMs to perform denoising for GER.
Experiments on various latest LLMs demonstrate our approach achieves a new breakthrough with up to 53.9% correction improvement in terms of word error rate.
arXiv Detail & Related papers (2024-01-19T01:29:27Z) - Reduce, Reuse, Recycle: Is Perturbed Data better than Other Language augmentation for Low Resource Self-Supervised Speech Models [48.44820587495038]
Self-supervised representation learning (SSRL) has demonstrated superior performance than supervised models for tasks including phoneme recognition.
Training SSRL models poses a challenge for low-resource languages where sufficient pre-training data may not be available.
We propose to use audio augmentation techniques, namely: pitch variation, noise addition, accented target language and other language speech to pre-train SSRL models in a low resource condition and evaluate phoneme recognition.
arXiv Detail & Related papers (2023-09-22T10:09:09Z) - A Novel Self-training Approach for Low-resource Speech Recognition [15.612232220719653]
We propose a self-training approach for automatic speech recognition (ASR) for low-resource settings.
Our approach significantly improves word error rate, achieving a relative improvement of 14.94%.
Our proposed approach reports the best results on the Common Voice Punjabi dataset.
arXiv Detail & Related papers (2023-08-10T01:02:45Z) - Learning Cross-lingual Mappings for Data Augmentation to Improve
Low-Resource Speech Recognition [31.575930914290762]
Exploiting cross-lingual resources is an effective way to compensate for data scarcity of low resource languages.
We extend the concept of learnable cross-lingual mappings for end-to-end speech recognition.
The results show that any source language ASR model can be used for a low-resource target language recognition.
arXiv Detail & Related papers (2023-06-14T15:24:31Z) - Multilingual Speech Recognition using Knowledge Transfer across Learning
Processes [15.927513451432946]
Experimental results reveal the best pre-training strategy resulting in 3.55% relative reduction in overall WER.
A combination of LEAP and SSL yields 3.51% relative reduction in overall WER when using language ID.
arXiv Detail & Related papers (2021-10-15T07:50:27Z) - Meta-Transfer Learning for Code-Switched Speech Recognition [72.84247387728999]
We propose a new learning method, meta-transfer learning, to transfer learn on a code-switched speech recognition system in a low-resource setting.
Our model learns to recognize individual languages, and transfer them so as to better recognize mixed-language speech by conditioning the optimization on the code-switching data.
arXiv Detail & Related papers (2020-04-29T14:27:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.