A Novel Self-training Approach for Low-resource Speech Recognition
- URL: http://arxiv.org/abs/2308.05269v1
- Date: Thu, 10 Aug 2023 01:02:45 GMT
- Title: A Novel Self-training Approach for Low-resource Speech Recognition
- Authors: Satwinder Singh and Feng Hou and Ruili Wang
- Abstract summary: We propose a self-training approach for automatic speech recognition (ASR) for low-resource settings.
Our approach significantly reduces word error rate, achieving a relative improvement of 14.94%.
Our proposed approach reports the best results on the Common Voice Punjabi dataset.
- Score: 15.612232220719653
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this paper, we propose a self-training approach for automatic speech
recognition (ASR) for low-resource settings. While self-training approaches
have been extensively developed and evaluated for high-resource languages such
as English, their applications to low-resource languages like Punjabi have been
limited, despite the language being spoken by millions globally. The scarcity
of annotated data has hindered the development of accurate ASR systems,
especially for low-resource languages (e.g., Punjabi and Māori). To
address this issue, we propose an effective self-training approach that
generates highly accurate pseudo-labels for unlabeled low-resource speech. Our
experimental analysis demonstrates that our approach significantly reduces
word error rate, achieving a relative improvement of 14.94% compared to a
baseline model across four real speech datasets. Further, our proposed approach
reports the best results on the Common Voice Punjabi dataset.
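The paper itself ships no code, but the recipe the abstract describes follows the standard self-training loop: train a seed model on the small labeled set, transcribe unlabeled audio, keep confident hypotheses as pseudo-labels, and retrain. Below is a minimal Python sketch of that generic loop; all interfaces (`fit`, `decode`, the confidence threshold) are hypothetical placeholders, not the authors' implementation.

```python
# Minimal sketch of generic self-training (pseudo-labeling) for ASR.
# All interfaces here are hypothetical, not the authors' implementation.

from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Utterance:
    audio: bytes          # raw audio for one utterance
    text: str = ""        # transcript (empty if unlabeled)

def self_train(
    labeled: List[Utterance],                               # small labeled seed set
    unlabeled: List[Utterance],                             # large unlabeled pool
    fit: Callable[[List[Utterance]], object],               # trains a model on data
    decode: Callable[[object, bytes], Tuple[str, float]],   # -> (hypothesis, confidence)
    threshold: float = 0.9,
    rounds: int = 3,
) -> object:
    """Iteratively grow the training set with confident pseudo-labels."""
    model = fit(labeled)
    for _ in range(rounds):
        pseudo = []
        for utt in unlabeled:
            hyp, conf = decode(model, utt.audio)
            if conf >= threshold:            # keep only confident hypotheses
                pseudo.append(Utterance(utt.audio, hyp))
        model = fit(labeled + pseudo)        # retrain on labeled + pseudo-labeled
    return model

# Relative WER improvement as reported in the abstract:
# (WER_baseline - WER_self_trained) / WER_baseline = 0.1494  (14.94%)
```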
Related papers
- Multistage Fine-tuning Strategies for Automatic Speech Recognition in Low-resource Languages [0.0]
This paper presents a novel multistage fine-tuning strategy designed to enhance automatic speech recognition (ASR) performance in low-resource languages.
In this approach, we aim to build ASR models for languages with limited digital resources by sequentially adapting the model across linguistically similar languages.
We evaluate this strategy on Malasar, a Dravidian language spoken by approximately ten thousand people in the Western Ghats of South India.
arXiv Detail & Related papers (2024-11-07T09:57:57Z)
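The multistage strategy above reduces to a loop: start from a pre-trained checkpoint and fine-tune through an ordered chain of linguistically similar languages, finishing on the target. A minimal sketch under assumed interfaces; the function names and the example chain are hypothetical, not the paper's actual schedule.

```python
# Minimal sketch of multistage fine-tuning across related languages.
# All names and the example language chain are hypothetical placeholders.

from typing import Callable, List

def multistage_finetune(
    base_model: object,
    stages: List[object],                          # datasets, ordered from most-
    finetune: Callable[[object, object], object],  # resourced relative to target
) -> object:
    """Sequentially adapt a model along a chain of related languages."""
    model = base_model
    for data in stages:          # each stage starts from the previous stage's model
        model = finetune(model, data)
    return model

# Hypothetical chain: high-resource relative -> closer relative -> target language
# model = multistage_finetune(pretrained, [lang_a_data, lang_b_data, malasar_data], ft)
```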
- Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking [68.77659513993507]
We present a simple and effective N-best re-ranking approach to improve multilingual ASR accuracy.
On two multilingual benchmarks, our results show spoken language identification accuracy improvements of 8.7% and 6.1%, and word error rates that are 3.3% and 2.0% lower.
arXiv Detail & Related papers (2024-09-27T03:31:32Z)
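N-best re-ranking generally works by rescoring each of the recognizer's top-N hypotheses with external evidence (e.g., an external language-model or language-identification score) and picking the hypothesis with the best combined score. A minimal sketch of that generic scheme; the scoring function and interpolation weight are hypothetical, not the paper's exact features.

```python
# Minimal sketch of N-best re-ranking for ASR.
# The scoring function and interpolation weight are hypothetical.

from typing import Callable, List, Tuple

def rerank_nbest(
    nbest: List[Tuple[str, float]],          # (hypothesis, ASR log-score)
    external_score: Callable[[str], float],  # e.g., external LM log-probability
    weight: float = 0.5,                     # interpolation weight
) -> str:
    """Return the hypothesis with the best combined score."""
    def combined(item: Tuple[str, float]) -> float:
        hyp, asr_score = item
        return asr_score + weight * external_score(hyp)
    return max(nbest, key=combined)[0]

# Usage with toy scores:
# best = rerank_nbest([("hello word", -3.2), ("hello world", -3.4)],
#                     external_score=lambda h: -0.1 * len(h.split()))
```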
- Weighted Cross-entropy for Low-Resource Languages in Multilingual Speech Recognition [2.7247388777405597]
We introduce a novel application of weighted cross-entropy, a loss typically used for unbalanced datasets.
We fine-tune the Whisper multilingual ASR model on five high-resource languages and one low-resource language.
arXiv Detail & Related papers (2024-09-25T14:09:09Z)
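Weighted cross-entropy counteracts class imbalance by scaling each class's contribution to the loss, commonly in inverse proportion to its frequency, so the low-resource language is not drowned out. A minimal PyTorch sketch of the general technique; the inverse-frequency weighting and the toy counts are assumptions, not the paper's exact recipe for Whisper fine-tuning.

```python
# Minimal sketch of weighted cross-entropy for imbalanced data.
# The inverse-frequency weighting and toy counts are common-practice
# assumptions, not necessarily the paper's exact scheme.

import torch
import torch.nn as nn

# Hypothetical example counts: five high-resource classes, one low-resource.
counts = torch.tensor([10000., 8000., 9000., 12000., 11000., 500.])

# Inverse-frequency ("balanced") weights: w_c = N / (C * n_c).
weights = counts.sum() / (len(counts) * counts)

loss_fn = nn.CrossEntropyLoss(weight=weights)

# logits: (batch, num_classes); targets: (batch,)
logits = torch.randn(4, 6)
targets = torch.tensor([0, 5, 5, 2])
loss = loss_fn(logits, targets)   # low-resource class 5 is up-weighted
```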
- Meta-Whisper: Speech-Based Meta-ICL for ASR on Low-Resource Languages [51.12146889808824]
Meta-Whisper is a novel approach to improve automatic speech recognition for low-resource languages.
It enhances Whisper's ability to recognize speech in unfamiliar languages without extensive fine-tuning.
arXiv Detail & Related papers (2024-09-16T16:04:16Z)
- Exploring the Impact of Data Quantity on ASR in Extremely Low-resource Languages [24.856817602140193]
This study focuses on two endangered Austronesian languages, Amis and Seediq.
We propose a novel data-selection scheme leveraging a multilingual corpus to augment the limited target language data.
arXiv Detail & Related papers (2024-09-13T14:35:47Z)
- Enabling ASR for Low-Resource Languages: A Comprehensive Dataset Creation Approach [0.6445605125467574]
This study introduces a novel pipeline designed to generate ASR training datasets from audiobooks.
The common structure of these audiobooks poses a unique challenge due to the extensive length of audio segments.
We propose a method for effectively aligning audio with its corresponding text and segmenting it into lengths suitable for ASR training.
arXiv Detail & Related papers (2024-06-03T15:38:40Z)
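A common way to turn long audiobook recordings into ASR-sized training examples is to take word-level timestamps from forced alignment and greedily group consecutive words until a maximum duration is reached. A minimal sketch of that generic recipe; the paper's actual alignment and segmentation procedure may differ.

```python
# Minimal sketch: greedily segment force-aligned words into chunks short
# enough for ASR training. A generic recipe, not the paper's exact method.

from typing import List, Tuple

Word = Tuple[str, float, float]  # (token, start_sec, end_sec)

def segment(words: List[Word], max_dur: float = 20.0) -> List[Tuple[str, float, float]]:
    """Group consecutive aligned words into segments of at most max_dur seconds."""
    segments, cur, seg_start, prev_end = [], [], None, 0.0
    for token, start, end in words:
        if seg_start is None:
            seg_start = start
        if end - seg_start > max_dur and cur:          # close the current segment
            segments.append((" ".join(cur), seg_start, prev_end))
            cur, seg_start = [], start
        cur.append(token)
        prev_end = end
    if cur:
        segments.append((" ".join(cur), seg_start, prev_end))
    return segments

# Usage: segment([("hello", 0.0, 0.4), ("world", 0.5, 0.9)])
```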
- Language-universal phonetic encoder for low-resource speech recognition [28.21805271848413]
We leverage an International Phonetic Alphabet (IPA) based language-universal phonetic model to improve low-resource ASR performance.
Our approach and adaptation are effective on extremely low-resource languages, even within domain- and language-mismatched scenarios.
arXiv Detail & Related papers (2023-05-19T10:24:30Z)
- From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition [50.93943755401025]
We propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition.
We design different auxiliary neural architectures focusing on learnable pre-trained feature enhancement.
Our methods outperform existing ASR tuning architectures and their extension with self-supervised losses.
arXiv Detail & Related papers (2023-01-19T02:37:56Z)
- Unsupervised Cross-lingual Representation Learning for Speech Recognition [63.85924123692923]
XLSR learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.
We build on wav2vec 2.0 which is trained by solving a contrastive task over masked latent speech representations.
Experiments show that cross-lingual pretraining significantly outperforms monolingual pretraining.
arXiv Detail & Related papers (2020-06-24T18:25:05Z)
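wav2vec 2.0's pretraining objective is an InfoNCE-style contrastive loss: at each masked time step the model must identify the true quantized latent among sampled distractors using cosine similarity. A simplified PyTorch sketch of that loss; tensor shapes and the temperature are illustrative.

```python
# Simplified sketch of the wav2vec 2.0 contrastive objective: identify the
# true quantized target among distractors via cosine similarity.

import torch
import torch.nn.functional as F

def contrastive_loss(context: torch.Tensor,     # (T, D) outputs at masked steps
                     candidates: torch.Tensor,  # (T, K+1, D); index 0 = true target
                     temperature: float = 0.1) -> torch.Tensor:
    # Cosine similarity between each context vector and its K+1 candidates.
    sims = F.cosine_similarity(context.unsqueeze(1), candidates, dim=-1)  # (T, K+1)
    logits = sims / temperature
    targets = torch.zeros(context.size(0), dtype=torch.long)  # true target at index 0
    return F.cross_entropy(logits, targets)

# Toy usage with random tensors (8 masked steps, 4 distractors, dim 16):
loss = contrastive_loss(torch.randn(8, 16), torch.randn(8, 5, 16))
```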
- Building Low-Resource NER Models Using Non-Speaker Annotation [58.78968578460793]
Cross-lingual methods have had notable success in addressing the scarcity of annotated data for low-resource languages.
We propose a complementary approach to building low-resource Named Entity Recognition (NER) models using "non-speaker" (NS) annotations.
We show that the use of NS annotators produces results that are consistently on par with or better than cross-lingual methods built on modern contextual representations.
arXiv Detail & Related papers (2020-06-17T03:24:38Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and downstream LU systems can be reduced significantly, by 14% relative, with joint models trained on small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
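Joint modeling of this kind typically means a shared encoder over the ASR hypothesis with two heads, one predicting corrected tokens and one predicting the LU label, trained by summing the two losses. A minimal PyTorch sketch of such a multi-task setup; the architecture, sizes, and loss weighting are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch of joint ASR-correction + language-understanding training:
# a shared encoder with two heads whose losses are summed.
# Sizes and the 0.5 weighting are illustrative, not the paper's values.

import torch
import torch.nn as nn

class JointModel(nn.Module):
    def __init__(self, vocab: int = 1000, hidden: int = 256, intents: int = 10):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.encoder = nn.GRU(input_size=hidden, hidden_size=hidden, batch_first=True)
        self.correction_head = nn.Linear(hidden, vocab)   # per-token corrected output
        self.intent_head = nn.Linear(hidden, intents)     # utterance-level LU label

    def forward(self, asr_tokens: torch.Tensor):
        h, _ = self.encoder(self.embed(asr_tokens))       # (B, T, H)
        return self.correction_head(h), self.intent_head(h[:, -1])

model = JointModel()
ce = nn.CrossEntropyLoss()
tokens = torch.randint(0, 1000, (4, 12))       # hypothetical ASR output tokens
gold = torch.randint(0, 1000, (4, 12))         # gold (corrected) tokens
intent = torch.randint(0, 10, (4,))            # gold LU (intent) labels
corr_logits, intent_logits = model(tokens)
loss = ce(corr_logits.transpose(1, 2), gold) + 0.5 * ce(intent_logits, intent)
```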