A Novel Self-training Approach for Low-resource Speech Recognition
- URL: http://arxiv.org/abs/2308.05269v1
- Date: Thu, 10 Aug 2023 01:02:45 GMT
- Title: A Novel Self-training Approach for Low-resource Speech Recognition
- Authors: Satwinder Singh and Feng Hou and Ruili Wang
- Abstract summary: We propose a self-training approach for automatic speech recognition (ASR) for low-resource settings.
Our approach significantly improves word error rate, achieving a relative improvement of 14.94%.
Our proposed approach reports the best results on the Common Voice Punjabi dataset.
- Score: 15.612232220719653
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this paper, we propose a self-training approach for automatic speech
recognition (ASR) for low-resource settings. While self-training approaches
have been extensively developed and evaluated for high-resource languages such
as English, their applications to low-resource languages like Punjabi have been
limited, despite the language being spoken by millions globally. The scarcity
of annotated data has hindered the development of accurate ASR systems,
especially for low-resource languages (e.g., Punjabi and M\=aori languages). To
address this issue, we propose an effective self-training approach that
generates highly accurate pseudo-labels for unlabeled low-resource speech. Our
experimental analysis demonstrates that our approach significantly improves
word error rate, achieving a relative improvement of 14.94% compared to a
baseline model across four real speech datasets. Further, our proposed approach
reports the best results on the Common Voice Punjabi dataset.
Related papers
- How Good is Zero-Shot MT Evaluation for Low Resource Indian Languages? [35.368257850926184]
We focus on a zero-shot evaluation setting focusing on low-resource Indian languages, namely Assamese, Kannada, Maithili, and Punjabi.
We observe that even for learned metrics, which are known to exhibit zero-shot performance, the Kendall Tau and Pearson correlations with human annotations are only as high as 0.32 and 0.45.
arXiv Detail & Related papers (2024-06-06T09:28:08Z) - Enabling ASR for Low-Resource Languages: A Comprehensive Dataset Creation Approach [0.6445605125467574]
This study introduces a novel pipeline designed to generate ASR training datasets from audiobooks.
The common structure of these audiobooks poses a unique challenge due to the extensive length of audio segments.
We propose a method for effectively aligning audio with its corresponding text and segmenting it into lengths suitable for ASR training.
arXiv Detail & Related papers (2024-06-03T15:38:40Z) - Cross-Lingual NER for Financial Transaction Data in Low-Resource
Languages [70.25418443146435]
We propose an efficient modeling framework for cross-lingual named entity recognition in semi-structured text data.
We employ two independent datasets of SMSs in English and Arabic, each carrying semi-structured banking transaction information.
With access to only 30 labeled samples, our model can generalize the recognition of merchants, amounts, and other fields from English to Arabic.
arXiv Detail & Related papers (2023-07-16T00:45:42Z) - Advancing African-Accented Speech Recognition: Epistemic Uncertainty-Driven Data Selection for Generalizable ASR Models [2.4654745083407175]
We propose a new multi-rounds adaptation process that uses uncertainty to automate the annotation process.
This novel method streamlines data annotation and strategically selects data samples contributing most to model uncertainty.
Our results show that our approach leads to a 27% WER relative average improvement while requiring on average 45% less data than established baselines.
arXiv Detail & Related papers (2023-06-03T13:11:37Z) - Language-universal phonetic encoder for low-resource speech recognition [28.21805271848413]
We leverage International Phonetic Alphabet (IPA) based language-universal phonetic model to improve low-resource ASR performances.
Our approach and adaptation are effective on extremely low-resource languages, even within domain- and language-mismatched scenarios.
arXiv Detail & Related papers (2023-05-19T10:24:30Z) - From English to More Languages: Parameter-Efficient Model Reprogramming
for Cross-Lingual Speech Recognition [50.93943755401025]
We propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition.
We design different auxiliary neural architectures focusing on learnable pre-trained feature enhancement.
Our methods outperform existing ASR tuning architectures and their extension with self-supervised losses.
arXiv Detail & Related papers (2023-01-19T02:37:56Z) - Transfer Learning based Speech Affect Recognition in Urdu [0.0]
We pre-train a model for high resource language affect recognition task and fine tune the parameters for low resource language.
This approach achieves high Unweighted Average Recall (UAR) when compared with existing algorithms.
arXiv Detail & Related papers (2021-03-05T10:30:58Z) - Unsupervised Cross-lingual Representation Learning for Speech
Recognition [63.85924123692923]
XLSR learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.
We build on wav2vec 2.0 which is trained by solving a contrastive task over masked latent speech representations.
Experiments show that cross-lingual pretraining significantly outperforms monolingual pretraining.
arXiv Detail & Related papers (2020-06-24T18:25:05Z) - Building Low-Resource NER Models Using Non-Speaker Annotation [58.78968578460793]
Cross-lingual methods have had notable success in addressing these concerns.
We propose a complementary approach to building low-resource Named Entity Recognition (NER) models using non-speaker'' (NS) annotations.
We show that use of NS annotators produces results that are consistently on par or better than cross-lingual methods built on modern contextual representations.
arXiv Detail & Related papers (2020-06-17T03:24:38Z) - Improving Cross-Lingual Transfer Learning for End-to-End Speech
Recognition with Speech Translation [63.16500026845157]
We introduce speech-to-text translation as an auxiliary task to incorporate additional knowledge of the target language.
We show that training ST with human translations is not necessary.
Even with pseudo-labels from low-resource MT (200K examples), ST-enhanced transfer brings up to 8.9% WER reduction to direct transfer.
arXiv Detail & Related papers (2020-06-09T19:34:11Z) - Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU)
We show that the error rates of off the shelf ASR and following LU systems can be reduced significantly by 14% relative with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.