Data Augmentation for Low-Resource Quechua ASR Improvement
- URL: http://arxiv.org/abs/2207.06872v1
- Date: Thu, 14 Jul 2022 12:49:15 GMT
- Title: Data Augmentation for Low-Resource Quechua ASR Improvement
- Authors: Rodolfo Zevallos, Nuria Bel, Guillermo C\'ambara, Mireia Farr\'us and
Jordi Luque
- Abstract summary: Deep learning methods have made it possible to deploy systems with word error rates below 5% for ASR of English.
For so-called low-resource languages, methods of creating new resources on the basis of existing ones are being investigated.
We describe our data augmentation approach to improve the results of ASR models for low-resource and agglutinative languages.
- Score: 2.260916274164351
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic Speech Recognition (ASR) is a key element in new services that
helps users to interact with an automated system. Deep learning methods have
made it possible to deploy systems with word error rates below 5% for ASR of
English. However, the use of these methods is only available for languages with
hundreds or thousands of hours of audio and their corresponding transcriptions.
For the so-called low-resource languages to speed up the availability of
resources that can improve the performance of their ASR systems, methods of
creating new resources on the basis of existing ones are being investigated. In
this paper we describe our data augmentation approach to improve the results of
ASR models for low-resource and agglutinative languages. We carry out
experiments developing an ASR for Quechua using the wav2letter++ model. We
reduced WER by 8.73% through our approach to the base model. The resulting ASR
model obtained 22.75% WER and was trained with 99 hours of original resources
and 99 hours of synthetic data obtained with a combination of text augmentation
and synthetic speech generati
Related papers
- Enabling ASR for Low-Resource Languages: A Comprehensive Dataset Creation Approach [0.6445605125467574]
This study introduces a novel pipeline designed to generate ASR training datasets from audiobooks.
The common structure of these audiobooks poses a unique challenge due to the extensive length of audio segments.
We propose a method for effectively aligning audio with its corresponding text and segmenting it into lengths suitable for ASR training.
arXiv Detail & Related papers (2024-06-03T15:38:40Z) - XLS-R Deep Learning Model for Multilingual ASR on Low- Resource
Languages: Indonesian, Javanese, and Sundanese [0.0]
The study aims to improve ASR performance in converting spoken language into written text, specifically for Indonesian, Javanese, and Sundanese languages.
The results show that the XLS-R 300m model achieves competitive Word Error Rate (WER) measurements, with a slight compromise in performance for Javanese and Sundanese languages.
arXiv Detail & Related papers (2024-01-12T13:44:48Z) - Learning Cross-lingual Mappings for Data Augmentation to Improve
Low-Resource Speech Recognition [31.575930914290762]
Exploiting cross-lingual resources is an effective way to compensate for data scarcity of low resource languages.
We extend the concept of learnable cross-lingual mappings for end-to-end speech recognition.
The results show that any source language ASR model can be used for a low-resource target language recognition.
arXiv Detail & Related papers (2023-06-14T15:24:31Z) - Strategies for improving low resource speech to text translation relying
on pre-trained ASR models [59.90106959717875]
This paper presents techniques and findings for improving the performance of low-resource speech to text translation (ST)
We conducted experiments on both simulated and real-low resource setups, on language pairs English - Portuguese, and Tamasheq - French respectively.
arXiv Detail & Related papers (2023-05-31T21:58:07Z) - Making More of Little Data: Improving Low-Resource Automatic Speech
Recognition Using Data Augmentation [20.45373308116162]
This study focuses on four typologically diverse minority languages or language variants (West Germanic: Gronings, West-Frisian; Malayo-Polynesian: Besemah, Nasal).
For all four languages, we examine the use of self-training, where an ASR system trained with the available human-transcribed data is used to generate transcriptions, which are then combined with the original data to train a new ASR system.
We find that using a self-training approach consistently yields improved performance (a relative WER reduction up to 20.5% compared to using an ASR system trained on 24 minutes of
arXiv Detail & Related papers (2023-05-18T13:20:38Z) - From English to More Languages: Parameter-Efficient Model Reprogramming
for Cross-Lingual Speech Recognition [50.93943755401025]
We propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition.
We design different auxiliary neural architectures focusing on learnable pre-trained feature enhancement.
Our methods outperform existing ASR tuning architectures and their extension with self-supervised losses.
arXiv Detail & Related papers (2023-01-19T02:37:56Z) - Neural Model Reprogramming with Similarity Based Mapping for
Low-Resource Spoken Command Recognition [71.96870151495536]
We propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR)
The AR procedure aims to modify the acoustic signals (from the target domain) to repurpose a pretrained SCR model.
We evaluate the proposed AR-SCR system on three low-resource SCR datasets, including Arabic, Lithuanian, and dysarthric Mandarin speech.
arXiv Detail & Related papers (2021-10-08T05:07:35Z) - Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource
End-to-End Speech Recognition [62.94773371761236]
We consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate.
We propose a method of dynamic acoustic unit augmentation based on the BPE-dropout technique.
Our monolingual Turkish Conformer established a competitive result with 22.2% character error rate (CER) and 38.9% word error rate (WER)
arXiv Detail & Related papers (2021-03-12T10:10:13Z) - Bootstrap an end-to-end ASR system by multilingual training, transfer
learning, text-to-text mapping and synthetic audio [8.510792628268824]
bootstrapping speech recognition on limited data resources has been an area of active research for long.
We investigate here the effectiveness of different strategies to bootstrap an RNN-Transducer based automatic speech recognition (ASR) system in the low resource regime.
Our experiments demonstrate that transfer learning from a multilingual model, using a post-ASR text-to-text mapping and synthetic audio deliver additive improvements.
arXiv Detail & Related papers (2020-11-25T13:11:32Z) - LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition [148.43282526983637]
We develop LRSpeech, a TTS and ASR system for languages with low data cost.
We conduct experiments on an experimental language (English) and a truly low-resource language (Lithuanian) to verify the effectiveness of LRSpeech.
We are currently deploying LRSpeech into a commercialized cloud speech service to support TTS on more rare languages.
arXiv Detail & Related papers (2020-08-09T08:16:33Z) - Improving Cross-Lingual Transfer Learning for End-to-End Speech
Recognition with Speech Translation [63.16500026845157]
We introduce speech-to-text translation as an auxiliary task to incorporate additional knowledge of the target language.
We show that training ST with human translations is not necessary.
Even with pseudo-labels from low-resource MT (200K examples), ST-enhanced transfer brings up to 8.9% WER reduction to direct transfer.
arXiv Detail & Related papers (2020-06-09T19:34:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.