Data Augmentation for Speech Recognition in Maltese: A Low-Resource
Perspective
- URL: http://arxiv.org/abs/2111.07793v1
- Date: Mon, 15 Nov 2021 14:28:21 GMT
- Title: Data Augmentation for Speech Recognition in Maltese: A Low-Resource
Perspective
- Authors: Carlos Mena and Andrea DeMarco and Claudia Borg and Lonneke van der
Plas and Albert Gatt
- Abstract summary: We consider data augmentation techniques for improving speech recognition in Maltese.
We consider three types of data augmentation: unsupervised training, multilingual training and the use of synthesized speech as training data.
Our results show that combining the three data augmentation techniques studied here lead us to an absolute WER improvement of 15% without the use of a language model.
- Score: 4.6898263272139795
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Developing speech technologies is a challenge for low-resource languages for
which both annotated and raw speech data is sparse. Maltese is one such
language. Recent years have seen an increased interest in the computational
processing of Maltese, including speech technologies, but resources for the
latter remain sparse. In this paper, we consider data augmentation techniques
for improving speech recognition for such languages, focusing on Maltese as a
test case. We consider three different types of data augmentation: unsupervised
training, multilingual training and the use of synthesized speech as training
data. The goal is to determine which of these techniques, or combination of
them, is the most effective to improve speech recognition for languages where
the starting point is a small corpus of approximately 7 hours of transcribed
speech. Our results show that combining the three data augmentation techniques
studied here lead us to an absolute WER improvement of 15% without the use of a
language model.
Related papers
- A Cross-Lingual Meta-Learning Method Based on Domain Adaptation for Speech Emotion Recognition [1.8377902806196766]
Best-performing speech models are trained on large amounts of data in the language they are meant to work for.
Most languages have sparse data, making training models challenging.
Our work explores the model's performance in limited data, specifically for speech emotion recognition.
arXiv Detail & Related papers (2024-10-06T21:33:51Z) - CLARA: Multilingual Contrastive Learning for Audio Representation
Acquisition [5.520654376217889]
CLARA minimizes reliance on labelled data, enhancing generalization across languages.
Our approach adeptly captures emotional nuances in speech, overcoming subjective assessment issues.
It adapts to low-resource languages, marking progress in multilingual speech representation learning.
arXiv Detail & Related papers (2023-10-18T09:31:56Z) - Scaling Speech Technology to 1,000+ Languages [66.31120979098483]
The Massively Multilingual Speech (MMS) project increases the number of supported languages by 10-40x, depending on the task.
Main ingredients are a new dataset based on readings of publicly available religious texts.
We built pre-trained wav2vec 2.0 models covering 1,406 languages, a single multilingual automatic speech recognition model for 1,107 languages, speech synthesis models for the same number of languages, and a language identification model for 4,017 languages.
arXiv Detail & Related papers (2023-05-22T22:09:41Z) - Learning Cross-lingual Visual Speech Representations [108.68531445641769]
Cross-lingual self-supervised visual representation learning has been a growing research topic in the last few years.
We use the recently-proposed Raw Audio-Visual Speechs (RAVEn) framework to pre-train an audio-visual model with unlabelled data.
Our experiments show that: (1) multi-lingual models with more data outperform monolingual ones, but, when keeping the amount of data fixed, monolingual models tend to reach better performance.
arXiv Detail & Related papers (2023-03-14T17:05:08Z) - Adapting Multilingual Speech Representation Model for a New,
Underresourced Language through Multilingual Fine-tuning and Continued
Pretraining [2.3513645401551333]
We investigate the possibility for adapting an existing multilingual wav2vec 2.0 model for a new language.
Our results show that continued pretraining is the most effective method to adapt a wav2vec 2.0 model for a new language.
We find that if a model pretrained on a related speech variety or an unrelated language with similar phonological characteristics is available, multilingual fine-tuning using additional data from that language can have positive impact on speech recognition performance.
arXiv Detail & Related papers (2023-01-18T03:57:53Z) - Language-agnostic Code-Switching in Sequence-To-Sequence Speech
Recognition [62.997667081978825]
Code-Switching (CS) is referred to the phenomenon of alternately using words and phrases from different languages.
We propose a simple yet effective data augmentation in which audio and corresponding labels of different source languages are transcribed.
We show that this augmentation can even improve the model's performance on inter-sentential language switches not seen during training by 5,03% WER.
arXiv Detail & Related papers (2022-10-17T12:15:57Z) - Cross-lingual Transfer for Speech Processing using Acoustic Language
Similarity [81.51206991542242]
Cross-lingual transfer offers a compelling way to help bridge this digital divide.
Current cross-lingual algorithms have shown success in text-based tasks and speech-related tasks over some low-resource languages.
We propose a language similarity approach that can efficiently identify acoustic cross-lingual transfer pairs across hundreds of languages.
arXiv Detail & Related papers (2021-11-02T01:55:17Z) - Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking
Head Generation Using Phonetic Posteriorgrams [58.617181880383605]
In this work, we propose a novel approach using phonetic posteriorgrams.
Our method doesn't need hand-crafted features and is more robust to noise compared to recent approaches.
Our model is the first to support multilingual/mixlingual speech as input with convincing results.
arXiv Detail & Related papers (2020-06-20T16:32:43Z) - Meta-Transfer Learning for Code-Switched Speech Recognition [72.84247387728999]
We propose a new learning method, meta-transfer learning, to transfer learn on a code-switched speech recognition system in a low-resource setting.
Our model learns to recognize individual languages, and transfer them so as to better recognize mixed-language speech by conditioning the optimization on the code-switching data.
arXiv Detail & Related papers (2020-04-29T14:27:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.