Learning Cross-lingual Mappings for Data Augmentation to Improve
Low-Resource Speech Recognition
- URL: http://arxiv.org/abs/2306.08577v1
- Date: Wed, 14 Jun 2023 15:24:31 GMT
- Title: Learning Cross-lingual Mappings for Data Augmentation to Improve
Low-Resource Speech Recognition
- Authors: Muhammad Umar Farooq, Thomas Hain
- Abstract summary: Exploiting cross-lingual resources is an effective way to compensate for the data scarcity of low-resource languages.
We extend the concept of learnable cross-lingual mappings for end-to-end speech recognition.
The results show that any source language ASR model can be used for low-resource target-language recognition.
- Score: 31.575930914290762
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Exploiting cross-lingual resources is an effective way to compensate for the
data scarcity of low-resource languages. Recently, a novel multilingual model fusion
technique has been proposed where a model is trained to learn cross-lingual
acoustic-phonetic similarities as a mapping function. However, handcrafted
lexicons have been used to train hybrid DNN-HMM ASR systems. To remove this
dependency, we extend the concept of learnable cross-lingual mappings for
end-to-end speech recognition. Furthermore, mapping models are employed to
transliterate the source languages to the target language without using
parallel data. Finally, the source audio and its transliteration are used for
data augmentation to retrain the target language ASR. The results show that any
source language ASR model can be used for low-resource target-language
recognition when followed by the proposed mapping model. Furthermore, data
augmentation yields a relative gain of up to 5% over the baseline monolingual model.
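The abstract's retraining step, pairing source-language audio with its transliteration into the target language and mixing it into the target training set, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy character map stands in for the learned cross-lingual mapping model, and all file names are hypothetical.

```python
def transliterate(source_text, char_map):
    """Stand-in for the learned mapping model: rewrite source-language
    symbols into the target language's symbol inventory."""
    return "".join(char_map.get(ch, ch) for ch in source_text)

def build_augmented_set(target_data, source_data, char_map):
    """Combine genuine target-language (audio, text) pairs with source audio
    paired with transliterated labels, mirroring the retraining step."""
    augmented = list(target_data)  # keep the original low-resource data
    for audio, text in source_data:
        augmented.append((audio, transliterate(text, char_map)))
    return augmented

# Hypothetical toy mapping between source and target symbol sets.
char_map = {"x": "sh", "q": "ch"}
target_data = [("tgt_001.wav", "shala")]
source_data = [("src_001.wav", "xaqa")]
print(build_augmented_set(target_data, source_data, char_map))
# → [('tgt_001.wav', 'shala'), ('src_001.wav', 'shacha')]
```

In the paper the mapping is a trained neural network operating on end-to-end ASR representations rather than a symbol table; the point of the sketch is only the shape of the augmentation loop.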
Related papers
- Reduce, Reuse, Recycle: Is Perturbed Data better than Other Language augmentation for Low Resource Self-Supervised Speech Models [48.44820587495038]
Self-supervised representation learning (SSRL) has demonstrated superior performance to supervised models for tasks including phoneme recognition.
Training SSRL models poses a challenge for low-resource languages where sufficient pre-training data may not be available.
We propose to use audio augmentation techniques, namely pitch variation, noise addition, accented target-language speech, and other-language speech, to pre-train SSRL models in a low-resource condition and evaluate phoneme recognition.
arXiv Detail & Related papers (2023-09-22T10:09:09Z) - Learning ASR pathways: A sparse multilingual ASR model [31.147484652643282]
We present ASR pathways, a sparse multilingual ASR model that activates language-specific sub-networks ("pathways")
With the overlapping sub-networks, the shared parameters can also enable knowledge transfer for lower-resource languages via joint multilingual training.
Our proposed ASR pathways outperform both dense models and a language-agnostically pruned model, and provide better performance on low-resource languages.
arXiv Detail & Related papers (2022-09-13T05:14:08Z) - Non-Linear Pairwise Language Mappings for Low-Resource Multilingual
Acoustic Model Fusion [26.728287476234538]
Hybrid DNN-HMM acoustic model fusion is proposed in a multilingual setup for low-resource languages.
Posterior distributions from different monolingual acoustic models against a target language speech signal are fused together.
A separate regression neural network is trained for each source-target language pair to transform posteriors from the source acoustic model to the target language.
arXiv Detail & Related papers (2022-07-07T15:56:50Z) - Adaptive Activation Network For Low Resource Multilingual Speech
Recognition [30.460501537763736]
We introduce an adaptive activation network in the upper layers of the ASR model.
We also propose two approaches to train the model: (1) cross-lingual learning, replacing the activation function of the source language with that of the target language, and (2) multilingual learning.
Our experiments on IARPA Babel datasets demonstrate that our approaches outperform training from scratch and traditional bottleneck-feature-based methods.
arXiv Detail & Related papers (2022-05-28T04:02:59Z) - Reinforced Iterative Knowledge Distillation for Cross-Lingual Named
Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from rich-resource languages to low-resource languages.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
arXiv Detail & Related papers (2021-06-01T05:46:22Z) - Bootstrap an end-to-end ASR system by multilingual training, transfer
learning, text-to-text mapping and synthetic audio [8.510792628268824]
Bootstrapping speech recognition on limited data resources has long been an area of active research.
We investigate here the effectiveness of different strategies to bootstrap an RNN-Transducer based automatic speech recognition (ASR) system in the low resource regime.
Our experiments demonstrate that transfer learning from a multilingual model, a post-ASR text-to-text mapping, and synthetic audio deliver additive improvements.
arXiv Detail & Related papers (2020-11-25T13:11:32Z) - Cross-lingual Machine Reading Comprehension with Language Branch
Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-resource languages.
We propose a novel augmentation approach named Language Branch Machine Reading (LBMRC)
LBMRC trains multiple machine reading comprehension (MRC) models, each proficient in an individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages.
arXiv Detail & Related papers (2020-10-27T13:12:17Z) - Improving Cross-Lingual Transfer Learning for End-to-End Speech
Recognition with Speech Translation [63.16500026845157]
We introduce speech-to-text translation as an auxiliary task to incorporate additional knowledge of the target language.
We show that training ST with human translations is not necessary.
Even with pseudo-labels from low-resource MT (200K examples), ST-enhanced transfer brings up to 8.9% WER reduction over direct transfer.
arXiv Detail & Related papers (2020-06-09T19:34:11Z) - Meta-Transfer Learning for Code-Switched Speech Recognition [72.84247387728999]
We propose a new learning method, meta-transfer learning, to transfer learn on a code-switched speech recognition system in a low-resource setting.
Our model learns to recognize individual languages and transfers this knowledge to better recognize mixed-language speech by conditioning the optimization on code-switching data.
arXiv Detail & Related papers (2020-04-29T14:27:19Z) - Rnn-transducer with language bias for end-to-end Mandarin-English
code-switching speech recognition [58.105818353866354]
We propose an improved recurrent neural network transducer (RNN-T) model with language bias to alleviate the problem.
We use language identities to bias the model toward predicting the code-switching (CS) points.
This promotes the model to learn the language identity information directly from transcription, and no additional LID model is needed.
arXiv Detail & Related papers (2020-02-19T12:01:33Z)
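The pairwise posterior mapping idea from "Non-Linear Pairwise Language Mappings" above can be sketched numerically: one regression network per source-target pair transforms source-model posteriors into the target language's phone set, and the mapped posteriors are fused. In this minimal NumPy sketch a fixed random linear layer plus softmax stands in for each trained regression network, and the dimensions are made up for illustration.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax over the phone dimension.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def map_posteriors(source_post, W, b):
    # Stand-in for a trained source-to-target regression network:
    # project into the target phone set and renormalize.
    return softmax(source_post @ W + b)

def fuse(mapped):
    # Fuse mapped posteriors from several source models by averaging.
    return np.mean(np.stack(mapped), axis=0)

rng = np.random.default_rng(0)
frames, src_phones, tgt_phones = 4, 40, 30
post_a = softmax(rng.normal(size=(frames, src_phones)))  # source model A
post_b = softmax(rng.normal(size=(frames, src_phones)))  # source model B
W_a, b_a = rng.normal(size=(src_phones, tgt_phones)), np.zeros(tgt_phones)
W_b, b_b = rng.normal(size=(src_phones, tgt_phones)), np.zeros(tgt_phones)
fused = fuse([map_posteriors(post_a, W_a, b_a),
              map_posteriors(post_b, W_b, b_b)])
assert fused.shape == (frames, tgt_phones)
assert np.allclose(fused.sum(axis=1), 1.0)  # valid posterior per frame
```

The papers train these mappings (and, in the main paper above, their end-to-end extension) on data; here the weights are random purely to show the data flow of map-then-fuse.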
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.