Non-Linear Pairwise Language Mappings for Low-Resource Multilingual
Acoustic Model Fusion
- URL: http://arxiv.org/abs/2207.03391v1
- Date: Thu, 7 Jul 2022 15:56:50 GMT
- Title: Non-Linear Pairwise Language Mappings for Low-Resource Multilingual
Acoustic Model Fusion
- Authors: Muhammad Umar Farooq, Darshan Adiga Haniya Narayana, Thomas Hain
- Abstract summary: Fusion of hybrid DNN-HMM acoustic models is proposed in a multilingual setup for low-resource languages.
Posterior distributions from different monolingual acoustic models, computed for a target-language speech signal, are fused together.
A separate regression neural network is trained for each source-target language pair to transform posteriors from the source acoustic model to the target language.
- Score: 26.728287476234538
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multilingual speech recognition has drawn significant attention as an
effective way to compensate for data scarcity in low-resource languages.
End-to-end (e2e) modelling is preferred over conventional hybrid systems,
mainly because it requires no lexicon. However, hybrid DNN-HMMs still
outperform e2e models in limited-data scenarios. Furthermore, the problem of
manual lexicon creation has been alleviated by publicly available trained
models for grapheme-to-phoneme (G2P) conversion and text-to-IPA transliteration
for many languages. In this paper, a novel approach to fusing hybrid DNN-HMM
acoustic models is proposed in a multilingual setup for low-resource languages.
Posterior distributions from different monolingual acoustic models, computed
for a target-language speech signal, are fused together. A separate regression
neural network is trained for each source-target language pair to transform
posteriors from the source acoustic model to the target language. These
networks require very limited data compared to ASR training. Posterior fusion
yields relative gains of 14.65% and 6.5% over the multilingual and monolingual
baselines, respectively. Cross-lingual model fusion shows that comparable
results can be achieved without using posteriors from the language-dependent
ASR.
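The pipeline described in the abstract (map each source model's posteriors into the target language's output space, then fuse them) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the class `MappingNet`, the function `fuse_posteriors`, the layer sizes, the random untrained weights, and the weighted-average fusion rule are hypothetical, since the abstract does not specify the network architecture or the fusion rule used in the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class MappingNet:
    """Hypothetical one-hidden-layer regression network that maps
    source-language posteriors to the target-language output space.
    Weights are random here; training is omitted for brevity."""

    def __init__(self, n_src, n_tgt, n_hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_src, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_tgt))
        self.b2 = np.zeros(n_tgt)

    def __call__(self, src_post):
        # src_post: (frames, n_src) -> mapped posteriors: (frames, n_tgt)
        h = np.tanh(src_post @ self.W1 + self.b1)
        return softmax(h @ self.W2 + self.b2)


def fuse_posteriors(posteriors, weights=None):
    """Fuse per-frame posteriors with a weighted average (an assumed
    fusion rule) and renormalise each frame to sum to one."""
    stacked = np.stack(posteriors)  # (n_models, frames, n_tgt)
    if weights is None:
        weights = np.ones(len(posteriors))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    fused = np.tensordot(weights, stacked, axes=1)  # (frames, n_tgt)
    return fused / fused.sum(axis=-1, keepdims=True)


# Toy data: 5 frames, two source models (40 and 50 output classes),
# and a target model with 30 output classes.
rng = np.random.default_rng(1)
src_a = softmax(rng.normal(size=(5, 40)))
src_b = softmax(rng.normal(size=(5, 50)))
tgt = softmax(rng.normal(size=(5, 30)))

map_a = MappingNet(n_src=40, n_tgt=30, seed=2)
map_b = MappingNet(n_src=50, n_tgt=30, seed=3)

# Fuse the target model's own posteriors with the mapped source posteriors.
fused = fuse_posteriors([tgt, map_a(src_a), map_b(src_b)])
```

Dropping `tgt` from the list passed to `fuse_posteriors` would correspond loosely to the cross-lingual setting mentioned in the abstract, where the language-dependent ASR posteriors are not used.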
Related papers
- Unlocking the Potential of Model Merging for Low-Resource Languages [66.7716891808697]
Adapting large language models to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT).
We propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training.
Experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data.
arXiv Detail & Related papers (2024-07-04T15:14:17Z)
- Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition [31.575930914290762]
Exploiting cross-lingual resources is an effective way to compensate for data scarcity of low resource languages.
We extend the concept of learnable cross-lingual mappings for end-to-end speech recognition.
The results show that any source language ASR model can be used for a low-resource target language recognition.
arXiv Detail & Related papers (2023-06-14T15:24:31Z)
- Exploiting Multilingualism in Low-resource Neural Machine Translation via Adversarial Learning [3.2258463207097017]
Generative Adversarial Networks (GAN) offer a promising approach for Neural Machine Translation (NMT).
In GAN, similar to bilingual models, multilingual NMT only considers one reference translation for each sentence during model training.
This article proposes a Denoising Adversarial Auto-encoder-based Sentence Interpolation (DAASI) approach to perform sentence computation.
arXiv Detail & Related papers (2023-03-31T12:34:14Z)
- High-resource Language-specific Training for Multilingual Neural Machine Translation [109.31892935605192]
We propose a multilingual translation model with high-resource language-specific training (HLT-MT) to alleviate negative interference.
Specifically, we first train the multilingual model only with the high-resource pairs and select the language-specific modules at the top of the decoder.
HLT-MT is further trained on all available corpora to transfer knowledge from high-resource languages to low-resource languages.
arXiv Detail & Related papers (2022-07-11T14:33:13Z)
- Investigating the Impact of Cross-lingual Acoustic-Phonetic Similarities on Multilingual Speech Recognition [31.575930914290762]
A novel data-driven approach is proposed to investigate the cross-lingual acoustic-phonetic similarities.
Deep neural networks are trained as mapping networks to transform the distributions from different acoustic models into a directly comparable form.
A relative improvement of 8% over the monolingual counterpart is achieved.
arXiv Detail & Related papers (2022-07-07T15:55:41Z)
- Distilling a Pretrained Language Model to a Multilingual ASR Model [3.4012007729454816]
We distill the rich knowledge embedded inside a well-trained teacher text model to the student speech model.
We show the superiority of our method on 20 low-resource languages of the CommonVoice dataset with less than 100 hours of speech data.
arXiv Detail & Related papers (2022-06-25T12:36:11Z)
- Factorized Neural Transducer for Efficient Language Model Adaptation [51.81097243306204]
We propose a novel model, factorized neural Transducer, by factorizing the blank and vocabulary prediction.
It is expected that this factorization can transfer the improvement of the standalone language model to the Transducer for speech recognition.
We demonstrate that the proposed factorized neural Transducer yields 15% to 20% WER improvements when out-of-domain text data is used for language model adaptation.
arXiv Detail & Related papers (2021-09-27T15:04:00Z)
- Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model [58.27176041092891]
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements.
We propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features from the entangled pretrained cross-lingual representations.
Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts.
arXiv Detail & Related papers (2020-11-23T16:00:42Z)
- How Phonotactics Affect Multilingual and Zero-shot ASR Performance [74.70048598292583]
A Transformer encoder-decoder model has been shown to leverage multilingual data well in IPA transcriptions of languages presented during training.
We replace the encoder-decoder with a hybrid ASR system consisting of a separate AM and LM.
We show that the gain from modeling crosslingual phonotactics is limited, and that imposing too strong a model can hurt zero-shot transfer.
arXiv Detail & Related papers (2020-10-22T23:07:24Z)
- Efficient neural speech synthesis for low-resource languages through multilingual modeling [3.996275177789896]
Multi-speaker modeling can reduce the data requirements necessary for a new voice.
We show that multilingual models can produce speech with a naturalness comparable to monolingual multi-speaker models.
arXiv Detail & Related papers (2020-08-20T14:05:28Z)
- Unsupervised Cross-lingual Representation Learning for Speech Recognition [63.85924123692923]
XLSR learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.
We build on wav2vec 2.0 which is trained by solving a contrastive task over masked latent speech representations.
Experiments show that cross-lingual pretraining significantly outperforms monolingual pretraining.
arXiv Detail & Related papers (2020-06-24T18:25:05Z)