Investigating the Impact of Cross-lingual Acoustic-Phonetic Similarities
on Multilingual Speech Recognition
- URL: http://arxiv.org/abs/2207.03390v1
- Date: Thu, 7 Jul 2022 15:55:41 GMT
- Title: Investigating the Impact of Cross-lingual Acoustic-Phonetic Similarities
on Multilingual Speech Recognition
- Authors: Muhammad Umar Farooq, Thomas Hain
- Abstract summary: A novel data-driven approach is proposed to investigate the cross-lingual acoustic-phonetic similarities.
Deep neural networks are trained as mapping networks to transform the distributions from different acoustic models into a directly comparable form.
A relative improvement of 8% over monolingual counterpart is achieved.
- Score: 31.575930914290762
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multilingual automatic speech recognition (ASR) systems mostly benefit low
resource languages but suffer degradation in performance across several
languages relative to their monolingual counterparts. Limited studies have
focused on understanding the languages behaviour in the multilingual speech
recognition setups. In this paper, a novel data-driven approach is proposed to
investigate the cross-lingual acoustic-phonetic similarities. This technique
measures the similarities between posterior distributions from various
monolingual acoustic models against a target speech signal. Deep neural
networks are trained as mapping networks to transform the distributions from
different acoustic models into a directly comparable form. The analysis
observes that the languages closeness can not be truly estimated by the volume
of overlapping phonemes set. Entropy analysis of the proposed mapping networks
exhibits that a language with lesser overlap can be more amenable to
cross-lingual transfer, and hence more beneficial in the multilingual setup.
Finally, the proposed posterior transformation approach is leveraged to fuse
monolingual models for a target language. A relative improvement of ~8% over
monolingual counterpart is achieved.
Related papers
- An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios [76.11409260727459]
This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system.
We demonstrate that the similarity in phonetics between the pre-training and target languages, as well as the language category, affects the target language's adaptation performance.
arXiv Detail & Related papers (2024-06-13T08:16:52Z) - Improving In-context Learning of Multilingual Generative Language Models with Cross-lingual Alignment [42.624862172666624]
We propose a simple yet effective cross-lingual alignment framework exploiting pairs of translation sentences.
It aligns the internal sentence representations across different languages via multilingual contrastive learning.
Experimental results show that even with less than 0.1 textperthousand of pre-training tokens, our alignment framework significantly boosts the cross-lingual abilities of generative language models.
arXiv Detail & Related papers (2023-11-14T11:24:08Z) - Towards a Deep Understanding of Multilingual End-to-End Speech
Translation [52.26739715012842]
We analyze representations learnt in a multilingual end-to-end speech translation model trained over 22 languages.
We derive three major findings from our analysis.
arXiv Detail & Related papers (2023-10-31T13:50:55Z) - High-resource Language-specific Training for Multilingual Neural Machine
Translation [109.31892935605192]
We propose the multilingual translation model with the high-resource language-specific training (HLT-MT) to alleviate the negative interference.
Specifically, we first train the multilingual model only with the high-resource pairs and select the language-specific modules at the top of the decoder.
HLT-MT is further trained on all available corpora to transfer knowledge from high-resource languages to low-resource languages.
arXiv Detail & Related papers (2022-07-11T14:33:13Z) - Non-Linear Pairwise Language Mappings for Low-Resource Multilingual
Acoustic Model Fusion [26.728287476234538]
hybrid DNN-HMM acoustic models fusion is proposed in a multilingual setup for the low-resource languages.
Posterior distributions from different monolingual acoustic models against a target language speech signal are fused together.
A separate regression neural network is trained for each source-target language pair to transform posteriors from source acoustic model to the target language.
arXiv Detail & Related papers (2022-07-07T15:56:50Z) - Cross-lingual Low Resource Speaker Adaptation Using Phonological
Features [2.8080708404213373]
We train a language-agnostic multispeaker model conditioned on a set of phonologically derived features common across different languages.
With as few as 32 and 8 utterances of target speaker data, we obtain high speaker similarity scores and naturalness comparable to the corresponding literature.
arXiv Detail & Related papers (2021-11-17T12:33:42Z) - Cross-lingual Transfer for Speech Processing using Acoustic Language
Similarity [81.51206991542242]
Cross-lingual transfer offers a compelling way to help bridge this digital divide.
Current cross-lingual algorithms have shown success in text-based tasks and speech-related tasks over some low-resource languages.
We propose a language similarity approach that can efficiently identify acoustic cross-lingual transfer pairs across hundreds of languages.
arXiv Detail & Related papers (2021-11-02T01:55:17Z) - Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z) - Unsupervised Cross-lingual Representation Learning for Speech
Recognition [63.85924123692923]
XLSR learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.
We build on wav2vec 2.0 which is trained by solving a contrastive task over masked latent speech representations.
Experiments show that cross-lingual pretraining significantly outperforms monolingual pretraining.
arXiv Detail & Related papers (2020-06-24T18:25:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.