Improved Self-Supervised Multilingual Speech Representation Learning
Combined with Auxiliary Language Information
- URL: http://arxiv.org/abs/2212.03476v1
- Date: Wed, 7 Dec 2022 06:18:59 GMT
- Title: Improved Self-Supervised Multilingual Speech Representation Learning
Combined with Auxiliary Language Information
- Authors: Fenglin Ding, Genshun Wan, Pengcheng Li, Jia Pan, Cong Liu
- Abstract summary: Self-supervised multilingual speech representation learning has shown success in improving the performance of multilingual automatic speech recognition.
However, similar to supervised learning, multilingual pre-training may also suffer from language interference.
We introduce several techniques for improving self-supervised multilingual pre-training by leveraging auxiliary language information.
- Score: 21.250763472985824
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multilingual end-to-end models have shown great improvement over monolingual
systems. With the development of pre-training methods on speech,
self-supervised multilingual speech representation learning like XLSR has shown
success in improving the performance of multilingual automatic speech
recognition (ASR). However, similar to supervised learning, multilingual
pre-training may also suffer from language interference, which can further limit
the application of multilingual systems. In this paper, we introduce several
techniques for improving self-supervised multilingual pre-training by
leveraging auxiliary language information, including language adversarial
training, language embedding, and language adaptive training during the
pre-training stage. We conduct experiments on a multilingual ASR task
consisting of 16 languages. Our experimental results demonstrate a 14.3% relative
gain over the standard XLSR model and a 19.8% relative gain over the
multilingual model trained without pre-training.
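The abstract names three auxiliary-language techniques without detailing them. Below is a minimal, hypothetical PyTorch sketch of how the first two, language adversarial training (via a gradient-reversal layer) and a language embedding, could be attached to a generic XLSR-style encoder during pre-training; the class names, the additive embedding injection, the mean-pooled adversarial head, and the `adv_lambda` weight are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): language-adversarial training via a
# gradient-reversal layer plus a learned language embedding added to the input of
# a generic XLSR-style speech encoder.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips and scales the gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class LanguageAwarePretrainer(nn.Module):
    def __init__(self, encoder, hidden_dim, num_languages, adv_lambda=0.1):
        super().__init__()
        self.encoder = encoder                      # e.g. a wav2vec 2.0 / XLSR context network
        self.lang_embed = nn.Embedding(num_languages, hidden_dim)
        self.lang_classifier = nn.Linear(hidden_dim, num_languages)
        self.adv_lambda = adv_lambda

    def forward(self, features, lang_ids):
        # Language embedding: inject auxiliary language information additively.
        x = features + self.lang_embed(lang_ids).unsqueeze(1)   # (batch, time, hidden_dim)
        hidden = self.encoder(x)
        # Adversarial branch: the classifier tries to predict the language while the
        # reversed gradient pushes the encoder toward language-invariant features.
        pooled = hidden.mean(dim=1)
        adv_logits = self.lang_classifier(GradReverse.apply(pooled, self.adv_lambda))
        adv_loss = nn.functional.cross_entropy(adv_logits, lang_ids)
        return hidden, adv_loss
```

In such a setup the adversarial loss would typically be added, with a small weight, to the usual contrastive pre-training objective.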
Related papers
- Multilingual Pretraining and Instruction Tuning Improve Cross-Lingual Knowledge Alignment, But Only Shallowly [53.04368883943773]
Two approaches have been proposed to address this: multilingual pretraining and multilingual instruction tuning.
We propose CLiKA to assess the cross-lingual knowledge alignment of LLMs at the Performance, Consistency, and Conductivity levels.
Results show that while both multilingual pretraining and instruction tuning are beneficial for cross-lingual knowledge alignment, the training strategy needs to be carefully designed.
arXiv Detail & Related papers (2024-04-06T15:25:06Z)
- Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages [40.41642013737395]
Recent models such as XLS-R and Whisper have made multilingual speech technologies more accessible by pre-training on audio from around 100 spoken languages each.
We aim to understand which model adapts better to languages unseen during pre-training.
We fine-tune both models on 13 unseen languages and 18 seen languages.
arXiv Detail & Related papers (2023-05-21T23:53:12Z)
- Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training [45.48362355283723]
We propose gated language experts and curriculum training to enhance multilingual transformer transducer models.
Our method incorporates a gating mechanism and a language identification (LID) loss, enabling transformer experts to learn language-specific information (a generic sketch of such a gated-expert layer follows this entry).
arXiv Detail & Related papers (2023-03-01T19:20:01Z)
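As referenced in the summary above, here is a generic, hypothetical sketch of a gated language-expert layer with an auxiliary LID loss on the gate; the per-frame gating, per-language expert layout, and utterance-level language labels are assumptions for illustration, not the architecture of the cited paper.

```python
# Generic, hypothetical gated language-expert layer with an auxiliary LID loss.
import torch
import torch.nn as nn

class GatedLanguageExperts(nn.Module):
    def __init__(self, hidden_dim, num_languages):
        super().__init__()
        # One small feed-forward expert per language (an illustrative choice).
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                           nn.Linear(hidden_dim, hidden_dim))
             for _ in range(num_languages)]
        )
        self.gate = nn.Linear(hidden_dim, num_languages)  # per-frame language gate

    def forward(self, hidden, lang_ids=None):
        gate_logits = self.gate(hidden)                   # (batch, time, num_languages)
        weights = gate_logits.softmax(dim=-1)
        expert_out = torch.stack([e(hidden) for e in self.experts], dim=-1)  # (B, T, H, L)
        mixed = (expert_out * weights.unsqueeze(-2)).sum(dim=-1)             # (B, T, H)
        lid_loss = None
        if lang_ids is not None:
            # Auxiliary LID loss: supervise the gate with utterance-level language labels.
            frame_labels = lang_ids.unsqueeze(1).expand(-1, hidden.size(1))
            lid_loss = nn.functional.cross_entropy(gate_logits.flatten(0, 1),
                                                   frame_labels.flatten())
        return mixed, lid_loss
```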
- High-resource Language-specific Training for Multilingual Neural Machine Translation [109.31892935605192]
We propose a multilingual translation model with high-resource language-specific training (HLT-MT) to alleviate negative interference.
Specifically, we first train the multilingual model only with the high-resource pairs and select the language-specific modules at the top of the decoder.
HLT-MT is further trained on all available corpora to transfer knowledge from high-resource languages to low-resource languages.
arXiv Detail & Related papers (2022-07-11T14:33:13Z)
- Exploring Teacher-Student Learning Approach for Multi-lingual Speech-to-Intent Classification [73.5497360800395]
We develop an end-to-end system that supports multiple languages.
We exploit knowledge from a pre-trained multi-lingual natural language processing model.
arXiv Detail & Related papers (2021-09-28T04:43:11Z)
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representations from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and call each group a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
- CLSRIL-23: Cross Lingual Speech Representations for Indic Languages [0.0]
CLSRIL-23 is a self-supervised learning based model that learns cross-lingual speech representations from raw audio across 23 Indic languages.
It is built on top of wav2vec 2.0, which is trained by solving a contrastive task over masked latent speech representations (a simplified sketch of this objective follows this entry).
We compare the language-wise loss during pretraining to study the effects of monolingual versus multilingual pretraining.
arXiv Detail & Related papers (2021-07-15T15:42:43Z)
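As referenced in the CLSRIL-23 summary above, here is a simplified sketch of the wav2vec 2.0-style contrastive objective that such models build on (not the CLSRIL-23 code); the tensor shapes, the temperature value, and the helper name `contrastive_loss` are illustrative assumptions.

```python
# Simplified wav2vec 2.0-style contrastive objective: for each masked time step, the
# contextualized vector must identify the true quantized latent among K distractors.
import torch
import torch.nn.functional as F

def contrastive_loss(context, targets, distractors, temperature=0.1):
    """
    context:     (num_masked, dim)     contextualized outputs at masked time steps
    targets:     (num_masked, dim)     true quantized latent representations
    distractors: (num_masked, K, dim)  negatives sampled from other masked steps
    """
    candidates = torch.cat([targets.unsqueeze(1), distractors], dim=1)       # (N, K+1, dim)
    sims = F.cosine_similarity(context.unsqueeze(1).expand_as(candidates),
                               candidates, dim=-1) / temperature             # (N, K+1)
    labels = torch.zeros(context.size(0), dtype=torch.long, device=context.device)
    return F.cross_entropy(sims, labels)   # the true target sits at index 0
```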
- Improved Language Identification Through Cross-Lingual Self-Supervised Learning [37.32193095549614]
We extend previous self-supervised work on language identification by experimenting with pre-trained models.
Results on a 25-language setup show that with only 10 minutes of labeled data per language, a cross-lingually pre-trained model can achieve over 93% accuracy.
arXiv Detail & Related papers (2021-07-08T19:37:06Z)
- XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation [93.80733419450225]
This paper analyzes the current state of cross-lingual transfer learning.
We extend XTREME to XTREME-R, which consists of an improved set of ten natural language understanding tasks.
arXiv Detail & Related papers (2021-04-15T12:26:12Z)
- That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages [72.9927937955371]
We use resources available in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z)