Adapt-and-Adjust: Overcoming the Long-Tail Problem of Multilingual Speech Recognition
- URL: http://arxiv.org/abs/2012.01687v1
- Date: Thu, 3 Dec 2020 03:46:16 GMT
- Title: Adapt-and-Adjust: Overcoming the Long-Tail Problem of Multilingual Speech Recognition
- Authors: Genta Indra Winata, Guangsen Wang, Caiming Xiong, Steven Hoi
- Abstract summary: We propose Adapt-and-Adjust (A2), a transformer-based multi-task learning framework for end-to-end multilingual speech recognition.
The A2 framework overcomes the long-tail problem via three techniques: (1) exploiting a pretrained multilingual language model (mBERT) to improve the performance of low-resource languages; (2) proposing dual adapters consisting of both language-specific and language-agnostic adaptation with minimal additional parameters; and (3) overcoming the class imbalance, either by imposing class priors in the loss during training or adjusting the logits of the softmax output during inference.
- Score: 58.849768879796905
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One crucial challenge of real-world multilingual speech recognition is the
long-tailed distribution problem, where some resource-rich languages like
English have abundant training data, but a long tail of low-resource languages
have varying amounts of limited training data. To overcome the long-tail
problem, in this paper, we propose Adapt-and-Adjust (A2), a transformer-based
multi-task learning framework for end-to-end multilingual speech recognition.
The A2 framework overcomes the long-tail problem via three techniques: (1)
exploiting a pretrained multilingual language model (mBERT) to improve the
performance of low-resource languages; (2) proposing dual adapters consisting
of both language-specific and language-agnostic adaptation with minimal
additional parameters; and (3) overcoming the class imbalance, either by
imposing class priors in the loss during training or adjusting the logits of
the softmax output during inference. Extensive experiments on the CommonVoice
corpus show that A2 significantly outperforms conventional approaches.
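
The three A2 techniques above are architectural and loss-level changes to a standard end-to-end ASR transformer. The sketch below illustrates two of them in PyTorch under our own assumptions: a dual adapter (a shared language-agnostic adapter plus a per-language adapter, applied residually) and class-prior logit adjustment for the long-tail class imbalance, applied either inside the training loss or to the softmax logits at inference. All module and parameter names (Adapter, DualAdapter, tau, class_priors) are illustrative and do not reproduce the authors' implementation.

```python
# Minimal sketch (not the authors' code) of two A2 ingredients:
#   * a "dual adapter": a language-agnostic adapter shared by all languages
#     plus a small language-specific adapter, both applied residually;
#   * class-prior logit adjustment for the long-tail output distribution,
#     usable either inside the training loss or at inference time.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Adapter(nn.Module):
    """Standard bottleneck adapter: down-project, non-linearity, up-project."""

    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x):
        return self.up(F.relu(self.down(x)))


class DualAdapter(nn.Module):
    """Language-agnostic adapter shared across languages plus one small
    language-specific adapter per language, combined residually."""

    def __init__(self, d_model: int, languages, bottleneck: int = 64):
        super().__init__()
        self.common = Adapter(d_model, bottleneck)
        self.lang_specific = nn.ModuleDict(
            {lang: Adapter(d_model, bottleneck) for lang in languages}
        )

    def forward(self, hidden, lang: str):
        # Residual combination keeps the backbone representation intact
        # while adding shared and per-language corrections.
        return hidden + self.common(hidden) + self.lang_specific[lang](hidden)


def adjust_logits(logits, class_priors, tau: float = 1.0):
    """Subtract scaled log-priors from the softmax logits at inference so
    rare (tail) output units are not drowned out by frequent ones."""
    return logits - tau * torch.log(class_priors + 1e-12)


def prior_adjusted_ce(logits, targets, class_priors, tau: float = 1.0):
    """Cross-entropy with class priors imposed during training: add the
    scaled log-priors to the logits before the softmax inside the loss."""
    return F.cross_entropy(logits + tau * torch.log(class_priors + 1e-12), targets)
```

Under these assumptions, one DualAdapter would sit after each encoder block, with the utterance's language ID selecting the language-specific branch, and the prior vector would be estimated from token frequencies in the training transcripts.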
Related papers
- Optimizing Low-Resource Language Model Training: Comprehensive Analysis of Multi-Epoch, Multi-Lingual, and Two-Stage Approaches [3.809045695573932]
Existing works adopt multi-epoch, multi-lingual, and two-stage training to use the limited target-language corpus efficiently.
We exhaustively explore training setups for low-resource language LLMs, combining these three approaches.
As the amount of target-language corpus decreases, the optimal training approach shifts from monolingual single-stage training to multi-lingual two-stage training at a compute-budget-dependent threshold.
arXiv Detail & Related papers (2024-10-16T07:45:56Z)
- LlamaTurk: Adapting Open-Source Generative Large Language Models for Low-Resource Language [2.9914612342004503]
This study explores an alternative solution by adapting large language models, primarily trained on English, to low-resource languages.
We assess various strategies, including continual training, instruction fine-tuning, task-specific fine-tuning, and vocabulary extension.
The results show that continual training improves language comprehension, as reflected in perplexity scores, and that task-specific tuning generally enhances performance on downstream tasks.
arXiv Detail & Related papers (2024-05-13T13:41:59Z)
- No Train but Gain: Language Arithmetic for training-free Language Adapters enhancement [59.37775534633868]
We introduce a novel method called language arithmetic, which enables training-free post-processing.
The effectiveness of the proposed solution is demonstrated on three downstream tasks in a MAD-X-based set of cross-lingual schemes.
arXiv Detail & Related papers (2024-04-24T08:52:40Z)
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- Adapting Multilingual Speech Representation Model for a New, Underresourced Language through Multilingual Fine-tuning and Continued Pretraining [2.3513645401551333]
We investigate the possibility of adapting an existing multilingual wav2vec 2.0 model to a new language.
Our results show that continued pretraining is the most effective method for adapting a wav2vec 2.0 model to a new language.
We find that if a model pretrained on a related speech variety, or on an unrelated language with similar phonological characteristics, is available, multilingual fine-tuning using additional data from that language can have a positive impact on speech recognition performance.
arXiv Detail & Related papers (2023-01-18T03:57:53Z)
- Language-Family Adapters for Low-Resource Multilingual Neural Machine Translation [129.99918589405675]
Large multilingual models trained with self-supervision achieve state-of-the-art results in a wide range of natural language processing tasks.
Multilingual fine-tuning improves performance on low-resource languages but requires modifying the entire model and can be prohibitively expensive.
We propose training language-family adapters on top of mBART-50 to facilitate cross-lingual transfer.
arXiv Detail & Related papers (2022-09-30T05:02:42Z)
- Improving the Cross-Lingual Generalisation in Visual Question Answering [40.86774711775718]
Multilingual vision-language pretrained models show poor cross-lingual generalisation when applied to non-English data.
In this work, we explore the poor performance of these models on a zero-shot cross-lingual visual question answering (VQA) task.
We improve cross-lingual transfer with three strategies: (1) we introduce a linguistic prior objective to augment the cross-entropy loss with a similarity-based loss to guide the model during training, (2) we learn a task-specific subnetwork that improves cross-lingual generalisation and reduces variance without model modification, and (3) we augment training examples using synthetic code
arXiv Detail & Related papers (2022-09-07T08:07:43Z)
- High-resource Language-specific Training for Multilingual Neural Machine Translation [109.31892935605192]
We propose the multilingual translation model with the high-resource language-specific training (HLT-MT) to alleviate the negative interference.
Specifically, we first train the multilingual model only with the high-resource pairs and select the language-specific modules at the top of the decoder.
HLT-MT is further trained on all available corpora to transfer knowledge from high-resource languages to low-resource languages.
arXiv Detail & Related papers (2022-07-11T14:33:13Z)
- Adaptive Activation Network For Low Resource Multilingual Speech Recognition [30.460501537763736]
We introduce an adaptive activation network in the upper layers of the ASR model.
We also propose two approaches to train the model: (1) cross-lingual learning, replacing the activation function of the source language with that of the target language, and (2) multilingual learning.
Our experiments on the IARPA Babel datasets demonstrate that our approaches outperform from-scratch training and traditional bottleneck-feature-based methods.
arXiv Detail & Related papers (2022-05-28T04:02:59Z)
- Enhancing Answer Boundary Detection for Multilingual Machine Reading Comprehension [86.1617182312817]
We propose two auxiliary tasks in the fine-tuning stage to create additional phrase boundary supervision.
A mixed machine reading task translates the question or passage into other languages and builds cross-lingual question-passage pairs.
A language-agnostic knowledge masking task leverages knowledge phrases mined from the web.
arXiv Detail & Related papers (2020-04-29T10:44:00Z)