Efficient Adapter Finetuning for Tail Languages in Streaming
Multilingual ASR
- URL: http://arxiv.org/abs/2401.08992v1
- Date: Wed, 17 Jan 2024 06:01:16 GMT
- Title: Efficient Adapter Finetuning for Tail Languages in Streaming
Multilingual ASR
- Authors: Junwen Bai, Bo Li, Qiujia Li, Tara N. Sainath, Trevor Strohman
- Abstract summary: The heterogeneous nature and imbalanced data abundance of different languages may cause performance degradation.
Our proposed method brings 12.2% word error rate reduction on average and up to 37.5% on a single locale.
- Score: 44.949146169903074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The end-to-end ASR model is often desired in the streaming multilingual
scenario since it is easier to deploy and can benefit from pre-trained speech
models such as powerful foundation models. Meanwhile, the heterogeneous nature
and imbalanced data abundance of different languages may cause performance
degradation, leading to asynchronous peak performance for different languages
during training, especially on tail ones. Sometimes even the data itself may
become unavailable as a result of enhanced privacy protection. Existing work
tends to significantly increase the model size or learn language-specific
decoders to accommodate each language separately. In this study, we explore
simple yet effective Language-Dependent Adapter (LDA) finetuning under a
cascaded Conformer transducer framework enhanced by teacher pseudo-labeling for
tail languages in streaming multilingual ASR. The adapter accounts for only
0.4% of the full model per language. It is plugged into the frozen foundation
model and is the only trainable module during the finetuning process with noisy
student training. The final model merges the adapter parameters from different
checkpoints for different languages. The model performance is validated on a
challenging multilingual dictation dataset, which includes 39 tail languages
across Latin, Greek, Arabic, etc. Our proposed method brings 12.2% word error
rate reduction on average and up to 37.5% on a single locale. Furthermore, we
show that our parameter-efficient LDA can match the quality of the full model
finetuning, thus greatly alleviating the asynchronous peak performance issue.
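As a concrete illustration of the recipe described in the abstract (a small adapter plugged into a frozen foundation model, trained as the only tunable module per tail language, with the per-language adapter weights merged afterwards), here is a minimal PyTorch-style sketch. It is not the authors' implementation: the bottleneck adapter design, the stand-in Transformer encoder (in place of the cascaded Conformer transducer), and all names and sizes are illustrative assumptions.
```python
import torch
import torch.nn as nn


class ResidualAdapter(nn.Module):
    """Bottleneck adapter: layer norm, down-projection, ReLU, up-projection, residual add."""

    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(torch.relu(self.down(self.norm(x))))


class AdaptedLayer(nn.Module):
    """Wraps one frozen encoder layer and inserts a trainable adapter after it."""

    def __init__(self, frozen_layer: nn.Module, d_model: int):
        super().__init__()
        self.layer = frozen_layer
        self.adapter = ResidualAdapter(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.layer(x))


def attach_language_adapter(encoder: nn.Sequential, d_model: int) -> nn.Sequential:
    """Freeze the foundation encoder and plug one adapter after every layer."""
    for p in encoder.parameters():
        p.requires_grad = False  # the foundation model stays frozen
    return nn.Sequential(*(AdaptedLayer(layer, d_model) for layer in encoder))


# --- toy usage ---------------------------------------------------------------
d_model = 512
# Stand-in for the pretrained speech encoder (a cascaded Conformer in the paper).
foundation = nn.Sequential(
    *(nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True) for _ in range(4))
)
model = attach_language_adapter(foundation, d_model)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable share: {trainable / total:.2%}")  # only the adapters are trainable

# Per-language finetuning optimizes the adapter alone, e.g. on teacher
# pseudo-labels (noisy student training) for that tail language.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

# After finetuning one adapter per language, the adapter weights from the
# per-language checkpoints can be merged into a single serving checkpoint.
per_language_adapters = {
    "locale_A": {k: v for k, v in model.state_dict().items() if "adapter" in k},
    # "locale_B": ...  (loaded from that locale's finetuning checkpoint)
}
merged = {"foundation": foundation.state_dict(), "adapters": per_language_adapters}
```
At serving time, such a merged checkpoint would route each utterance's locale to its adapter weights while sharing the frozen foundation parameters across all languages, in the spirit of the merging step described in the abstract.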
Related papers
- On the Analysis of Cross-Lingual Prompt Tuning for Decoder-based
Multilingual Model [49.81429697921861]
We study the interaction between parameter-efficient fine-tuning (PEFT) and cross-lingual tasks in multilingual autoregressive models.
We show that prompt tuning is more effective in enhancing the performance of low-resource languages than fine-tuning.
arXiv Detail & Related papers (2023-11-14T00:43:33Z) - One Adapter for All Programming Languages? Adapter Tuning for Code
Search and Summarization [27.27985393610581]
We find that multilingual fine-tuning leads to performance degradation on the recent models UniXcoder and CodeT5.
To alleviate the potential catastrophic forgetting issue in multilingual models, we fix all pre-trained model parameters, insert a parameter-efficient adapter structure, and fine-tune only the adapter.
Our experiments on three probing tasks show that adapter tuning significantly outperforms full-model fine-tuning and effectively overcomes catastrophic forgetting.
arXiv Detail & Related papers (2023-03-28T08:49:54Z) - Improving Massively Multilingual ASR With Auxiliary CTC Objectives [40.10307386370194]
We introduce our work on improving performance on FLEURS, a 102-language open ASR benchmark.
We investigate techniques inspired by recent Connectionist Temporal Classification (CTC) studies to help the model handle the large number of languages.
Our state-of-the-art systems using self-supervised models with the Conformer architecture improve over the results of prior work on FLEURS by a relative 28.4% CER.
arXiv Detail & Related papers (2023-02-24T18:59:51Z) - Parameter-efficient Zero-shot Transfer for Cross-Language Dense
Retrieval with Adapters [20.168480824057923]
A popular approach to creating a cross-language retrieval model is to substitute a monolingual pretrained language model in the retrieval model.
We show that models trained with monolingual data are more effective than fine-tuning the entire model when transferring to a Cross-Language Information Retrieval setting.
arXiv Detail & Related papers (2022-12-20T17:25:04Z) - Language-Family Adapters for Low-Resource Multilingual Neural Machine
Translation [129.99918589405675]
Large multilingual models trained with self-supervision achieve state-of-the-art results in a wide range of natural language processing tasks.
Multilingual fine-tuning improves performance on low-resource languages but requires modifying the entire model and can be prohibitively expensive.
We propose training language-family adapters on top of mBART-50 to facilitate cross-lingual transfer.
arXiv Detail & Related papers (2022-09-30T05:02:42Z) - OneAligner: Zero-shot Cross-lingual Transfer with One Rich-Resource
Language Pair for Low-Resource Sentence Retrieval [91.76575626229824]
We present OneAligner, an alignment model specially designed for sentence retrieval tasks.
When trained with all language pairs of a large-scale parallel multilingual corpus (OPUS-100), this model achieves state-of-the-art results.
We conclude through empirical results and analyses that the performance of the sentence alignment task depends mostly on the monolingual and parallel data size.
arXiv Detail & Related papers (2022-05-17T19:52:42Z) - Parameter-Efficient Neural Reranking for Cross-Lingual and Multilingual
Retrieval [66.69799641522133]
State-of-the-art neural (re)rankers are notoriously data hungry.
Current approaches typically transfer rankers trained on English data to other languages and cross-lingual setups by means of multilingual encoders.
We show that two parameter-efficient approaches to cross-lingual transfer, namely Sparse Fine-Tuning Masks (SFTMs) and Adapters, allow for a more lightweight and more effective zero-shot transfer.
arXiv Detail & Related papers (2022-04-05T15:44:27Z) - Continual Learning in Multilingual NMT via Language-Specific Embeddings [92.91823064720232]
The proposed approach consists of replacing the shared vocabulary with a small language-specific vocabulary and fine-tuning the new embeddings on the new language's parallel data.
Because the parameters of the original model are not modified, its performance on the initial languages does not degrade.
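To make this embedding-replacement mechanism concrete, here is a minimal sketch under illustrative assumptions: a generic PyTorch Transformer stands in for the pretrained multilingual NMT model, and the vocabulary size and module names are arbitrary, not the paper's implementation.
```python
import torch
import torch.nn as nn

d_model, new_vocab = 512, 8000  # illustrative model width and language-specific vocab size

# Stand-in for the pretrained multilingual NMT body; its parameters are left untouched.
nmt_body = nn.Transformer(d_model=d_model, batch_first=True)
for p in nmt_body.parameters():
    p.requires_grad = False  # the original model parameters are never modified

# Replace the shared vocabulary with a small language-specific one and train
# only the new embeddings on the new language's parallel data, so performance
# on the initial languages cannot regress.
new_embedding = nn.Embedding(new_vocab, d_model)
optimizer = torch.optim.Adam(new_embedding.parameters(), lr=1e-4)
```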
arXiv Detail & Related papers (2021-10-20T10:38:57Z) - Lightweight Adapter Tuning for Multilingual Speech Translation [47.89784337058167]
Adapter modules were recently introduced as an efficient alternative to fine-tuning in NLP.
This paper proposes a comprehensive analysis of adapters for multilingual speech translation.
arXiv Detail & Related papers (2021-06-02T20:51:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.