Language-Universal Adapter Learning with Knowledge Distillation for
End-to-End Multilingual Speech Recognition
- URL: http://arxiv.org/abs/2303.01249v1
- Date: Tue, 28 Feb 2023 14:43:49 GMT
- Title: Language-Universal Adapter Learning with Knowledge Distillation for
End-to-End Multilingual Speech Recognition
- Authors: Zhijie Shen, Wu Guo, Bin Gu
- Abstract summary: We propose a language-universal adapter learning framework based on a pre-trained model for end-to-end multilingual automatic speech recognition.
Online knowledge distillation is then used to enable the language-universal adapters to learn both language-specific and universal features.
Compared to the conventional multilingual model, a 3.3% absolute error rate reduction is achieved.
- Score: 28.416831396722106
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose a language-universal adapter learning framework
based on a pre-trained model for end-to-end multilingual automatic speech
recognition (ASR). For acoustic modeling, the wav2vec 2.0 pre-trained model is
fine-tuned by inserting language-specific and language-universal adapters.
Online knowledge distillation is then used to enable the language-universal
adapters to learn both language-specific and universal features. Confusion of
linguistic information is further reduced by leveraging language identifiers
(LIDs): conditioned on the LID, a position-wise modification is applied to the
multi-head attention outputs. At inference, the language-specific adapters are
removed while the language-universal adapters remain active. The proposed
method improves recognition accuracy and avoids the linear growth in the
number of adapter parameters with the number of languages that common
multilingual ASR systems suffer from. Experiments on the BABEL dataset confirm the
effectiveness of the proposed framework. Compared to the conventional
multilingual model, a 3.3% absolute error rate reduction is achieved. The code
is available at: https://github.com/shen9712/UniversalAdapterLearning.
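To make the described pipeline concrete, below is a minimal, hypothetical PyTorch sketch of one encoder layer following the abstract: language-specific and language-universal adapters sit on the attention output, a LID embedding applies a position-wise modification, an online-distillation term pulls the universal adapter toward the language-specific one during training, and only the universal adapter is kept at inference. The module names, the sigmoid gating, and the MSE-based distillation loss are assumptions made for illustration; the authors' actual implementation is in the linked repository.

```python
# Hypothetical sketch only: names and loss choices are assumptions based on the
# abstract, not the released UniversalAdapterLearning code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Adapter(nn.Module):
    """Residual bottleneck adapter inserted into a wav2vec 2.0 encoder layer."""

    def __init__(self, dim: int, bottleneck: int = 256):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(F.relu(self.down(x)))


class AdapterAugmentedLayer(nn.Module):
    """Encoder layer with per-language adapters, one universal adapter, and a
    LID-conditioned position-wise modification of the attention output."""

    def __init__(self, dim: int, num_langs: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.lid_embedding = nn.Embedding(num_langs, dim)
        self.lang_adapters = nn.ModuleList([Adapter(dim) for _ in range(num_langs)])
        self.universal_adapter = Adapter(dim)

    def forward(self, x: torch.Tensor, lid: int, training_mode: bool = True):
        attn_out, _ = self.attn(x, x, x)  # (batch, time, dim)
        # Position-wise modification with the LID embedding (one plausible
        # reading of the abstract): gate each dimension by the LID vector.
        lid_gate = torch.sigmoid(self.lid_embedding(torch.tensor(lid, device=x.device)))
        attn_out = attn_out * lid_gate
        h_uni = self.universal_adapter(attn_out)
        if not training_mode:
            # Inference: language-specific adapters are removed; only the
            # language-universal adapter stays active.
            return h_uni, torch.zeros((), device=x.device)
        h_lang = self.lang_adapters[lid](attn_out)
        # Online distillation (assumed form): pull the universal branch toward
        # the language-specific branch so it learns both kinds of features.
        kd_loss = F.mse_loss(h_uni, h_lang.detach())
        return h_lang, kd_loss
```

In a fine-tuning loop one would presumably sum the per-layer kd_loss terms and add them to the ASR objective (e.g., CTC), with the pre-trained wav2vec 2.0 weights frozen or lightly tuned; this weighting is also an assumption, not a detail stated in the abstract.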
Related papers
- The Impact of Language Adapters in Cross-Lingual Transfer for NLU [0.8702432681310401]
We study the effect of including a target-language adapter in detailed ablation studies with two multilingual models and three multilingual datasets.
Our results show that the effect of target-language adapters is highly inconsistent across tasks, languages and models.
Removing the language adapter after training has only a weak negative effect, indicating that the language adapters do not have a strong impact on the predictions.
arXiv Detail & Related papers (2024-01-31T20:07:43Z)
- Adapting the adapters for code-switching in multilingual ASR [10.316724084739892]
Large pre-trained multilingual speech models have shown potential in scaling Automatic Speech Recognition to many low-resource languages.
Some of these models employ language adapters in their formulation, which helps to improve monolingual performance.
However, this formulation restricts the usability of these models on code-switched speech, where two languages are mixed together in the same utterance.
We propose ways to effectively fine-tune such models on code-switched speech, by assimilating information from both language adapters at each language adaptation point in the network.
arXiv Detail & Related papers (2023-10-11T12:15:24Z)
- Multilingual Detection of Check-Worthy Claims using World Languages and Adapter Fusion [12.269362823116225]
Resource scarcity for non-world languages and model learning costs remain major challenges for the creation of models supporting multilingual check-worthiness detection.
This paper proposes cross-training adapters on a subset of world languages, combined by adapter fusion, to detect claims emerging globally in multiple languages.
arXiv Detail & Related papers (2023-01-13T11:50:08Z)
- Language-Family Adapters for Low-Resource Multilingual Neural Machine Translation [129.99918589405675]
Large multilingual models trained with self-supervision achieve state-of-the-art results in a wide range of natural language processing tasks.
Multilingual fine-tuning improves performance on low-resource languages but requires modifying the entire model and can be prohibitively expensive.
We propose training language-family adapters on top of mBART-50 to facilitate cross-lingual transfer.
arXiv Detail & Related papers (2022-09-30T05:02:42Z)
- LAE: Language-Aware Encoder for Monolingual and Multilingual ASR [87.74794847245536]
A novel language-aware encoder (LAE) architecture is proposed to handle both monolingual and multilingual ASR by disentangling language-specific information.
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between languages at the frame level.
arXiv Detail & Related papers (2022-06-05T04:03:12Z)
- Code Switched and Code Mixed Speech Recognition for Indic languages [0.0]
Training multilingual automatic speech recognition (ASR) systems is challenging because acoustic and lexical information is typically language-specific.
We compare the performance of an end-to-end multilingual speech recognition system to that of monolingual models conditioned on language identification (LID).
We also propose a similar technique to solve the code-switching problem and achieve WERs of 21.77 and 28.27 on Hindi-English and Bengali-English, respectively.
arXiv Detail & Related papers (2022-03-30T18:09:28Z)
- Exploring Teacher-Student Learning Approach for Multi-lingual Speech-to-Intent Classification [73.5497360800395]
We develop an end-to-end system that supports multiple languages, exploiting knowledge from a pre-trained multilingual natural language processing model.
arXiv Detail & Related papers (2021-09-28T04:43:11Z)
- Efficient Test Time Adapter Ensembling for Low-resource Language Varieties [115.12997212870962]
Specialized language and task adapters have been proposed to facilitate cross-lingual transfer of multilingual pretrained models.
An intuitive solution is to use a related language adapter for the new language variety, but we observe that this solution can lead to sub-optimal performance.
In this paper, we aim to improve the robustness of language adapters to uncovered languages without training new adapters.
arXiv Detail & Related papers (2021-09-10T13:44:46Z)
- Exploiting Adapters for Cross-lingual Low-resource Speech Recognition [52.40623653290499]
Cross-lingual speech adaptation aims to solve the problem of leveraging multiple rich-resource languages to build models for a low-resource target language.
We propose and investigate multiple adapter variants for parameter-efficient cross-lingual speech adaptation.
arXiv Detail & Related papers (2021-05-18T08:30:37Z)
- Adapt-and-Adjust: Overcoming the Long-Tail Problem of Multilingual Speech Recognition [58.849768879796905]
We propose Adapt-and-Adjust (A2), a transformer-based multi-task learning framework for end-to-end multilingual speech recognition.
The A2 framework overcomes the long-tail problem via three techniques: (1) exploiting a pretrained multilingual language model (mBERT) to improve the performance of low-resource languages; (2) proposing dual adapters, consisting of both language-specific and language-agnostic adaptation, with minimal additional parameters; and (3) overcoming class imbalance, either by imposing class priors in the loss during training or by adjusting the logits of the softmax output during inference (a minimal sketch of the latter option follows this entry).
arXiv Detail & Related papers (2020-12-03T03:46:16Z)
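As a rough illustration of balancing option (3) in Adapt-and-Adjust, the sketch below shows the common prior-based logit-adjustment recipe applied at inference. The function name, the temperature tau, and the epsilon constant are assumptions for illustration, not the paper's exact formulation.

```python
# Hypothetical sketch of class-prior logit adjustment at inference time.
import torch

def prior_adjusted_log_probs(logits: torch.Tensor,
                             class_priors: torch.Tensor,
                             tau: float = 1.0) -> torch.Tensor:
    """Subtract tau * log(prior) from each output unit's logit before the
    softmax, which down-weights frequent classes and boosts long-tail ones."""
    adjusted = logits - tau * torch.log(class_priors + 1e-12)
    return torch.log_softmax(adjusted, dim=-1)

# Usage (assumed): priors come from training-label frequencies, logits from the
# ASR output head; shapes broadcast as (batch, time, vocab) against (vocab,).
# log_probs = prior_adjusted_log_probs(logits, priors)
```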