Serial or Parallel? Plug-able Adapter for multilingual machine
translation
- URL: http://arxiv.org/abs/2104.08154v1
- Date: Fri, 16 Apr 2021 14:58:28 GMT
- Title: Serial or Parallel? Plug-able Adapter for multilingual machine
translation
- Authors: Yaoming Zhu, Jiangtao Feng, Chengqi Zhao, Mingxuan Wang, Lei Li
- Abstract summary: We propose PAM, a Transformer model augmented with defusion adaptation for multilingual machine translation.
PAM consists of embedding and layer adapters to shift the word and intermediate representations towards language-specific ones.
Experiment results on IWSLT, OPUS-100, and WMT benchmarks show that method outperforms several strong competitors.
- Score: 15.114588783601466
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Developing a unified multilingual translation model is a key topic in machine
translation research. However, existing approaches suffer from performance
degradation: multilingual models yield inferior performance compared to the
ones trained separately on rich bilingual data. We attribute the performance
degradation to two issues: multilingual embedding conflation and multilingual
fusion effects. To address the two issues, we propose PAM, a Transformer model
augmented with defusion adaptation for multilingual machine translation.
Specifically, PAM consists of embedding and layer adapters to shift the word
and intermediate representations towards language-specific ones. Extensive
experiment results on IWSLT, OPUS-100, and WMT benchmarks show that \method
outperforms several strong competitors, including series adapter and
multilingual knowledge distillation.
Related papers
- PEACH: Pre-Training Sequence-to-Sequence Multilingual Models for
Translation with Semi-Supervised Pseudo-Parallel Document Generation [5.004814662623874]
This paper introduces a novel semi-supervised method, SPDG, that generates high-quality pseudo-parallel data for multilingual pre-training.
Our experiments show that PEACH outperforms existing approaches used in training mT5 and mBART on various translation tasks.
arXiv Detail & Related papers (2023-04-03T18:19:26Z) - Language-Family Adapters for Low-Resource Multilingual Neural Machine
Translation [129.99918589405675]
Large multilingual models trained with self-supervision achieve state-of-the-art results in a wide range of natural language processing tasks.
Multilingual fine-tuning improves performance on low-resource languages but requires modifying the entire model and can be prohibitively expensive.
We propose training language-family adapters on top of mBART-50 to facilitate cross-lingual transfer.
arXiv Detail & Related papers (2022-09-30T05:02:42Z) - xGQA: Cross-Lingual Visual Question Answering [100.35229218735938]
xGQA is a new multilingual evaluation benchmark for the visual question answering task.
We extend the established English GQA dataset to 7 typologically diverse languages.
We propose new adapter-based approaches to adapt multimodal transformer-based models to become multilingual.
arXiv Detail & Related papers (2021-09-13T15:58:21Z) - Efficient Test Time Adapter Ensembling for Low-resource Language
Varieties [115.12997212870962]
Specialized language and task adapters have been proposed to facilitate cross-lingual transfer of multilingual pretrained models.
An intuitive solution is to use a related language adapter for the new language variety, but we observe that this solution can lead to sub-optimal performance.
In this paper, we aim to improve the robustness of language adapters to uncovered languages without training new adapters.
arXiv Detail & Related papers (2021-09-10T13:44:46Z) - Lightweight Adapter Tuning for Multilingual Speech Translation [47.89784337058167]
Adapter modules were recently introduced as an efficient alternative to fine-tuning in NLP.
This paper proposes a comprehensive analysis of adapters for multilingual speech translation.
arXiv Detail & Related papers (2021-06-02T20:51:42Z) - Are Multilingual Models Effective in Code-Switching? [57.78477547424949]
We study the effectiveness of multilingual language models to understand their capability and adaptability to the mixed-language setting.
Our findings suggest that pre-trained multilingual models do not necessarily guarantee high-quality representations on code-switching.
arXiv Detail & Related papers (2021-03-24T16:20:02Z) - XLM-T: Scaling up Multilingual Machine Translation with Pretrained
Cross-lingual Transformer Encoders [89.0059978016914]
We present XLM-T, which initializes the model with an off-the-shelf pretrained cross-lingual Transformer and fine-tunes it with multilingual parallel data.
This simple method achieves significant improvements on a WMT dataset with 10 language pairs and the OPUS-100 corpus with 94 pairs.
arXiv Detail & Related papers (2020-12-31T11:16:51Z) - Deep Transformers with Latent Depth [42.33955275626127]
The Transformer model has achieved state-of-the-art performance in many sequence modeling tasks.
We present a probabilistic framework to automatically learn which layer(s) to use by learning the posterior distributions of layer selection.
We propose a novel method to train one shared Transformer network for multilingual machine translation.
arXiv Detail & Related papers (2020-09-28T07:13:23Z) - Improving Massively Multilingual Neural Machine Translation and
Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs.
arXiv Detail & Related papers (2020-04-24T17:21:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.