Language-Aware Multilingual Machine Translation with Self-Supervised Learning
- URL: http://arxiv.org/abs/2302.05008v1
- Date: Fri, 10 Feb 2023 01:34:24 GMT
- Title: Language-Aware Multilingual Machine Translation with Self-Supervised Learning
- Authors: Haoran Xu, Jean Maillard, Vedanuj Goswami
- Abstract summary: Multilingual machine translation (MMT) benefits from cross-lingual transfer but is a challenging multitask optimization problem.
Self-supervised learning approaches have shown promise by improving translation performance as complementary tasks to the MMT task.
We propose a novel but simple SSL task, concurrent denoising, that co-trains with the MMT task by concurrently denoising monolingual data on both the encoder and decoder.
- Score: 13.250011906361273
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multilingual machine translation (MMT) benefits from cross-lingual transfer
but is a challenging multitask optimization problem. This is partly because
there is no clear framework to systematically learn language-specific
parameters. Self-supervised learning (SSL) approaches that leverage large
quantities of monolingual data (where parallel data is unavailable) have shown
promise by improving translation performance as complementary tasks to the MMT
task. However, jointly optimizing SSL and MMT tasks is even more challenging.
In this work, we first investigate how to utilize intra-distillation to learn
more *language-specific* parameters and then show the importance of these
language-specific parameters. Next, we propose a novel but simple SSL task,
concurrent denoising, that co-trains with the MMT task by concurrently
denoising monolingual data on both the encoder and decoder. Finally, we apply
intra-distillation to this co-training approach. Combining these two approaches
significantly improves MMT performance, outperforming three state-of-the-art
SSL methods by a large margin, e.g., 11.3% and 3.7% improvement over MASS on an
8-language and a 15-language benchmark, respectively.
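To make the joint objective concrete, the sketch below illustrates one plausible way to combine a standard MMT cross-entropy loss with the concurrent-denoising SSL loss described above. It is a minimal sketch under assumed interfaces, not the authors' implementation: `model.encode`, `model.decode`, `model.mlm_head`, `noise`, the batch field names, `mask_id`, and `ssl_weight` are all hypothetical placeholders.

```python
import torch.nn.functional as F


def seq2seq_xent(model, src, tgt, pad_id):
    # Ordinary teacher-forced translation cross-entropy on parallel data.
    logits = model.decode(model.encode(src), tgt[:, :-1])   # hypothetical API
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        tgt[:, 1:].reshape(-1),
        ignore_index=pad_id,
    )


def concurrent_denoising_loss(model, mono, noise, pad_id, mask_id):
    # One plausible reading of "concurrent denoising": a single noised copy of a
    # monolingual sentence is denoised on BOTH sides in one forward pass --
    #   * encoder side: recover the original tokens at masked positions,
    #   * decoder side: reconstruct the full clean sentence.
    noisy = noise(mono)                                      # e.g. token/span masking
    enc_out = model.encode(noisy)

    # Encoder-side denoising loss over masked positions only.
    enc_logits = model.mlm_head(enc_out)                     # assumed projection to vocab
    masked = noisy.eq(mask_id)
    enc_loss = F.cross_entropy(enc_logits[masked], mono[masked])

    # Decoder-side denoising loss: reconstruct the clean sentence.
    dec_logits = model.decode(enc_out, mono[:, :-1])
    dec_loss = F.cross_entropy(
        dec_logits.reshape(-1, dec_logits.size(-1)),
        mono[:, 1:].reshape(-1),
        ignore_index=pad_id,
    )
    return enc_loss + dec_loss


def training_step(model, bitext, mono, noise, pad_id, mask_id, ssl_weight=1.0):
    # Joint objective: supervised MMT loss on bitext plus the concurrent-denoising
    # SSL loss on monolingual data; the weighting here is an assumption.
    mt_loss = seq2seq_xent(model, bitext["src"], bitext["tgt"], pad_id)
    ssl_loss = concurrent_denoising_loss(model, mono["tokens"], noise, pad_id, mask_id)
    return mt_loss + ssl_weight * ssl_loss
```

The intra-distillation component is not shown; roughly, it would run several forward passes with different subsets of parameters disabled and add a term penalizing disagreement among their outputs, which is what encourages the language-specific parameters discussed above.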
Related papers
- Mitigating Language-Level Performance Disparity in mPLMs via Teacher Language Selection and Cross-lingual Self-Distillation [25.850573463743352]
Large-scale multilingual Pretrained Language Models (mPLMs) yield impressive performance on cross-lingual tasks.
Yet significant performance disparities exist across different languages within the same mPLM.
We introduce ALSACE to leverage the learned knowledge from the well-performing languages to guide under-performing ones within the same mPLM.
arXiv Detail & Related papers (2024-04-12T14:19:16Z)
- Extrapolating Large Language Models to Non-English by Aligning Languages [109.09051737966178]
Existing large language models show disparate capability across different languages.
In this paper, we empower pre-trained LLMs on non-English languages by building semantic alignment across languages.
arXiv Detail & Related papers (2023-08-09T13:32:06Z)
- LVP-M3: Language-aware Visual Prompt for Multilingual Multimodal Machine Translation [94.33019040320507]
Multimodal Machine Translation (MMT) focuses on enhancing text-only translation with visual features.
Recent approaches still need to train a separate model for each language pair, which is costly and unaffordable as the number of languages increases.
We propose the Multilingual MMT task by establishing two new Multilingual MMT benchmark datasets covering seven languages.
arXiv Detail & Related papers (2022-10-19T12:21:39Z)
- Parameter-Efficient Neural Reranking for Cross-Lingual and Multilingual Retrieval [66.69799641522133]
State-of-the-art neural (re)rankers are notoriously data hungry.
Current approaches typically transfer rankers trained on English data to other languages and cross-lingual setups by means of multilingual encoders.
We show that two parameter-efficient approaches to cross-lingual transfer, namely Sparse Fine-Tuning Masks (SFTMs) and Adapters, allow for a more lightweight and more effective zero-shot transfer.
arXiv Detail & Related papers (2022-04-05T15:44:27Z)
- Zero-shot Cross-lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders [74.89326277221072]
How to improve the cross-lingual transfer of NMT models with a multilingual pretrained encoder is under-explored.
We propose SixT, a simple yet effective model for this task.
Our model achieves better performance on many-to-English testsets than CRISS and m2m-100.
arXiv Detail & Related papers (2021-04-18T07:42:45Z)
- Enabling Zero-shot Multilingual Spoken Language Translation with Language-Specific Encoders and Decoders [5.050654565113709]
Current end-to-end approaches to Spoken Language Translation rely on limited training resources.
Our proposed method extends a MultiNMT architecture based on language-specific encoders-decoders to the task of Multilingual SLT.
arXiv Detail & Related papers (2020-11-02T16:31:14Z)
- Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-resource languages.
We propose a novel augmentation approach named Language Branch Machine Reading (LBMRC)
LBMRC trains multiple machine reading comprehension (MRC) models, each proficient in an individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages.
arXiv Detail & Related papers (2020-10-27T13:12:17Z)
- Multi-task Learning for Multilingual Neural Machine Translation [32.81785430242313]
We propose a multi-task learning framework that jointly trains the model with the translation task on bitext data and two denoising tasks on the monolingual data.
We show that the proposed approach can effectively improve the translation quality for both high-resource and low-resource languages.
arXiv Detail & Related papers (2020-10-06T06:54:12Z)
- Balancing Training for Multilingual Neural Machine Translation [130.54253367251738]
Multilingual machine translation (MT) models can translate to/from multiple languages.
Standard practice is to up-sample less-resourced languages to increase their representation.
We propose a method that instead automatically learns how to weight training data through a data scorer.
arXiv Detail & Related papers (2020-04-14T18:23:28Z)