Mitigating Data Imbalance and Representation Degeneration in
Multilingual Machine Translation
- URL: http://arxiv.org/abs/2305.12786v2
- Date: Tue, 24 Oct 2023 20:10:04 GMT
- Title: Mitigating Data Imbalance and Representation Degeneration in
Multilingual Machine Translation
- Authors: Wen Lai, Alexandra Chronopoulou, Alexander Fraser
- Abstract summary: Bi-ACL is a framework that uses only target-side monolingual data and a bilingual dictionary to improve the performance of the MNMT model.
We show that Bi-ACL is more effective both in long-tail languages and in high-resource languages.
- Score: 103.90963418039473
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite advances in multilingual neural machine translation (MNMT), we argue
that there are still two major challenges in this area: data imbalance and
representation degeneration. The data imbalance problem refers to the imbalance
in the amount of parallel corpora for all language pairs, especially for
long-tail languages (i.e., very low-resource languages). The representation
degeneration problem refers to the problem of encoded tokens tending to appear
only in a small subspace of the full space available to the MNMT model. To
solve these two issues, we propose Bi-ACL, a framework that uses only
target-side monolingual data and a bilingual dictionary to improve the
performance of the MNMT model. We define two modules, named bidirectional
autoencoder and bidirectional contrastive learning, which we combine with an
online constrained beam search and a curriculum learning sampling strategy.
Extensive experiments show that our proposed method is more effective both in
long-tail languages and in high-resource languages. We also demonstrate that
our approach is capable of transferring knowledge between domains and languages
in zero-shot scenarios.
Related papers
- ColBERT-XM: A Modular Multi-Vector Representation Model for Zero-Shot
Multilingual Information Retrieval [10.664434993386523]
Current approaches circumvent the lack of high-quality labeled data in non-English languages.
We present a novel modular dense retrieval model that learns from the rich data of a single high-resource language.
arXiv Detail & Related papers (2024-02-23T02:21:24Z) - Extending Multilingual Machine Translation through Imitation Learning [60.15671816513614]
Imit-MNMT treats the task as an imitation learning process, which mimicks the behavior of an expert.
We show that our approach significantly improves the translation performance between the new and the original languages.
We also demonstrate that our approach is capable of solving copy and off-target problems.
arXiv Detail & Related papers (2023-11-14T21:04:03Z) - Low-resource Neural Machine Translation with Cross-modal Alignment [15.416659725808822]
We propose a cross-modal contrastive learning method to learn a shared space for all languages.
Experimental results and further analysis show that our method can effectively learn the cross-modal and cross-lingual alignment with a small amount of image-text pairs.
arXiv Detail & Related papers (2022-10-13T04:15:43Z) - Bridging the Data Gap between Training and Inference for Unsupervised
Neural Machine Translation [49.916963624249355]
A UNMT model is trained on the pseudo parallel data with translated source, and natural source sentences in inference.
The source discrepancy between training and inference hinders the translation performance of UNMT models.
We propose an online self-training approach, which simultaneously uses the pseudo parallel data natural source, translated target to mimic the inference scenario.
arXiv Detail & Related papers (2022-03-16T04:50:27Z) - Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for Multilingual neural machine translation (MNMT) based on distributionally robust optimization.
We show how to practically optimize this objective for large translation corpora using an iterated best response scheme.
Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
arXiv Detail & Related papers (2021-09-09T03:48:35Z) - Bilingual alignment transfers to multilingual alignment for unsupervised
parallel text mining [3.4519649635864584]
This work presents methods for learning cross-lingual sentence representations using paired or unpaired bilingual texts.
We hypothesize that the cross-lingual alignment strategy is transferable, and therefore a model trained to align only two languages can encode multilingually more aligned representations.
arXiv Detail & Related papers (2021-04-15T17:51:22Z) - Cross-lingual Machine Reading Comprehension with Language Branch
Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-source languages.
We propose a novel augmentation approach named Language Branch Machine Reading (LBMRC)
LBMRC trains multiple machine reading comprehension (MRC) models proficient in individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages.
arXiv Detail & Related papers (2020-10-27T13:12:17Z) - Multi-task Learning for Multilingual Neural Machine Translation [32.81785430242313]
We propose a multi-task learning framework that jointly trains the model with the translation task on bitext data and two denoising tasks on the monolingual data.
We show that the proposed approach can effectively improve the translation quality for both high-resource and low-resource languages.
arXiv Detail & Related papers (2020-10-06T06:54:12Z) - Improving Massively Multilingual Neural Machine Translation and
Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs.
arXiv Detail & Related papers (2020-04-24T17:21:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.