Multilingual Mix: Example Interpolation Improves Multilingual Neural
Machine Translation
- URL: http://arxiv.org/abs/2203.07627v1
- Date: Tue, 15 Mar 2022 03:56:22 GMT
- Title: Multilingual Mix: Example Interpolation Improves Multilingual Neural
Machine Translation
- Authors: Yong Cheng, Ankur Bapna, Orhan Firat, Yuan Cao, Pidong Wang, Wolfgang
Macherey
- Abstract summary: We introduce multilingual crossover encoder-decoder (mXEncDec) to fuse language pairs at an instance level.
Our approach interpolates instances from different language pairs into joint `crossover examples' in order to encourage sharing input and output spaces across languages.
- Score: 45.77509642452541
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multilingual neural machine translation models are trained to maximize the
likelihood of a mix of examples drawn from multiple language pairs. The
dominant inductive bias applied to these models is a shared vocabulary and a
shared set of parameters across languages; the inputs and labels corresponding
to examples drawn from different language pairs might still reside in distinct
sub-spaces. In this paper, we introduce multilingual crossover encoder-decoder
(mXEncDec) to fuse language pairs at an instance level. Our approach
interpolates instances from different language pairs into joint `crossover
examples' in order to encourage sharing input and output spaces across
languages. To ensure better fusion of examples in multilingual settings, we
propose several techniques to improve example interpolation across dissimilar
languages under heavy data imbalance. Experiments on a large-scale WMT
multilingual dataset demonstrate that our approach significantly improves
quality on English-to-Many, Many-to-English and zero-shot translation tasks
(from +0.5 BLEU up to +5.5 BLEU points). Results on code-switching sets
demonstrate the capability of our approach to improve model generalization to
out-of-distribution multilingual examples. We also conduct qualitative and
quantitative representation comparisons to analyze the advantages of our
approach at the representation level.
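As an illustration of the example-interpolation idea, the following Python sketch fuses two training examples drawn from different language pairs in a mixup-style way, mixing source embeddings and target label distributions. It is a minimal sketch under assumed inputs, not the paper's exact mXEncDec fusion, and every function and variable name in it is illustrative.

import numpy as np

def crossover_example(src_emb_a, tgt_ids_a, src_emb_b, tgt_ids_b, vocab_size, lam=0.5):
    """Fuse two (source embeddings, target ids) examples into one 'crossover' example.

    src_emb_a, src_emb_b: [length, d_model] arrays of source token embeddings
    tgt_ids_a, tgt_ids_b: integer arrays of target token ids over a shared vocabulary
    lam: interpolation weight; in practice it could be sampled from a Beta prior
    """
    d_model = src_emb_a.shape[1]

    # Zero-pad the shorter source so the two inputs can be mixed position-wise.
    src_len = max(len(src_emb_a), len(src_emb_b))
    def pad_src(emb):
        return np.vstack([emb, np.zeros((src_len - len(emb), d_model))])
    mixed_src = lam * pad_src(src_emb_a) + (1.0 - lam) * pad_src(src_emb_b)

    # Interpolate targets as soft distributions over the shared vocabulary,
    # so the decoder is supervised by a mixture of both references.
    tgt_len = max(len(tgt_ids_a), len(tgt_ids_b))
    def soft_targets(ids):
        dist = np.zeros((tgt_len, vocab_size))
        dist[np.arange(len(ids)), ids] = 1.0
        return dist
    mixed_tgt = lam * soft_targets(tgt_ids_a) + (1.0 - lam) * soft_targets(tgt_ids_b)

    return mixed_src, mixed_tgt

The abstract above notes that effective fusion across dissimilar languages under heavy data imbalance requires additional techniques, which this simple sketch does not attempt to model.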
Related papers
- Improving In-context Learning of Multilingual Generative Language Models with Cross-lingual Alignment [42.624862172666624]
We propose a simple yet effective cross-lingual alignment framework exploiting pairs of translation sentences.
It aligns the internal sentence representations across different languages via multilingual contrastive learning (a generic sketch of this kind of objective appears after this list).
Experimental results show that even with less than 0.1‰ of pre-training tokens, our alignment framework significantly boosts the cross-lingual abilities of generative language models.
arXiv Detail & Related papers (2023-11-14T11:24:08Z)
- Multilingual Few-Shot Learning via Language Model Retrieval [18.465566186549072]
Transformer-based language models have achieved remarkable success in few-shot in-context learning.
We conduct a study of retrieving semantically similar few-shot samples and using them as the context.
We evaluate the proposed method on five natural language understanding datasets related to intent detection, question classification, sentiment analysis, and topic classification.
arXiv Detail & Related papers (2023-06-19T14:27:21Z)
- Decomposed Prompting for Machine Translation Between Related Languages using Large Language Models [55.35106713257871]
We introduce DecoMT, a novel few-shot prompting approach that decomposes the translation process into a sequence of word-chunk translations.
We show that DecoMT outperforms the strong few-shot prompting BLOOM model by an average of 8 chrF++ points across the examined languages.
arXiv Detail & Related papers (2023-05-22T14:52:47Z)
- VECO 2.0: Cross-lingual Language Model Pre-training with Multi-granularity Contrastive Learning [56.47303426167584]
We propose a cross-lingual pre-trained model VECO2.0 based on contrastive learning with multi-granularity alignments.
Specifically, sequence-to-sequence alignment is induced to maximize the similarity of parallel pairs and minimize that of non-parallel pairs.
Token-to-token alignment is integrated to bridge the gap between synonymous tokens, excavated via a thesaurus dictionary, and the other unpaired tokens in a bilingual instance.
arXiv Detail & Related papers (2023-04-17T12:23:41Z)
- Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- Multilingual Representation Distillation with Contrastive Learning [20.715534360712425]
We integrate contrastive learning into multilingual representation distillation and use it for quality estimation of parallel sentences.
We validate our approach with multilingual similarity search and corpus filtering tasks.
arXiv Detail & Related papers (2022-10-10T22:27:04Z)
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
- Are Multilingual Models Effective in Code-Switching? [57.78477547424949]
We study the effectiveness of multilingual language models to understand their capability and adaptability to the mixed-language setting.
Our findings suggest that pre-trained multilingual models do not necessarily guarantee high-quality representations on code-switching.
arXiv Detail & Related papers (2021-03-24T16:20:02Z)
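Several of the related papers above (e.g. the cross-lingual alignment and VECO 2.0 entries) rely on contrastive alignment of parallel sentence pairs. The following Python sketch shows a generic InfoNCE-style objective of that kind; it is not the formulation of any specific paper listed here, and all names in it are illustrative assumptions.

import numpy as np

def contrastive_alignment_loss(src_embs, tgt_embs, temperature=0.05):
    """InfoNCE-style loss over a batch of parallel sentence embeddings.

    src_embs, tgt_embs: [batch, dim] arrays where row i of each matrix is a translation pair.
    """
    # Cosine similarities between every source and every target sentence in the batch.
    src = src_embs / np.linalg.norm(src_embs, axis=1, keepdims=True)
    tgt = tgt_embs / np.linalg.norm(tgt_embs, axis=1, keepdims=True)
    logits = (src @ tgt.T) / temperature

    # Each source sentence should score highest against its own translation (the diagonal),
    # pulling translation pairs together and pushing non-parallel pairs apart.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))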