On Learning Language-Invariant Representations for Universal Machine
Translation
- URL: http://arxiv.org/abs/2008.04510v1
- Date: Tue, 11 Aug 2020 04:45:33 GMT
- Title: On Learning Language-Invariant Representations for Universal Machine
Translation
- Authors: Han Zhao, Junjie Hu, Andrej Risteski
- Abstract summary: Universal machine translation aims to learn to translate between any pair of languages.
We prove certain impossibilities of this endeavour in general and prove positive results in the presence of additional (but natural) structure of data.
We believe our theoretical insights and implications contribute to the future algorithmic design of universal machine translation.
- Score: 33.40094622605891
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of universal machine translation is to learn to translate between
any pair of languages, given a corpus of paired translated documents for
\emph{a small subset} of all pairs of languages. Despite impressive empirical
results and an increasing interest in massively multilingual models,
theoretical analysis on translation errors made by such universal machine
translation models is only nascent. In this paper, we formally prove certain
impossibilities of this endeavour in general, as well as prove positive results
in the presence of additional (but natural) structure of data.
For the former, we derive a lower bound on the translation error in the
many-to-many translation setting, which shows that any algorithm aiming to
learn shared sentence representations among multiple language pairs has to make
a large translation error on at least one of the translation tasks, if no
assumption on the structure of the languages is made. For the latter, we show
that if the paired documents in the corpus follow a natural
\emph{encoder-decoder} generative process, we can expect a natural notion of
``generalization'': a linear number of language pairs, rather than quadratic,
suffices to learn a good representation. Our theory also explains what kinds of
connection graphs between pairs of languages are better suited: ones with
longer paths result in worse sample complexity in terms of the total number of
documents per language pair needed. We believe our theoretical insights and
implications contribute to the future algorithmic design of universal machine
translation.
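The counting claim in the abstract (a linear rather than quadratic number of language pairs) can be made concrete with a small sketch. The snippet below is illustrative only and not from the paper; the language codes and the choice of pivot language are placeholders.

```python
# Illustrative sketch (not code from the paper): contrast how many parallel
# corpora are needed under a fully connected language graph versus a "star"
# graph routed through a single pivot language.

from itertools import combinations

def all_pairs(languages):
    """Every unordered language pair: quadratic, n*(n-1)/2 corpora."""
    return list(combinations(languages, 2))

def star_pairs(languages, pivot):
    """Only pivot-to-X pairs: linear, n-1 corpora."""
    return [(pivot, lang) for lang in languages if lang != pivot]

langs = ["en", "fr", "de", "zh", "hi", "sw"]
print(len(all_pairs(langs)))          # 15 = 6*5/2
print(len(star_pairs(langs, "en")))   # 5  = 6-1

# In the star graph every non-pivot pair is connected by a path of length 2.
# The abstract's sample-complexity remark suggests preferring connection
# graphs with short paths: longer paths need more documents per language pair.
```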
Related papers
- Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages [55.157295899188476]
Neural machine translation systems are hypothesized to learn to map sentences of different languages into a common representation space.
In this work, we test this hypothesis by zero-shot translating from unseen languages.
We demonstrate that this setup enables zero-shot translation from entirely unseen languages.
arXiv Detail & Related papers (2024-08-05T07:58:58Z)
- Decomposed Prompting for Machine Translation Between Related Languages using Large Language Models [55.35106713257871]
We introduce DecoMT, a novel few-shot prompting approach that decomposes the translation process into a sequence of word-chunk translations.
We show that DecoMT outperforms strong few-shot prompting of the BLOOM model, with an average improvement of 8 chrF++ points across the examined languages.
arXiv Detail & Related papers (2023-05-22T14:52:47Z)
- The Best of Both Worlds: Combining Human and Machine Translations for Multilingual Semantic Parsing with Active Learning [50.320178219081484]
We propose an active learning approach that exploits the strengths of both human and machine translations.
An ideal utterance selection can significantly reduce the error and bias in the translated data.
arXiv Detail & Related papers (2023-05-22T05:57:47Z)
- Improving Zero-Shot Multilingual Translation with Universal Representations and Cross-Mappings [23.910477693942905]
Improved zero-shot translation requires the model to learn universal representations and cross-mapping relationships.
We propose a distance measure based on optimal transport theory to model the difference between the representations output by the encoder.
We propose an agreement-based training scheme, which can help the model make consistent predictions.
arXiv Detail & Related papers (2022-10-28T02:47:05Z)
- Informative Language Representation Learning for Massively Multilingual Neural Machine Translation [47.19129812325682]
In a multilingual neural machine translation model, an artificial language token is usually used to guide translation into the desired target language.
Recent studies show that prepending language tokens sometimes fails to steer multilingual neural machine translation models toward the correct translation directions.
We propose two methods, language embedding embodiment and language-aware multi-head attention, to learn informative language representations that channel translation into the correct directions.
arXiv Detail & Related papers (2022-09-04T04:27:17Z)
- Complete Multilingual Neural Machine Translation [44.98358050355681]
We study the use of multi-way aligned examples to enrich the original English-centric parallel corpora.
We call MNMT with such a connectivity pattern complete Multilingual Neural Machine Translation (cMNMT).
In combination with a novel training data sampling strategy that is conditioned on the target language only, cMNMT yields competitive translation quality for all language pairs.
arXiv Detail & Related papers (2020-10-20T13:03:48Z)
- Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information [72.2412707779571]
mRASP is an approach to pre-train a universal multilingual neural machine translation model.
We carry out experiments on 42 translation directions across a diverse setting, including low-, medium-, and rich-resource languages, as well as transfer to exotic language pairs.
arXiv Detail & Related papers (2020-10-07T03:57:54Z)
- Fine-Grained Analysis of Cross-Linguistic Syntactic Divergences [18.19093600136057]
We propose a framework for extracting divergence patterns for any language pair from a parallel corpus.
We show that our framework provides a detailed picture of cross-language divergences, generalizes previous approaches, and lends itself to full automation.
arXiv Detail & Related papers (2020-05-07T13:05:03Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
- Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation [61.88012735215636]
Unsupervised neural machine translation (UNMT) has recently achieved remarkable results for several language pairs.
UNMT can only translate between a single language pair and cannot produce translation results for multiple language pairs at the same time.
In this paper, we empirically introduce a simple method to translate between thirteen languages using a single encoder and a single decoder.
arXiv Detail & Related papers (2020-04-21T17:26:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.