Improving Zero-Shot Translation by Disentangling Positional Information
- URL: http://arxiv.org/abs/2012.15127v1
- Date: Wed, 30 Dec 2020 12:20:41 GMT
- Title: Improving Zero-Shot Translation by Disentangling Positional Information
- Authors: Danni Liu, Jan Niehues, James Cross, Francisco Guzmán, Xian Li
- Abstract summary: We show that a main factor causing the language-specific representations is the positional correspondence to input tokens.
We gain up to 18.5 BLEU points on zero-shot translation while retaining quality on supervised directions.
- Score: 24.02434897109097
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multilingual neural machine translation has shown the capability of directly
translating between language pairs unseen in training, i.e. zero-shot
translation. Despite being conceptually attractive, it often suffers from low
output quality. The difficulty of generalizing to new translation directions
suggests the model representations are highly specific to those language pairs
seen in training. We demonstrate that a main factor causing the
language-specific representations is the positional correspondence to input
tokens. We show that this can be easily alleviated by removing residual
connections in an encoder layer. With this modification, we gain up to 18.5
BLEU points on zero-shot translation while retaining quality on supervised
directions. The improvements are particularly prominent between related
languages, where our proposed model outperforms pivot-based translation.
Moreover, our approach allows easy integration of new languages, which
substantially expands translation coverage. By thorough inspections of the
hidden layer outputs, we show that our approach indeed leads to more
language-independent representations.
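The modification the abstract describes is small enough to sketch concretely. Below is a minimal PyTorch sketch, assuming a standard post-norm Transformer; it is not the authors' code. The self-attention sub-layer keeps no residual connection, which the paper applies in one middle encoder layer so outputs are no longer tied one-to-one to input token positions. Dimensions and dropout values are illustrative assumptions.
```python
import torch.nn as nn

class ResidualFreeAttentionLayer(nn.Module):
    """Transformer encoder layer with NO residual around self-attention."""

    def __init__(self, d_model=512, nhead=8, dim_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead,
                                               dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, dim_ff), nn.ReLU(),
                                nn.Linear(dim_ff, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x, key_padding_mask=None):
        # No `x +` here: the output at each position is a pure mixture of
        # value vectors, removing the direct positional correspondence that
        # a residual connection would otherwise preserve.
        attn_out, _ = self.self_attn(x, x, x, key_padding_mask=key_padding_mask)
        x = self.norm1(self.drop(attn_out))
        # The feed-forward sub-layer keeps its residual connection as usual.
        return self.norm2(x + self.drop(self.ff(x)))
```
Swapping this layer in for one middle layer of an otherwise standard encoder reproduces the kind of architectural change the abstract refers to.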
Related papers
- Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages [55.157295899188476]
Neural machine translation systems learn to map sentences of different languages into a common representation space.
In this work, we test this hypothesis by translating zero-shot from languages unseen in training.
We demonstrate that decoupled vocabulary learning enables zero-shot translation from entirely unseen languages.
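The summary gives the idea but not the procedure; as a loose sketch grounded only in the paper's title, one way to decouple vocabulary learning is to freeze the trained translation body and fit just a new embedding table for an unseen language. `model.src_embed` is a hypothetical attribute, not the paper's interface.
```python
import torch.nn as nn

def add_unseen_language(model, new_vocab_size, d_model=512):
    # Freeze every parameter of the trained multilingual model...
    for p in model.parameters():
        p.requires_grad = False
    # ...and learn only a fresh embedding table for the new language's
    # vocabulary (hypothetical attribute name; an assumption, not the
    # paper's recipe).
    model.src_embed = nn.Embedding(new_vocab_size, d_model)
    return model
```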
arXiv Detail & Related papers (2024-08-05T07:58:58Z)
- Languages Transferred Within the Encoder: On Representation Transfer in Zero-Shot Multilingual Translation [16.368747052909214]
We introduce the identity pair, a sentence translated into itself, to address the lack of the base measure in multilingual investigations.
We demonstrate that the encoder transfers the source language to the representational subspace of the target language instead of the language-agnostic state.
Based on our findings, we propose two methods: 1) low-rank language-specific embedding at the encoder, and 2) language-specific contrastive learning of the representation at the decoder.
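For the decoder-side contrastive objective named above, here is a hedged InfoNCE-style sketch in PyTorch: pooled decoder states sharing a target language are treated as positives. Pooling, temperature, and batching are assumptions, not the paper's exact recipe.
```python
import torch
import torch.nn.functional as F

def language_contrastive_loss(states, lang_ids, temperature=0.1):
    """states: (batch, d) pooled decoder states; lang_ids: (batch,) ids."""
    z = F.normalize(states, dim=-1)
    sim = z @ z.t() / temperature
    sim.fill_diagonal_(float('-inf'))            # exclude self-similarity
    same = lang_ids.unsqueeze(0) == lang_ids.unsqueeze(1)
    same.fill_diagonal_(False)
    log_prob = F.log_softmax(sim, dim=-1)
    # Average log-probability of retrieving a same-language example.
    pos = same.sum(-1).clamp(min=1)
    loss = -(log_prob.masked_fill(~same, 0.0).sum(-1) / pos)
    return loss[same.any(-1)].mean()             # skip rows with no positive
```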
arXiv Detail & Related papers (2024-06-12T11:16:30Z)
- On the Off-Target Problem of Zero-Shot Multilingual Neural Machine Translation [104.85258654917297]
We find that failing to encode a discriminative target-language signal leads to off-target translation, and that a closer lexical distance between two languages increases the off-target rate.
We propose Language Aware Vocabulary Sharing (LAVS) to construct the multilingual vocabulary.
We conduct experiments on a multilingual machine translation benchmark in 11 languages.
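The construction of LAVS is not spelled out in the summary, so the sketch below is only a guess at the stated goal: a vocabulary that keeps languages lexically discriminative by splitting tokens frequent in several languages into language-tagged variants. The threshold and tagging scheme are assumptions.
```python
from collections import defaultdict

def split_shared_tokens(freq_by_lang, min_freq=100):
    """freq_by_lang: {lang: {token: count}} -> set of vocabulary entries."""
    owners = defaultdict(list)
    for lang, freqs in freq_by_lang.items():
        for tok, count in freqs.items():
            if count >= min_freq:
                owners[tok].append(lang)
    vocab = set()
    for tok, langs in owners.items():
        if len(langs) > 1:
            # Shared high-frequency token: give each language its own copy,
            # reducing lexical overlap between languages.
            vocab.update(f"{tok}@{lang}" for lang in langs)
        else:
            vocab.add(tok)
    return vocab
```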
arXiv Detail & Related papers (2023-05-18T12:43:31Z)
- Adapting to Non-Centered Languages for Zero-shot Multilingual Translation [12.487990897680422]
We propose a simple, lightweight, yet effective language-specific modeling method that adapts to non-centered languages.
Experiments with Transformer on IWSLT17, Europarl, TED talks, and OPUS-100 datasets show that our method can easily fit non-centered data conditions.
arXiv Detail & Related papers (2022-09-09T06:34:12Z)
- DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
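As a rough illustration of the recipe the summary describes (not DEEP's actual pipeline), the sketch below builds denoising pairs by masking knowledge-base entities in monolingual text; `gazetteer` and the mask token are hypothetical stand-ins.
```python
def make_entity_denoising_pair(sentence, gazetteer, mask_token="<ent>"):
    # Corrupt entity mentions found via a knowledge-base gazetteer; the model
    # is then trained to restore the original sentence from the noisy one.
    corrupted = [mask_token if tok in gazetteer else tok
                 for tok in sentence.split()]
    return " ".join(corrupted), sentence  # (noisy source, clean target)

noisy, clean = make_entity_denoising_pair(
    "Ada Lovelace lived in London", {"Ada", "Lovelace", "London"})
# noisy == "<ent> <ent> lived in <ent>", clean == the original sentence
```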
arXiv Detail & Related papers (2021-11-14T17:28:09Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both the representation level and the gradient level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
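The summary does not spell out either regularizer, so the snippet below is just one plausible representation-level instantiation: pulling mean-pooled encoder states of a sentence and its reference translation together, which encourages language-agnostic encodings. Treat it as an assumption about the general idea, not the paper's method.
```python
import torch.nn.functional as F

def representation_alignment_loss(src_h, tgt_h, src_mask, tgt_mask):
    """*_h: (batch, seq, d) encoder states; *_mask: (batch, seq), 1 = token."""
    def mean_pool(h, m):
        m = m.unsqueeze(-1).float()
        return (h * m).sum(1) / m.sum(1).clamp(min=1e-6)
    return F.mse_loss(mean_pool(src_h, src_mask), mean_pool(tgt_h, tgt_mask))
```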
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- Rethinking Zero-shot Neural Machine Translation: From a Perspective of Latent Variables [28.101782382170306]
We introduce a denoising autoencoder objective based on the pivot language into the traditional training objective to improve translation accuracy in zero-shot directions.
We demonstrate that the proposed method effectively eliminates spurious correlations and significantly outperforms state-of-the-art methods.
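A hedged sketch of how such a combined objective is typically wired up: a supervised translation term plus a denoising term on pivot-language (e.g. English) sentences. `model.nll` is a hypothetical seq2seq loss interface; the noise function and the weight are illustrative assumptions.
```python
import random

def drop_tokens(tokens, p=0.1):
    # Simple corruption: randomly drop tokens, a common denoising noise choice.
    kept = [t for t in tokens if random.random() > p]
    return kept or tokens  # never return an empty sentence

def combined_loss(model, batch, dae_weight=1.0):
    mt_loss = model.nll(src=batch.src, tgt=batch.tgt)        # supervised term
    noisy_pivot = [drop_tokens(s) for s in batch.pivot]      # corrupt pivot text
    dae_loss = model.nll(src=noisy_pivot, tgt=batch.pivot)   # reconstruct it
    return mt_loss + dae_weight * dae_loss
```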
arXiv Detail & Related papers (2021-09-10T07:18:53Z)
- Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs.
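Random online backtranslation is straightforward to sketch: the current model translates each target sentence into a randomly sampled language, and the synthetic pair gives otherwise-unseen directions a training signal. `model.translate` and `model.nll` are hypothetical stand-ins for a real multilingual NMT toolkit.
```python
import random

def robt_loss(model, batch, languages):
    # Supervised term on the observed (src, tgt) pair.
    loss = model.nll(src=batch.src, src_lang=batch.src_lang,
                     tgt=batch.tgt, tgt_lang=batch.tgt_lang)
    # Back-translate the target into a random other language on the fly...
    lang = random.choice([l for l in languages if l != batch.tgt_lang])
    synth_src = model.translate(batch.tgt, src_lang=batch.tgt_lang,
                                tgt_lang=lang)
    # ...and train on (synthetic source -> original target), covering a
    # direction never seen in the parallel data.
    return loss + model.nll(src=synth_src, src_lang=lang,
                            tgt=batch.tgt, tgt_lang=batch.tgt_lang)
```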
arXiv Detail & Related papers (2020-04-24T17:21:32Z)
- Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact on existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
arXiv Detail & Related papers (2020-04-09T17:54:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.