Improving Zero-shot Neural Machine Translation on Language-specific
Encoders-Decoders
- URL: http://arxiv.org/abs/2102.06578v1
- Date: Fri, 12 Feb 2021 15:36:33 GMT
- Title: Improving Zero-shot Neural Machine Translation on Language-specific
Encoders-Decoders
- Authors: Junwei Liao, Yu Shi, Ming Gong, Linjun Shou, Hong Qu, Michael Zeng
- Abstract summary: Recently, universal neural machine translation (NMT) with a shared encoder-decoder has achieved good performance on zero-shot translation.
Unlike universal NMT, jointly trained language-specific encoders-decoders aim to achieve universal representation across non-shared modules.
We study zero-shot translation using language-specific encoders-decoders.
- Score: 19.44855809470709
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, universal neural machine translation (NMT) with a shared
encoder-decoder has achieved good performance on zero-shot translation. Unlike
universal NMT, jointly trained language-specific encoders-decoders aim to
achieve universal representation across non-shared modules, each of which is
for a language or language family. The non-shared architecture has the
advantage of mitigating internal language competition, especially when the
shared vocabulary and model parameters are restricted in their size. However,
the performance of using multiple encoders and decoders on zero-shot
translation still lags behind universal NMT. In this work, we study zero-shot
translation using language-specific encoders-decoders. We propose to generalize
the non-shared architecture and universal NMT by differentiating the
Transformer layers between language-specific and interlingua. By selectively
sharing parameters and applying cross-attentions, we explore maximizing the
representation universality and realizing the best alignment of
language-agnostic information. We also introduce a denoising auto-encoding
(DAE) objective to jointly train the model with the translation task in a
multi-task manner. Experiments on two public multilingual parallel datasets
show that our proposed model achieves competitive or better results than
universal NMT and a strong pivot-based baseline. Moreover, we experiment with
incrementally adding a new language to the trained model by updating only the
new parameters. With this small effort, zero-shot translation between the newly
added language and the existing languages achieves results comparable to those
of a model trained jointly from scratch on all languages.
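To make the architectural idea in the abstract more concrete, below is a minimal PyTorch sketch, not the authors' released code: language-specific embeddings, lower encoder layers, decoders, and output heads combined with a shared ("interlingua") upper encoder stack, a joint translation + denoising auto-encoding (DAE) loss, and an `add_language` helper that freezes existing parameters when a language is added incrementally. All names, layer counts, and hyperparameters (`D_MODEL`, `VOCAB`, the 2/4 layer split, `dae_weight`) are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

D_MODEL, N_HEAD, VOCAB = 256, 4, 8000  # illustrative sizes only


class LangSpecificNMT(nn.Module):
    """Language-specific encoders/decoders with a shared ("interlingua") upper encoder."""

    def __init__(self, langs):
        super().__init__()
        # Language-specific modules: embeddings, lower encoder layers, decoders, output heads.
        self.embed = nn.ModuleDict({l: nn.Embedding(VOCAB, D_MODEL) for l in langs})
        self.lang_enc = nn.ModuleDict({l: self._encoder(2) for l in langs})
        self.decoder = nn.ModuleDict({l: self._decoder(4) for l in langs})
        self.out_proj = nn.ModuleDict({l: nn.Linear(D_MODEL, VOCAB) for l in langs})
        # Shared interlingua layers sit on top of every language-specific encoder.
        self.shared_enc = self._encoder(4)

    @staticmethod
    def _encoder(n_layers):
        layer = nn.TransformerEncoderLayer(D_MODEL, N_HEAD, batch_first=True)
        return nn.TransformerEncoder(layer, n_layers)

    @staticmethod
    def _decoder(n_layers):
        layer = nn.TransformerDecoderLayer(D_MODEL, N_HEAD, batch_first=True)
        return nn.TransformerDecoder(layer, n_layers)

    def encode(self, src_ids, src_lang):
        h = self.lang_enc[src_lang](self.embed[src_lang](src_ids))
        return self.shared_enc(h)  # intended as a language-agnostic representation

    def forward(self, src_ids, src_lang, tgt_ids, tgt_lang):
        memory = self.encode(src_ids, src_lang)
        # Target-language decoder cross-attends to the shared representation.
        dec = self.decoder[tgt_lang](self.embed[tgt_lang](tgt_ids), memory)
        return self.out_proj[tgt_lang](dec)

    def add_language(self, lang):
        """Incremental addition: freeze everything, then register trainable modules for `lang`."""
        for p in self.parameters():
            p.requires_grad = False
        self.embed[lang] = nn.Embedding(VOCAB, D_MODEL)
        self.lang_enc[lang] = self._encoder(2)
        self.decoder[lang] = self._decoder(4)
        self.out_proj[lang] = nn.Linear(D_MODEL, VOCAB)


def multitask_loss(model, src, src_lang, tgt, tgt_lang, noisy_tgt, dae_weight=1.0):
    """Translation cross-entropy plus a DAE term (masks/corruption details omitted)."""
    ce = nn.CrossEntropyLoss()
    # Translation objective with teacher forcing.
    mt_logits = model(src, src_lang, tgt[:, :-1], tgt_lang)
    mt_loss = ce(mt_logits.reshape(-1, VOCAB), tgt[:, 1:].reshape(-1))
    # DAE objective: reconstruct the clean target from a corrupted copy of itself.
    dae_logits = model(noisy_tgt, tgt_lang, tgt[:, :-1], tgt_lang)
    dae_loss = ce(dae_logits.reshape(-1, VOCAB), tgt[:, 1:].reshape(-1))
    return mt_loss + dae_weight * dae_loss


if __name__ == "__main__":
    model = LangSpecificNMT(["en", "de", "fr"])
    src = torch.randint(0, VOCAB, (2, 10))
    tgt = torch.randint(0, VOCAB, (2, 12))
    noisy = torch.randint(0, VOCAB, (2, 12))  # stand-in for a corrupted copy of tgt
    print(multitask_loss(model, src, "en", tgt, "de", noisy).item())
```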
Related papers
- Unified Model Learning for Various Neural Machine Translation [63.320005222549646]
Existing neural machine translation (NMT) studies mainly focus on developing dataset-specific models.
We propose a "versatile" model, i.e., the Unified Model Learning for NMT (UMLNMT), which works with data from different tasks.
UMLNMT yields substantial improvements over dataset-specific models with significantly reduced model deployment costs.
arXiv Detail & Related papers (2023-05-04T12:21:52Z)
- Is Encoder-Decoder Redundant for Neural Machine Translation? [44.37101354412253]
The encoder-decoder architecture is still the de facto neural network architecture for state-of-the-art models.
In this work, we experiment with bilingual translation, translation with additional target monolingual data, and multilingual translation.
This alternative approach performs on par with the baseline encoder-decoder Transformer, suggesting that an encoder-decoder architecture might be redundant for neural machine translation.
arXiv Detail & Related papers (2022-10-21T08:33:55Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
- Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information [72.2412707779571]
mRASP is an approach to pre-train a universal multilingual neural machine translation model.
We carry out experiments on 42 translation directions across a diverse setting, including low-, medium-, and rich-resource languages, as well as transfer to exotic language pairs.
arXiv Detail & Related papers (2020-10-07T03:57:54Z)
- Universal Vector Neural Machine Translation With Effective Attention [0.0]
We propose a singular model for Neural Machine Translation based on encoder-decoder models.
We introduce a neutral/universal model representation that can be used to predict more than one language.
arXiv Detail & Related papers (2020-06-09T01:13:57Z)
- Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs.
arXiv Detail & Related papers (2020-04-24T17:21:32Z)
- Multilingual Machine Translation: Closing the Gap between Shared and Language-specific Encoder-Decoders [20.063065730835874]
State-of-the-art multilingual machine translation relies on a universal encoder-decoder.
We propose an alternative approach that is based on language-specific encoder-decoders.
arXiv Detail & Related papers (2020-04-14T15:02:24Z)
- Bi-Decoder Augmented Network for Neural Machine Translation [108.3931242633331]
We propose a novel Bi-Decoder Augmented Network (BiDAN) for the neural machine translation task.
Since each decoder transforms the representations of the input text into its corresponding language, jointly training with two target ends gives the shared encoder the potential to produce a language-independent semantic space.
arXiv Detail & Related papers (2020-01-14T02:05:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.