Is Encoder-Decoder Redundant for Neural Machine Translation?
- URL: http://arxiv.org/abs/2210.11807v1
- Date: Fri, 21 Oct 2022 08:33:55 GMT
- Title: Is Encoder-Decoder Redundant for Neural Machine Translation?
- Authors: Yingbo Gao, Christian Herold, Zijian Yang, Hermann Ney
- Abstract summary: The encoder-decoder architecture is still the de facto neural network architecture for state-of-the-art models.
In this work, we experiment with bilingual translation, translation with additional target monolingual data, and multilingual translation.
This alternative approach performs on par with the baseline encoder-decoder Transformer, suggesting that an encoder-decoder architecture might be redundant for neural machine translation.
- Score: 44.37101354412253
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Encoder-decoder architecture is widely adopted for sequence-to-sequence
modeling tasks. For machine translation, despite the evolution from long
short-term memory networks to Transformer networks, plus the introduction and
development of the attention mechanism, encoder-decoder is still the de facto
neural network architecture for state-of-the-art models. While the motivation
for decoding information from some hidden space is straightforward, the strict
separation of the encoding and decoding steps into an encoder and a decoder in
the model architecture is not necessarily a must. Compared to the task of
autoregressive language modeling in the target language, machine translation
simply has an additional source sentence as context. Given the fact that neural
language models nowadays can already handle rather long contexts in the target
language, it is natural to ask whether simply concatenating the source and
target sentences and training a language model to do translation would work. In
this work, we investigate the aforementioned concept for machine translation.
Specifically, we experiment with bilingual translation, translation with
additional target monolingual data, and multilingual translation. In all cases,
this alternative approach performs on par with the baseline encoder-decoder
Transformer, suggesting that an encoder-decoder architecture might be redundant
for neural machine translation.
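As a rough illustration of the idea described in the abstract, the sketch below trains a single causal (decoder-only) Transformer on concatenated sequences of the form "source <sep> target <eos>", so translation reduces to ordinary next-token language modeling with no separate encoder. This is a minimal PyTorch sketch, not the authors' implementation: the special tokens, model sizes, and the choice to compute the loss only over the target suffix are illustrative assumptions.

```python
# Minimal sketch (PyTorch), not the paper's code: translation as plain causal
# language modeling over the concatenation "source <sep> target <eos>".
# Special tokens, hyperparameters, and the source-loss masking are assumptions.
import torch
import torch.nn as nn

def make_example(src_ids, tgt_ids, sep_id, eos_id):
    """Build one next-token-prediction example from a sentence pair."""
    seq = src_ids + [sep_id] + tgt_ids + [eos_id]
    inp, labels = seq[:-1], seq[1:]
    # One possible choice: ignore the loss over the source prefix so the model
    # is only trained to predict the target continuation (-100 = ignore_index).
    labels = [-100] * len(src_ids) + labels[len(src_ids):]
    return torch.tensor(inp), torch.tensor(labels)

class DecoderOnlyTranslator(nn.Module):
    """A single causal Transformer stack; no separate encoder, no cross-attention."""
    def __init__(self, vocab_size, d_model=512, n_heads=8, n_layers=6, max_len=1024):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, x):
        # Token embeddings plus learned positional embeddings.
        pos = torch.arange(x.size(1), device=x.device)
        h = self.embed(x) + self.pos(pos)
        # Causal mask: every position attends only to its prefix, so the
        # source tokens act as context for the target positions.
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1)).to(x.device)
        return self.out(self.blocks(h, mask=mask))

# Training would use nn.CrossEntropyLoss(ignore_index=-100) on the shifted labels.
```

At inference time, the source sentence followed by the separator token serves as the prompt, and the target is decoded autoregressively, exactly as with any language model.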
Related papers
- Machine Translation with Large Language Models: Decoder Only vs. Encoder-Decoder [0.0]
The project is focused on Indian regional languages, especially Telugu, Tamil, and Malayalam.
The model seeks to enable accurate and contextually appropriate translations across diverse language pairs.
arXiv Detail & Related papers (2024-09-12T00:21:05Z)
- Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder [75.03283861464365]
The seq2seq task aims at generating the target sequence based on the given input source sequence.
Traditionally, most seq2seq tasks are solved with an encoder that encodes the source sequence and a decoder that generates the target text.
Recently, a number of new approaches have emerged that apply decoder-only language models directly to the seq2seq task.
arXiv Detail & Related papers (2023-04-08T15:44:29Z)
- Summarize and Generate to Back-translate: Unsupervised Translation of Programming Languages [86.08359401867577]
Back-translation is widely known for its effectiveness for neural machine translation when little to no parallel data is available.
We propose performing back-translation via code summarization and generation.
We show that our proposed approach performs competitively with state-of-the-art methods.
arXiv Detail & Related papers (2022-05-23T08:20:41Z)
- ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference [70.36083572306839]
This paper proposes a new training and inference paradigm for re-ranking.
We finetune a pretrained encoder-decoder model in the form of document-to-query generation.
We show that this encoder-decoder architecture can be decomposed into a decoder-only language model during inference.
arXiv Detail & Related papers (2022-04-25T06:26:29Z)
- DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders [92.90543340071007]
We introduce DeltaLM, a pretrained multilingual encoder-decoder model.
Specifically, we augment the pretrained multilingual encoder with a decoder and pre-train it in a self-supervised way.
Experiments show that DeltaLM outperforms various strong baselines on both natural language generation and translation tasks.
arXiv Detail & Related papers (2021-06-25T16:12:10Z)
- Improving Zero-shot Neural Machine Translation on Language-specific Encoders-Decoders [19.44855809470709]
Recently, universal neural machine translation (NMT) with a shared encoder-decoder has achieved good performance on zero-shot translation.
Unlike universal NMT, jointly trained language-specific encoders-decoders aim to achieve universal representation across non-shared modules.
We study zero-shot translation using language-specific encoders-decoders.
arXiv Detail & Related papers (2021-02-12T15:36:33Z)
- Bi-Decoder Augmented Network for Neural Machine Translation [108.3931242633331]
We propose a novel Bi-Decoder Augmented Network (BiDAN) for the neural machine translation task.
Since each decoder transforms the representations of the input text into its corresponding language, jointly training with two target ends gives the shared encoder the potential to produce a language-independent semantic space.
arXiv Detail & Related papers (2020-01-14T02:05:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.