Is Encoder-Decoder Redundant for Neural Machine Translation?
- URL: http://arxiv.org/abs/2210.11807v1
- Date: Fri, 21 Oct 2022 08:33:55 GMT
- Title: Is Encoder-Decoder Redundant for Neural Machine Translation?
- Authors: Yingbo Gao, Christian Herold, Zijian Yang, Hermann Ney
- Abstract summary: The encoder-decoder architecture is still the de facto neural network architecture for state-of-the-art models.
In this work, we experiment with bilingual translation, translation with additional target monolingual data, and multilingual translation.
This alternative approach performs on par with the baseline encoder-decoder Transformer, suggesting that an encoder-decoder architecture might be redundant for neural machine translation.
- Score: 44.37101354412253
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Encoder-decoder architecture is widely adopted for sequence-to-sequence
modeling tasks. For machine translation, despite the evolution from long
short-term memory networks to Transformer networks, plus the introduction and
development of the attention mechanism, encoder-decoder is still the de facto
neural network architecture for state-of-the-art models. While the motivation
for decoding information from some hidden space is straightforward, the strict
separation of the encoding and decoding steps into an encoder and a decoder in
the model architecture is not necessarily a must. Compared to the task of
autoregressive language modeling in the target language, machine translation
simply has an additional source sentence as context. Given the fact that neural
language models nowadays can already handle rather long contexts in the target
language, it is natural to ask whether simply concatenating the source and
target sentences and training a language model to do translation would work. In
this work, we investigate the aforementioned concept for machine translation.
Specifically, we experiment with bilingual translation, translation with
additional target monolingual data, and multilingual translation. In all cases,
this alternative approach performs on par with the baseline encoder-decoder
Transformer, suggesting that an encoder-decoder architecture might be redundant
for neural machine translation.
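As a rough illustration of the idea described in the abstract, the sketch below trains a single causal (decoder-only) Transformer on concatenated sequences of the form "source <sep> target <eos>", so translation reduces to ordinary next-token language modeling with no separate encoder. This is a minimal PyTorch sketch, not the authors' implementation: the special tokens, model sizes, and the choice to compute the loss only over the target suffix are illustrative assumptions.

```python
# Minimal sketch (PyTorch), not the paper's code: translation as plain causal
# language modeling over the concatenation "source <sep> target <eos>".
# Special tokens, hyperparameters, and the source-loss masking are assumptions.
import torch
import torch.nn as nn

def make_example(src_ids, tgt_ids, sep_id, eos_id):
    """Build one next-token-prediction example from a sentence pair."""
    seq = src_ids + [sep_id] + tgt_ids + [eos_id]
    inp, labels = seq[:-1], seq[1:]
    # One possible choice: ignore the loss over the source prefix so the model
    # is only trained to predict the target continuation (-100 = ignore_index).
    labels = [-100] * len(src_ids) + labels[len(src_ids):]
    return torch.tensor(inp), torch.tensor(labels)

class DecoderOnlyTranslator(nn.Module):
    """A single causal Transformer stack; no separate encoder, no cross-attention."""
    def __init__(self, vocab_size, d_model=512, n_heads=8, n_layers=6, max_len=1024):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, x):
        # Token embeddings plus learned positional embeddings.
        pos = torch.arange(x.size(1), device=x.device)
        h = self.embed(x) + self.pos(pos)
        # Causal mask: every position attends only to its prefix, so the
        # source tokens act as context for the target positions.
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1)).to(x.device)
        return self.out(self.blocks(h, mask=mask))

# Training would use nn.CrossEntropyLoss(ignore_index=-100) on the shifted labels.
```

At inference time, the source sentence followed by the separator token serves as the prompt, and the target is decoded autoregressively, exactly as with any language model.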
Related papers
- Machine Translation with Large Language Models: Decoder Only vs. Encoder-Decoder [0.0]
The project is focused on Indian regional languages, especially Telugu, Tamil, and Malayalam.
The model seeks to enable accurate and contextually appropriate translations across diverse language pairs.
arXiv Detail & Related papers (2024-09-12T00:21:05Z)
- Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder [75.03283861464365]
The seq2seq task aims at generating the target sequence based on the given input source sequence.
Traditionally, most seq2seq tasks are solved with an encoder that encodes the source sequence and a decoder that generates the target text.
Recently, a number of new approaches have emerged that apply decoder-only language models directly to the seq2seq task.
arXiv Detail & Related papers (2023-04-08T15:44:29Z)
- Summarize and Generate to Back-translate: Unsupervised Translation of Programming Languages [86.08359401867577]
Back-translation is widely known for its effectiveness for neural machine translation when little to no parallel data is available.
We propose performing back-translation via code summarization and generation.
We show that our proposed approach performs competitively with state-of-the-art methods.
arXiv Detail & Related papers (2022-05-23T08:20:41Z)
- ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference [70.36083572306839]
This paper proposes a new training and inference paradigm for re-ranking.
We finetune a pretrained encoder-decoder model in the form of document-to-query generation.
We show that this encoder-decoder architecture can be decomposed into a decoder-only language model during inference.
arXiv Detail & Related papers (2022-04-25T06:26:29Z)
- DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders [92.90543340071007]
We introduce DeltaLM, a pretrained multilingual encoder-decoder model.
Specifically, we augment the pretrained multilingual encoder with a decoder and pre-train it in a self-supervised way.
Experiments show that DeltaLM outperforms various strong baselines on both natural language generation and translation tasks.
arXiv Detail & Related papers (2021-06-25T16:12:10Z)
- Improving Zero-shot Neural Machine Translation on Language-specific Encoders-Decoders [19.44855809470709]
Recently, universal neural machine translation (NMT) with a shared encoder-decoder has achieved good performance on zero-shot translation.
Unlike universal NMT, jointly trained language-specific encoders-decoders aim to achieve universal representation across non-shared modules.
We study zero-shot translation using language-specific encoders-decoders.
arXiv Detail & Related papers (2021-02-12T15:36:33Z)
- Bi-Decoder Augmented Network for Neural Machine Translation [108.3931242633331]
We propose a novel Bi-Decoder Augmented Network (BiDAN) for the neural machine translation task.
Since each decoder transforms the representations of the input text into its corresponding language, jointly training with two target ends gives the shared encoder the potential to produce a language-independent semantic space.
arXiv Detail & Related papers (2020-01-14T02:05:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.