Bi-Decoder Augmented Network for Neural Machine Translation
- URL: http://arxiv.org/abs/2001.04586v1
- Date: Tue, 14 Jan 2020 02:05:14 GMT
- Title: Bi-Decoder Augmented Network for Neural Machine Translation
- Authors: Boyuan Pan, Yazheng Yang, Zhou Zhao, Yueting Zhuang, Deng Cai
- Abstract summary: We propose a novel Bi-Decoder Augmented Network (BiDAN) for the neural machine translation task.
Since each decoder transforms the representations of the input text into its corresponding language, jointly training with two target ends gives the shared encoder the potential to produce a language-independent semantic space.
- Score: 108.3931242633331
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural Machine Translation (NMT) has become a popular technology in recent
years, and the encoder-decoder framework is the mainstream among all the
methods. The quality of the semantic representations produced by the encoder
is crucial and can significantly affect the performance of the model. However,
existing unidirectional source-to-target architectures can hardly produce a
language-independent representation of the text because they rely heavily on
the specific relations of the given language pairs. To
alleviate this problem, in this paper, we propose a novel Bi-Decoder Augmented
Network (BiDAN) for the neural machine translation task. Besides the original
decoder which generates the target language sequence, we add an auxiliary
decoder to generate back the source language sequence at the training time.
Since each decoder transforms the representations of the input text into its
corresponding language, jointly training with two target ends gives the
shared encoder the potential to produce a language-independent semantic
space. We conduct extensive experiments on several NMT benchmark datasets and
the results demonstrate the effectiveness of our proposed approach.
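To make the training scheme concrete, below is a minimal PyTorch sketch of the bi-decoder idea: a shared encoder feeds the target-language decoder plus an auxiliary source-language decoder, and the two cross-entropy losses are combined during training. The GRU architecture, module names, and the auxiliary loss weight are illustrative assumptions, not the paper's exact design or hyperparameters.

```python
import torch
import torch.nn as nn

class BiDecoderNMT(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, d_model=256):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, d_model)
        self.tgt_embed = nn.Embedding(tgt_vocab, d_model)
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)
        # Two decoders over the same encoder state: one per language end.
        self.tgt_decoder = nn.GRU(d_model, d_model, batch_first=True)
        self.src_decoder = nn.GRU(d_model, d_model, batch_first=True)
        self.tgt_proj = nn.Linear(d_model, tgt_vocab)
        self.src_proj = nn.Linear(d_model, src_vocab)

    def forward(self, src_ids, tgt_in, src_in):
        # Encode the source once; both decoders start from the shared state,
        # which is what pushes it toward a language-independent space.
        _, h = self.encoder(self.src_embed(src_ids))
        tgt_states, _ = self.tgt_decoder(self.tgt_embed(tgt_in), h)
        src_states, _ = self.src_decoder(self.src_embed(src_in), h)
        return self.tgt_proj(tgt_states), self.src_proj(src_states)

def joint_loss(model, src_ids, tgt_in, tgt_out, src_in, src_out, aux_weight=0.5):
    # Translation loss plus an auxiliary source-reconstruction loss; the
    # auxiliary decoder is used only at training time.
    ce = nn.CrossEntropyLoss()
    tgt_logits, src_logits = model(src_ids, tgt_in, src_in)
    loss_tgt = ce(tgt_logits.reshape(-1, tgt_logits.size(-1)), tgt_out.reshape(-1))
    loss_src = ce(src_logits.reshape(-1, src_logits.size(-1)), src_out.reshape(-1))
    return loss_tgt + aux_weight * loss_src

# Toy usage with random token ids (teacher forcing: inputs are shifted outputs).
model = BiDecoderNMT(src_vocab=1000, tgt_vocab=1200)
src = torch.randint(0, 1000, (4, 8))
tgt = torch.randint(0, 1200, (4, 10))
loss = joint_loss(model, src, tgt[:, :-1], tgt[:, 1:], src[:, :-1], src[:, 1:])
loss.backward()
```

At inference time only the target decoder would be run; the auxiliary decoder exists purely to shape the shared encoder's representations during training.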
Related papers
- Machine Translation with Large Language Models: Decoder Only vs. Encoder-Decoder [0.0]
The project is focused on Indian regional languages, especially Telugu, Tamil, and Malayalam.
The model seeks to enable accurate and contextually appropriate translations across diverse language pairs.
arXiv Detail & Related papers (2024-09-12T00:21:05Z) - i-Code V2: An Autoregressive Generation Framework over Vision, Language,
and Speech Data [101.52821120195975]
i-Code V2 is the first model capable of generating natural language from any combination of Vision, Language, and Speech data.
The system is pretrained end-to-end on a large collection of dual- and single-modality datasets.
arXiv Detail & Related papers (2023-05-21T01:25:44Z) - Decoder-Only or Encoder-Decoder? Interpreting Language Model as a
Regularized Encoder-Decoder [75.03283861464365]
The seq2seq task aims at generating the target sequence based on the given input source sequence.
Traditionally, the seq2seq task is addressed with an encoder that encodes the source sequence and a decoder that generates the target text.
Recently, a number of new approaches have emerged that apply decoder-only language models directly to the seq2seq task.
arXiv Detail & Related papers (2023-04-08T15:44:29Z) - Is Encoder-Decoder Redundant for Neural Machine Translation? [44.37101354412253]
The encoder-decoder architecture is still the de facto neural network architecture for state-of-the-art models.
In this work, we experiment with bilingual translation, translation with additional target monolingual data, and multilingual translation.
This alternative approach performs on par with the baseline encoder-decoder Transformer, suggesting that an encoder-decoder architecture might be redundant for neural machine translation.
arXiv Detail & Related papers (2022-10-21T08:33:55Z) - Look Backward and Forward: Self-Knowledge Distillation with
Bidirectional Decoder for Neural Machine Translation [9.279287354043289]
Self-Knowledge Distillation with Bidirectional Decoder for Neural Machine Translation (SBD-NMT).
We deploy a backward decoder which can act as an effective regularization method for the forward decoder (see the sketch after this list).
Experiments show that our method is significantly better than strong Transformer baselines on multiple machine translation datasets.
arXiv Detail & Related papers (2022-03-10T09:21:28Z) - DeltaLM: Encoder-Decoder Pre-training for Language Generation and
Translation by Augmenting Pretrained Multilingual Encoders [92.90543340071007]
We introduce DeltaLM, a pretrained multilingual encoder-decoder model.
Specifically, we augment the pretrained multilingual encoder with a decoder and pre-train it in a self-supervised way.
Experiments show that DeltaLM outperforms various strong baselines on both natural language generation and translation tasks.
arXiv Detail & Related papers (2021-06-25T16:12:10Z) - Improving Zero-shot Neural Machine Translation on Language-specific
Encoders-Decoders [19.44855809470709]
Recently, universal neural machine translation (NMT) with a shared encoder-decoder has achieved good performance on zero-shot translation.
Unlike universal NMT, jointly trained language-specific encoders-decoders aim to achieve universal representation across non-shared modules.
We study zero-shot translation using language-specific encoders-decoders.
arXiv Detail & Related papers (2021-02-12T15:36:33Z) - VX2TEXT: End-to-End Learning of Video-Based Text Generation From
Multimodal Inputs [103.99315770490163]
We present a framework for text generation from multimodal inputs consisting of video plus text, speech, or audio.
Experiments demonstrate that our approach based on a single architecture outperforms the state-of-the-art on three video-based text-generation tasks.
arXiv Detail & Related papers (2021-01-28T15:22:36Z) - On the Sub-Layer Functionalities of Transformer Decoder [74.83087937309266]
We study how Transformer-based decoders leverage information from the source and target languages.
Based on these insights, we demonstrate that the residual feed-forward module in each Transformer decoder layer can be dropped with minimal loss of performance.
arXiv Detail & Related papers (2020-10-06T11:50:54Z)