Duplex Sequence-to-Sequence Learning for Reversible Machine Translation
- URL: http://arxiv.org/abs/2105.03458v1
- Date: Fri, 7 May 2021 18:21:57 GMT
- Title: Duplex Sequence-to-Sequence Learning for Reversible Machine Translation
- Authors: Zaixiang Zheng, Hao Zhou, Shujian Huang, Jiajun Chen, Jingjing Xu and
Lei Li
- Abstract summary: Sequence-to-sequence (seq2seq) problems such as machine translation are bidirectional.
We propose a duplex seq2seq neural network, REDER, and apply it to machine translation.
Experiments on widely-used machine translation benchmarks verify that REDER achieves the first success of reversible machine translation.
- Score: 53.924941333388155
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Sequence-to-sequence (seq2seq) problems such as machine translation are
bidirectional, which naturally gives rise to a pair of directional tasks and two
directional learning signals. However, typical seq2seq neural networks are
simplex: they model only one unidirectional task and thus cannot fully exploit
the potential of the bidirectional learning signals in parallel data. To address
this issue, we propose a duplex seq2seq neural network, REDER (Reversible
Duplex Transformer), and apply it to machine translation. The architecture of
REDER has two ends, each of which specializes in one language so as to read and
yield sequences in that language. As a result, REDER can learn from both
directional signals simultaneously and enables reversible machine translation
by simply flipping the input and output ends. Experiments on widely-used
machine translation benchmarks verify that REDER achieves the first success of
reversible machine translation, which helps obtain considerable gains over
several strong baselines.
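To make the duplex idea concrete, here is a minimal sketch, assuming a PyTorch-style setup. It is not the paper's implementation: ordinary Transformer encoder layers stand in for the reversible layers REDER builds on, decoding is ignored, and all names (DuplexSketch, embed_a, direction, ...) are illustrative. The point is only that one shared stack of parameters, read through a language-specific end on either side, serves both translation directions when the ends are flipped.
```python
# Minimal sketch of the "duplex" idea (not the authors' code).
import torch
import torch.nn as nn

class DuplexSketch(nn.Module):
    def __init__(self, vocab_a, vocab_b, d_model=256, n_layers=4):
        super().__init__()
        self.embed_a = nn.Embedding(vocab_a, d_model)   # reading/yielding end for language A
        self.embed_b = nn.Embedding(vocab_b, d_model)   # reading/yielding end for language B
        self.out_a = nn.Linear(d_model, vocab_a)
        self.out_b = nn.Linear(d_model, vocab_b)
        # One shared stack of layers used by BOTH translation directions.
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            for _ in range(n_layers)
        )

    def forward(self, tokens, direction="a2b"):
        # Flipping the direction swaps the two ends and traverses the shared
        # stack in the opposite order; there are no direction-specific parameters.
        if direction == "a2b":
            h, layers, project = self.embed_a(tokens), self.layers, self.out_b
        else:
            h, layers, project = self.embed_b(tokens), reversed(self.layers), self.out_a
        for layer in layers:
            h = layer(h)
        return project(h)   # per-position logits in the output language

model = DuplexSketch(vocab_a=1000, vocab_b=1200)
src = torch.randint(0, 1000, (2, 7))      # batch of language-A token ids
logits_b = model(src, direction="a2b")    # A -> B, shape (2, 7, 1200)
tgt = torch.randint(0, 1200, (2, 9))
logits_a = model(tgt, direction="b2a")    # same parameters, B -> A, shape (2, 9, 1000)
```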
Related papers
- Duplex Diffusion Models Improve Speech-to-Speech Translation [1.4649095013539173]
Speech-to-speech translation is a sequence-to-sequence learning task that naturally has two directions.
We propose a duplex diffusion model that applies diffusion probabilistic models to both sides of a reversible duplex Conformer.
Our model enables reversible speech translation by simply flipping the input and output ends.
arXiv Detail & Related papers (2023-05-22T01:39:40Z)
- Hierarchical Phrase-based Sequence-to-Sequence Learning [94.10257313923478]
We describe a neural transducer that maintains the flexibility of standard sequence-to-sequence (seq2seq) models while incorporating hierarchical phrases as a source of inductive bias during training and as explicit constraints during inference.
Our approach trains two models: a discriminative parser based on a bracketing grammar whose derivation tree hierarchically aligns source and target phrases, and a neural seq2seq model that learns to translate the aligned phrases one-by-one.
arXiv Detail & Related papers (2022-11-15T05:22:40Z)
- Look Backward and Forward: Self-Knowledge Distillation with Bidirectional Decoder for Neural Machine Translation [9.279287354043289]
We propose Self-Knowledge Distillation with Bidirectional Decoder for Neural Machine Translation (SBD-NMT).
We deploy a backward decoder which can act as an effective regularization method for the forward decoder.
Experiments show that our method is significantly better than the strong Transformer baselines on multiple machine translation data sets.
arXiv Detail & Related papers (2022-03-10T09:21:28Z)
- s2s-ft: Fine-Tuning Pretrained Transformer Encoders for Sequence-to-Sequence Learning [47.30689555136054]
We present a sequence-to-sequence fine-tuning toolkit s2s-ft, which adopts pretrained Transformers for conditional generation tasks.
s2s-ft achieves strong performance on several abstractive summarization and question generation benchmarks.
arXiv Detail & Related papers (2021-10-26T12:45:34Z)
- Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation [88.78138830698173]
We focus on sequence-level knowledge distillation (SeqKD) from external text-based NMT models.
We train a bilingual E2E-ST model to predict paraphrased transcriptions as an auxiliary task with a single decoder.
arXiv Detail & Related papers (2021-04-13T19:00:51Z)
- Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation [71.54816893482457]
We introduce the dual-decoder Transformer, a new model architecture that jointly performs automatic speech recognition (ASR) and multilingual speech translation (ST).
Our models are based on the original Transformer architecture but consist of two decoders, each responsible for one task (ASR or ST); a minimal shared-encoder sketch appears after this list.
arXiv Detail & Related papers (2020-11-02T04:59:50Z)
- Efficient Wait-k Models for Simultaneous Machine Translation [46.01342928010307]
Simultaneous machine translation starts generating output before the entire input sequence is available.
Wait-k decoders offer a simple but efficient approach to this problem (a small schedule sketch appears after this list).
We investigate the behavior of wait-k decoding in low resource settings for spoken corpora using IWSLT datasets.
arXiv Detail & Related papers (2020-05-18T11:14:23Z)
- Multi-level Head-wise Match and Aggregation in Transformer for Textual Sequence Matching [87.97265483696613]
We propose a new approach to sequence pair matching with Transformer, by learning head-wise matching representations on multiple levels.
Experiments show that our proposed approach can achieve new state-of-the-art performance on multiple tasks.
arXiv Detail & Related papers (2020-01-20T20:02:02Z)
- Bi-Decoder Augmented Network for Neural Machine Translation [108.3931242633331]
We propose a novel Bi-Decoder Augmented Network (BiDAN) for the neural machine translation task.
Since each decoder transforms the representations of the input text into its corresponding language, jointly training with two target ends gives the shared encoder the potential to produce a language-independent semantic space.
arXiv Detail & Related papers (2020-01-14T02:05:14Z)
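Below is the shared-encoder sketch referenced in the Dual-decoder Transformer entry above: a minimal illustration, assuming a PyTorch-style setup, of two task-specific decoders (one for ASR, one for ST) attending to one shared speech encoder. All names are illustrative; causal target masks and any interaction between the two decoders are omitted.
```python
# Minimal sketch of one shared speech encoder feeding two task-specific decoders.
import torch
import torch.nn as nn

class SharedEncoderTwoDecoders(nn.Module):
    def __init__(self, feat_dim=80, vocab_asr=1000, vocab_st=1200, d_model=256):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)        # speech features -> model dim
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.dec_asr = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.dec_st = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.emb_asr, self.out_asr = nn.Embedding(vocab_asr, d_model), nn.Linear(d_model, vocab_asr)
        self.emb_st, self.out_st = nn.Embedding(vocab_st, d_model), nn.Linear(d_model, vocab_st)

    def forward(self, speech_feats, asr_prev, st_prev):
        memory = self.encoder(self.proj(speech_feats))  # shared acoustic representation
        asr_logits = self.out_asr(self.dec_asr(self.emb_asr(asr_prev), memory))  # transcript logits
        st_logits = self.out_st(self.dec_st(self.emb_st(st_prev), memory))       # translation logits
        return asr_logits, st_logits   # train with one cross-entropy loss per task

model = SharedEncoderTwoDecoders()
feats = torch.randn(2, 50, 80)              # (batch, frames, filterbank features)
asr_prev = torch.randint(0, 1000, (2, 12))  # previously generated transcript tokens
st_prev = torch.randint(0, 1200, (2, 14))   # previously generated translation tokens
asr_logits, st_logits = model(feats, asr_prev, st_prev)
```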
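And here is the schedule sketch referenced in the Efficient Wait-k Models entry above. It is a plain-Python illustration of the commonly described wait-k policy, not the paper's implementation: the decoder reads k source tokens before producing output and then alternates one target token written with one source token read.
```python
# Plain-Python illustration of the standard wait-k read/write policy.
def wait_k_schedule(source_len: int, target_len: int, k: int):
    """Yield ('READ', src_index) and ('WRITE', tgt_index) actions."""
    read, written = 0, 0
    while written < target_len:
        # Stay k source tokens ahead of the number of written target tokens,
        # unless the source has already been fully read.
        while read < min(written + k, source_len):
            yield ("READ", read)
            read += 1
        yield ("WRITE", written)
        written += 1

# Example: a 6-token source, 5-token target, k = 2.
for action in wait_k_schedule(source_len=6, target_len=5, k=2):
    print(action)
# ('READ', 0), ('READ', 1), ('WRITE', 0), ('READ', 2), ('WRITE', 1), ...
```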