Infusing Sequential Information into Conditional Masked Translation
Model with Self-Review Mechanism
- URL: http://arxiv.org/abs/2010.09194v2
- Date: Mon, 26 Oct 2020 13:22:06 GMT
- Title: Infusing Sequential Information into Conditional Masked Translation
Model with Self-Review Mechanism
- Authors: Pan Xie, Zhi Cui, Xiuyin Chen, Xiaohui Hu, Jianwei Cui, Bin Wang
- Abstract summary: Non-autoregressive models generate target words in parallel, which achieves faster decoding but sacrifices translation accuracy.
We propose a Self-Review Mechanism to infuse sequential information into a conditional masked translation model.
Our model even surpasses a typical left-to-right Transformer model, while significantly speeding up decoding.
- Score: 9.641454891414751
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-autoregressive models generate target words in parallel, which
achieves faster decoding but sacrifices translation accuracy. To remedy flawed
translations from non-autoregressive models, a promising approach is to train a
conditional masked translation model (CMTM) and refine the generated results
over several iterations. Unfortunately, such an approach hardly considers the
\textit{sequential dependency} among target words, which inevitably degrades
translation quality. Hence, instead of solely training a Transformer-based
CMTM, we propose a Self-Review Mechanism to infuse sequential information into
it. Concretely, we insert a left-to-right mask into the same decoder of the
CMTM, and then induce it to autoregressively review whether each word generated
by the CMTM should be replaced or kept. The experimental results (WMT14
En$\leftrightarrow$De and WMT16 En$\leftrightarrow$Ro) demonstrate that our
model requires dramatically less training computation than the typical CMTM,
and outperforms several state-of-the-art non-autoregressive models by over 1
BLEU. Through knowledge distillation, our model even surpasses a typical
left-to-right Transformer model, while significantly speeding up decoding.
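The two ideas in the abstract, confidence-based re-masking during iterative CMTM refinement and a left-to-right review pass that decides whether each generated word is kept or replaced, can be sketched with toy stand-ins. This is a minimal illustration of the decoding logic only: the token lists and confidence scores are invented, and a real model would re-predict the masked positions with the Transformer decoder rather than stop at masking.

```python
MASK = "<mask>"

def mask_lowest_confidence(tokens, confidences, k):
    # Mask-predict style re-masking: select the k positions the model
    # was least confident about and replace them with the mask token,
    # so the next refinement iteration can re-predict them.
    order = sorted(range(len(tokens)), key=lambda i: confidences[i])
    masked = list(tokens)
    for i in order[:k]:
        masked[i] = MASK
    return masked

def self_review(tokens, review_scores, threshold=0.5):
    # Left-to-right review pass: keep a token if the reviewer scores it
    # above the threshold, otherwise flag it for replacement (re-masking).
    return [tok if score >= threshold else MASK
            for tok, score in zip(tokens, review_scores)]

draft = ["the", "cat", "sat", "on", "mat"]
conf = [0.9, 0.2, 0.8, 0.95, 0.4]
print(mask_lowest_confidence(draft, conf, 2))
# → ['the', '<mask>', 'sat', 'on', '<mask>']
print(self_review(draft, [0.9, 0.3, 0.8, 0.9, 0.6]))
# → ['the', '<mask>', 'sat', 'on', 'mat']
```

Note that the two passes make different decisions on the same draft: the mask-predict schedule always re-masks a fixed budget of k tokens, while the review pass keeps every token the reviewer independently trusts, which is how sequential information can prune unnecessary replacements.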
Related papers
- Efficient Machine Translation with a BiLSTM-Attention Approach [0.0]
This paper proposes a novel Seq2Seq model aimed at improving translation quality while reducing the storage space required by the model.
The model employs a Bidirectional Long Short-Term Memory network (Bi-LSTM) as the encoder to capture the context information of the input sequence.
Compared to the current mainstream Transformer model, our model achieves superior performance on the WMT14 machine translation dataset.
arXiv Detail & Related papers (2024-10-29T01:12:50Z)
- Autoregressive Speech Synthesis without Vector Quantization [135.4776759536272]
We present MELLE, a novel continuous-valued token-based language modeling approach for text-to-speech synthesis (TTS).
MELLE autoregressively generates continuous mel-spectrogram frames directly from the text condition.
arXiv Detail & Related papers (2024-07-11T14:36:53Z)
- Segment-Based Interactive Machine Translation for Pre-trained Models [2.0871483263418806]
We explore the use of pre-trained large language models (LLM) in interactive machine translation environments.
The system generates perfect translations interactively using the feedback provided by the user at each iteration.
We compare the performance of mBART, mT5 and a state-of-the-art (SoTA) machine translation model on a benchmark dataset regarding user effort.
arXiv Detail & Related papers (2024-07-09T16:04:21Z)
- Improving Non-autoregressive Translation Quality with Pretrained Language Model, Embedding Distillation and Upsampling Strategy for CTC [51.34222224728979]
This paper introduces a series of innovative techniques to enhance the translation quality of Non-Autoregressive Translation (NAT) models.
We propose fine-tuning Pretrained Multilingual Language Models (PMLMs) with the CTC loss to train NAT models effectively.
Our model exhibits a remarkable speed improvement of 16.35 times compared to the autoregressive model.
arXiv Detail & Related papers (2023-06-10T05:24:29Z)
- Non-Autoregressive Translation by Learning Target Categorical Codes [59.840510037250944]
We propose CNAT, which implicitly learns categorical codes as latent variables for non-autoregressive decoding.
Experiment results show that our model achieves comparable or better performance in machine translation tasks.
arXiv Detail & Related papers (2021-03-21T14:12:34Z)
- Fast Sequence Generation with Multi-Agent Reinforcement Learning [40.75211414663022]
Non-autoregressive decoding has been proposed in machine translation to speed up the inference time by generating all words in parallel.
We propose a simple and efficient model for Non-Autoregressive sequence Generation (NAG) with a novel training paradigm: Counterfactuals-critical Multi-Agent Learning (CMAL).
On the MSCOCO image captioning benchmark, our NAG method achieves performance comparable to state-of-the-art autoregressive models while bringing a 13.9x decoding speedup.
arXiv Detail & Related papers (2021-01-24T12:16:45Z)
- Enriching Non-Autoregressive Transformer with Syntactic and Semantic Structures for Neural Machine Translation [54.864148836486166]
We propose to incorporate the explicit syntactic and semantic structures of languages into a non-autoregressive Transformer.
Our model achieves significantly faster decoding while maintaining translation quality, compared with several state-of-the-art non-autoregressive models.
arXiv Detail & Related papers (2021-01-22T04:12:17Z)
- Tight Integrated End-to-End Training for Cascaded Speech Translation [40.76367623739673]
A cascaded speech translation model relies on discrete and non-differentiable transcription.
Direct speech translation is an alternative method to avoid error propagation.
This work explores the feasibility of collapsing the entire cascade components into a single end-to-end trainable model.
arXiv Detail & Related papers (2020-11-24T15:43:49Z)
- Glancing Transformer for Non-Autoregressive Neural Machine Translation [58.87258329683682]
We propose a method to learn word interdependency for single-pass parallel generation models.
With only single-pass parallel decoding, GLAT is able to generate high-quality translation with 8-15 times speedup.
arXiv Detail & Related papers (2020-08-18T13:04:03Z)
- Cascaded Text Generation with Markov Transformers [122.76100449018061]
Two dominant approaches to neural text generation are fully autoregressive models, using serial beam search decoding, and non-autoregressive models, using parallel decoding with no output dependencies.
This work proposes an autoregressive model with sub-linear parallel time generation. Noting that conditional random fields with bounded context can be decoded in parallel, we propose an efficient cascaded decoding approach for generating high-quality output.
This approach requires only a small modification from standard autoregressive training, while showing competitive accuracy/speed tradeoff compared to existing methods on five machine translation datasets.
arXiv Detail & Related papers (2020-06-01T17:52:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.