Viterbi Decoding of Directed Acyclic Transformer for Non-Autoregressive
Machine Translation
- URL: http://arxiv.org/abs/2210.05193v1
- Date: Tue, 11 Oct 2022 06:53:34 GMT
- Title: Viterbi Decoding of Directed Acyclic Transformer for Non-Autoregressive
Machine Translation
- Authors: Chenze Shao and Zhengrui Ma and Yang Feng
- Abstract summary: Non-autoregressive models achieve significant decoding speedup in neural machine translation but lack the ability to capture sequential dependency.
We present a Viterbi decoding framework for DA-Transformer, which guarantees to find the joint optimal solution for the translation and decoding path under any length constraint.
- Score: 13.474844448367367
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-autoregressive models achieve significant decoding speedup in neural
machine translation but lack the ability to capture sequential dependency.
Directed Acyclic Transformer (DA-Transformer) was recently proposed to model
sequential dependency with a directed acyclic graph. Consequently, it has to
apply a sequential decision process at inference time, which harms the global
translation accuracy. In this paper, we present a Viterbi decoding framework
for DA-Transformer, which guarantees to find the joint optimal solution for the
translation and decoding path under any length constraint. Experimental results
demonstrate that our approach consistently improves the performance of
DA-Transformer while maintaining a similar decoding speedup.
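The idea of jointly choosing a translation and a decoding path under a length constraint can be illustrated with a small dynamic program. The following is a minimal sketch, not the authors' implementation. It assumes the model exposes a matrix of log transition probabilities between DAG vertices and a matrix of log token probabilities per vertex, and that a path runs from the first vertex to the last. The names `viterbi_fixed_length`, `log_trans`, and `log_emit` are hypothetical. Because each vertex emits its token independently once the path is fixed, the best token at a vertex is simply its argmax, so the search reduces to finding the best path of a given length. How the final length is chosen among candidates is left out here.

```python
import numpy as np


def viterbi_fixed_length(log_trans, log_emit, length):
    """Return (tokens, path, log_score) for the best path of exactly `length`
    vertices from vertex 0 to vertex n-1, or None if no such path exists.

    log_trans[u, v]: log-probability of moving from vertex u to vertex v (hypothetical input).
    log_emit[v, w]:  log-probability of emitting token w at vertex v (hypothetical input).
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    n = log_trans.shape[0]
    best_tok = log_emit.argmax(axis=1)   # best token per vertex
    tok_score = log_emit.max(axis=1)     # its log-probability

    # Only forward edges (u < v) are allowed in the acyclic graph.
    forward = np.triu(np.ones((n, n), dtype=bool), k=1)
    trans = np.where(forward, log_trans, -np.inf)

    # dp[s, v]: best score of a path with s+1 vertices that ends at v.
    dp = np.full((length, n), -np.inf)
    back = np.zeros((length, n), dtype=int)
    dp[0, 0] = tok_score[0]
    for step in range(1, length):
        cand = dp[step - 1][:, None] + trans   # cand[u, v]
        back[step] = cand.argmax(axis=0)
        dp[step] = cand.max(axis=0) + tok_score

    if not np.isfinite(dp[length - 1, n - 1]):
        return None

    # Follow back-pointers from the last vertex to recover the path.
    path = [n - 1]
    for step in range(length - 1, 0, -1):
        path.append(int(back[step, path[-1]]))
    path.reverse()
    tokens = [int(best_tok[v]) for v in path]
    return tokens, path, float(dp[length - 1, n - 1])


# Toy graph with 4 vertices and a 2-word vocabulary (made-up numbers).
trans_prob = np.array([
    [0.0, 0.6, 0.3, 0.1],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
])
emit_prob = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.7, 0.3],
    [0.4, 0.6],
])
with np.errstate(divide="ignore"):
    log_trans = np.log(trans_prob)   # zero-probability edges become -inf
log_emit = np.log(emit_prob)

result = viterbi_fixed_length(log_trans, log_emit, length=3)
```

In this toy example the search is exact for the requested length, in contrast to a greedy step-by-step choice of the next vertex. The cost is one pass over all vertex pairs per output position.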
Related papers
- Investigating Recurrent Transformers with Dynamic Halt [64.862738244735]
We study the inductive biases of two major approaches to augmenting Transformers with a recurrent mechanism.
We propose and investigate novel ways to extend and combine the methods.
arXiv Detail & Related papers (2024-02-01T19:47:31Z)
- Quick Back-Translation for Unsupervised Machine Translation [9.51657235413336]
We propose a two-for-one improvement to Transformer back-translation: Quick Back-Translation (QBT).
QBT re-purposes the encoder as a generative model, and uses encoder-generated sequences to train the decoder.
Experiments on various WMT benchmarks demonstrate that QBT dramatically outperforms the standard back-translation-only method in terms of training efficiency.
arXiv Detail & Related papers (2023-12-01T20:27:42Z)
- Optimizing Non-Autoregressive Transformers with Contrastive Learning [74.46714706658517]
Non-autoregressive Transformers (NATs) reduce the inference latency of Autoregressive Transformers (ATs) by predicting words all at once rather than in sequential order.
In this paper, we propose to ease the difficulty of modality learning via sampling from the model distribution instead of the data distribution.
arXiv Detail & Related papers (2023-05-23T04:20:13Z)
- Accelerating Transformer Inference for Translation via Parallel Decoding [2.89306442817912]
Autoregressive decoding limits the efficiency of transformers for Machine Translation (MT).
We present three parallel decoding algorithms and test them on different languages and models.
arXiv Detail & Related papers (2023-05-17T17:57:34Z)
- Recurrence Boosts Diversity! Revisiting Recurrent Latent Variable in
Transformer-Based Variational AutoEncoder for Diverse Text Generation [85.5379146125199]
Variational Auto-Encoder (VAE) has been widely adopted in text generation.
We propose TRACE, a Transformer-based recurrent VAE structure.
arXiv Detail & Related papers (2022-10-22T10:25:35Z)
- Directed Acyclic Transformer for Non-Autoregressive Machine Translation [93.31114105366461]
Directed Acyclic Transformer (DA-Transformer) represents hidden states in a Directed Acyclic Graph (DAG).
DA-Transformer substantially outperforms previous NATs by about 3 BLEU on average.
arXiv Detail & Related papers (2022-05-16T06:02:29Z)
- Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input [54.82369261350497]
We propose a CTC-enhanced NAR transformer, which generates target sequence by refining predictions of the CTC module.
Experimental results show that our method outperforms all previous NAR counterparts and achieves 50x faster decoding speed than a strong AR baseline with only 0.0 to 0.3 absolute CER degradation on the Aishell-1 and Aishell-2 datasets.
arXiv Detail & Related papers (2020-10-28T15:00:09Z)
- Glancing Transformer for Non-Autoregressive Neural Machine Translation [58.87258329683682]
We propose a method to learn word interdependency for single-pass parallel generation models.
With only single-pass parallel decoding, GLAT is able to generate high-quality translation with 8-15 times speedup.
arXiv Detail & Related papers (2020-08-18T13:04:03Z)