Directed Acyclic Transformer for Non-Autoregressive Machine Translation
- URL: http://arxiv.org/abs/2205.07459v1
- Date: Mon, 16 May 2022 06:02:29 GMT
- Title: Directed Acyclic Transformer for Non-Autoregressive Machine Translation
- Authors: Fei Huang, Hao Zhou, Yang Liu, Hang Li, Minlie Huang
- Abstract summary: Directed Acyclic Transformer (DA-Transformer) represents hidden states in a Directed Acyclic Graph (DAG)
DA-Transformer substantially outperforms previous NATs by about 3 BLEU on average.
- Score: 93.31114105366461
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-autoregressive Transformers (NATs) significantly reduce the decoding
latency by generating all tokens in parallel. However, such independent
predictions prevent NATs from capturing the dependencies between the tokens for
generating multiple possible translations. In this paper, we propose Directed
Acyclic Transformer (DA-Transformer), which represents the hidden states in a
Directed Acyclic Graph (DAG), where each path of the DAG corresponds to a
specific translation. The whole DAG simultaneously captures multiple
translations and facilitates fast predictions in a non-autoregressive fashion.
Experiments on the raw training data of the WMT benchmark show that DA-Transformer
substantially outperforms previous NATs by about 3 BLEU on average, making it
the first NAT model to achieve results competitive with autoregressive
Transformers without relying on knowledge distillation.
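To make the path-based formulation concrete, below is a minimal NumPy sketch of reading one translation off such a DAG of decoder states. The array names, shapes, and the greedy strategy are illustrative assumptions, not the paper's exact decoding procedure: each DAG vertex is assumed to carry a token distribution and a transition distribution over later vertices, and one path through the DAG yields one translation.

```python
import numpy as np

def greedy_path_decode(token_logp, trans_logp, eos_id):
    """Follow the locally best path through a DAG of decoder states.

    token_logp: (L, V) token log-probabilities at each DAG vertex.
    trans_logp: (L, L) transition log-probabilities to later vertices
                (entries with j <= i are assumed to be -inf).
    """
    L = token_logp.shape[0]
    pos, output = 0, []
    while True:
        tok = int(token_logp[pos].argmax())   # best token at the current vertex
        output.append(tok)
        if tok == eos_id or pos == L - 1:     # stop at EOS or the final vertex
            break
        pos = int(trans_logp[pos].argmax())   # jump to the most likely later vertex
    return output

# Toy run with made-up scores: 4 DAG vertices, 5-token vocabulary, id 4 taken as EOS.
rng = np.random.default_rng(0)
token_logp = np.log(rng.dirichlet(np.ones(5), size=4))
trans_logp = np.full((4, 4), -np.inf)
for i in range(3):
    trans_logp[i, i + 1:] = np.log(rng.dirichlet(np.ones(3 - i)))
print(greedy_path_decode(token_logp, trans_logp, eos_id=4))
```

Because every vertex keeps its own token and transition distributions, all vertex-level predictions can be computed in parallel; only the cheap path walk above is sequential.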
Related papers
- Quick Back-Translation for Unsupervised Machine Translation [9.51657235413336]
We propose a two-for-one improvement to Transformer back-translation: Quick Back-Translation (QBT)
QBT re-purposes the encoder as a generative model, and uses encoder-generated sequences to train the decoder.
Experiments on various WMT benchmarks demonstrate that QBT dramatically outperforms the standard back-translation-only method in terms of training efficiency.
arXiv Detail & Related papers (2023-12-01T20:27:42Z) - Optimizing Non-Autoregressive Transformers with Contrastive Learning [74.46714706658517]
Non-autoregressive Transformers (NATs) reduce the inference latency of Autoregressive Transformers (ATs) by predicting words all at once rather than in sequential order.
In this paper, we propose to ease the difficulty of modality learning via sampling from the model distribution instead of the data distribution.
arXiv Detail & Related papers (2023-05-23T04:20:13Z) - Fuzzy Alignments in Directed Acyclic Graph for Non-Autoregressive
Machine Translation [18.205288788056787]
Non-autoregressive translation (NAT) reduces the decoding latency but suffers from performance degradation due to the multi-modality problem.
In this paper, we hold the view that all paths in the graph are fuzzily aligned with the reference sentence.
We do not require an exact alignment but instead train the model to maximize a fuzzy alignment score between the graph and the reference, which takes the translations captured in all modalities into account.
arXiv Detail & Related papers (2023-03-12T13:51:38Z) - Rephrasing the Reference for Non-Autoregressive Machine Translation [37.816198073720614]
Non-autoregressive neural machine translation (NAT) models suffer from the multi-modality problem: multiple possible translations may exist for a source sentence.
We introduce a rephraser to provide a better training target for NAT by rephrasing the reference sentence according to the NAT output.
Our best variant achieves comparable performance to the autoregressive Transformer, while being 14.7 times more efficient in inference.
arXiv Detail & Related papers (2022-11-30T10:05:03Z) - Recurrence Boosts Diversity! Revisiting Recurrent Latent Variable in
Transformer-Based Variational AutoEncoder for Diverse Text Generation [85.5379146125199]
Variational Auto-Encoder (VAE) has been widely adopted in text generation.
We propose TRACE, a Transformer-based recurrent VAE structure.
arXiv Detail & Related papers (2022-10-22T10:25:35Z) - Viterbi Decoding of Directed Acyclic Transformer for Non-Autoregressive
Machine Translation [13.474844448367367]
Non-autoregressive models achieve significant decoding speedup in neural machine translation but lack the ability to capture sequential dependency.
We present a Viterbi decoding framework for DA-Transformer, which is guaranteed to find the joint optimal solution for the translation and the decoding path under any length constraint (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2022-10-11T06:53:34Z) - Paraformer: Fast and Accurate Parallel Transformer for
Non-autoregressive End-to-End Speech Recognition [62.83832841523525]
We propose a fast and accurate parallel transformer, termed Paraformer.
It accurately predicts the number of output tokens and extracts hidden variables.
It can attain comparable performance to the state-of-the-art AR transformer, with more than 10x speedup.
arXiv Detail & Related papers (2022-06-16T17:24:14Z) - Glancing Transformer for Non-Autoregressive Neural Machine Translation [58.87258329683682]
We propose a method to learn word interdependency for single-pass parallel generation models.
With only single-pass parallel decoding, GLAT is able to generate high-quality translations with an 8-15 times speedup.
arXiv Detail & Related papers (2020-08-18T13:04:03Z) - Non-Autoregressive Machine Translation with Disentangled Context
Transformer [70.95181466892795]
State-of-the-art neural machine translation models generate a translation from left to right and every step is conditioned on the previously generated tokens.
We propose an attention-masking based model, called Disentangled Context (DisCo) transformer, that simultaneously generates all tokens given different contexts.
Our model achieves competitive, if not better, performance compared to the state of the art in non-autoregressive machine translation while significantly reducing decoding time on average.
arXiv Detail & Related papers (2020-01-15T05:32:18Z)
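The Viterbi decoding entry above describes searching the DAG for the single best translation under a length constraint. As a rough illustration, the sketch below runs a length-constrained Viterbi pass over the same kind of per-vertex token and transition scores as the greedy sketch earlier; the assumption that paths run from the first to the last vertex, and the omission of the paper's re-ranking across lengths, are simplifications.

```python
import numpy as np

def viterbi_dag_decode(token_logp, trans_logp, target_len):
    """Best path of exactly `target_len` vertices from vertex 0 to the last vertex.

    token_logp: (L, V) token log-probabilities at each DAG vertex.
    trans_logp: (L, L) transition log-probabilities (entries with j <= i are -inf).
    Returns (path score, decoded token ids).
    """
    L = token_logp.shape[0]
    best_tok = token_logp.max(axis=1)        # score of the best token at each vertex
    best_ids = token_logp.argmax(axis=1)

    dp = np.full((target_len, L), -np.inf)   # dp[l, j]: best score reaching vertex j with l+1 tokens
    back = np.zeros((target_len, L), dtype=int)
    dp[0, 0] = best_tok[0]                   # paths are assumed to start at vertex 0
    for l in range(1, target_len):
        scores = dp[l - 1][:, None] + trans_logp + best_tok[None, :]
        back[l] = scores.argmax(axis=0)      # best predecessor for every vertex
        dp[l] = scores.max(axis=0)

    path = [L - 1]                           # paths are assumed to end at the last vertex
    for l in range(target_len - 1, 0, -1):
        path.append(int(back[l, path[-1]]))
    path.reverse()
    return float(dp[target_len - 1, L - 1]), [int(best_ids[j]) for j in path]
```

Each step scores every (predecessor, successor) pair, so the search costs O(L^2 * target_len) but, unlike the greedy walk, returns the globally best path of the requested length.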