Non-autoregressive Streaming Transformer for Simultaneous Translation
- URL: http://arxiv.org/abs/2310.14883v1
- Date: Mon, 23 Oct 2023 12:52:24 GMT
- Title: Non-autoregressive Streaming Transformer for Simultaneous Translation
- Authors: Zhengrui Ma, Shaolei Zhang, Shoutao Guo, Chenze Shao, Min Zhang, Yang Feng
- Abstract summary: Simultaneous machine translation (SiMT) models are trained to strike a balance between latency and translation quality.
We propose non-autoregressive streaming Transformer (NAST)
NAST comprises a unidirectional encoder and a non-autoregressive decoder with intra-chunk parallelism.
- Score: 45.96493039754171
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Simultaneous machine translation (SiMT) models are trained to strike a
balance between latency and translation quality. However, training these models
to achieve high quality while maintaining low latency often leads to a tendency
for aggressive anticipation. We argue that this issue stems from the
autoregressive architecture upon which most existing SiMT models are built. To
address it, we propose non-autoregressive streaming Transformer
(NAST) which comprises a unidirectional encoder and a non-autoregressive
decoder with intra-chunk parallelism. We enable NAST to generate the blank
token or repetitive tokens to adjust its READ/WRITE strategy flexibly, and
train it to maximize the non-monotonic latent alignment with an alignment-based
latency loss. Experiments on various SiMT benchmarks demonstrate that NAST
outperforms previous strong autoregressive SiMT baselines.
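To make the READ/WRITE behaviour concrete, here is a minimal sketch in the spirit of the abstract, assuming a CTC-style collapse of blank and repeated tokens; the chunk size, slot count, and the `decode_chunk` callable are hypothetical stand-ins rather than NAST's actual interface.

```python
# Hedged sketch of NAST-style streaming decoding (not the authors' code).
# Assumption: the non-autoregressive decoder fills a fixed number of output
# slots per source chunk in parallel, and blanks / immediate repeats are
# collapsed away, so emitting mostly blanks amounts to waiting (READ more).

BLANK = "<blank>"

def ctc_collapse(tokens):
    """Standard CTC collapse: drop immediate repetitions, then drop blanks."""
    out = []
    for tok in tokens:
        if out and tok == out[-1]:
            continue
        out.append(tok)
    return [t for t in out if t != BLANK]

def simultaneous_translate(source_stream, decode_chunk, chunk_size=4, slots_per_chunk=8):
    """READ source tokens chunk by chunk; WRITE whatever the collapse yields."""
    read, written, chunk = [], [], []
    for src_tok in source_stream:              # READ
        chunk.append(src_tok)
        if len(chunk) < chunk_size:
            continue
        read.extend(chunk)
        # Intra-chunk parallelism: all slots for this chunk are predicted at once.
        slots = decode_chunk(read, num_slots=slots_per_chunk)
        written.extend(ctc_collapse(slots))    # WRITE (possibly nothing)
        chunk = []
    return written                             # final partial chunk omitted for brevity
```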
Related papers
- PsFuture: A Pseudo-Future-based Zero-Shot Adaptive Policy for Simultaneous Machine Translation [8.1299957975257]
Simultaneous Machine Translation (SiMT) requires target tokens to be generated in real-time as streaming source tokens are consumed.
We propose PsFuture, the first zero-shot adaptive read/write policy for SiMT.
We introduce a novel training strategy, Prefix-to-Full (P2F), specifically tailored to adjust offline translation models for SiMT applications.
arXiv Detail & Related papers (2024-10-05T08:06:33Z)
- Parallelizing Autoregressive Generation with Variational State Space Models [6.29143368345159]
We propose a variational autoencoder (VAE) where both the encoder and decoder are SSMs.
Since sampling the latent variables and decoding them with the SSM can be parallelized, both training and generation can be conducted in parallel.
The decoder recurrence allows generation to be resumed without reprocessing the whole sequence.
arXiv Detail & Related papers (2024-07-11T11:41:29Z)
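As a rough illustration of the resumable decoding described in the entry above, the toy script below runs a plain linear state-space recurrence (h_t = A h_{t-1} + B z_t, y_t = C h_t) and shows that caching the hidden state lets decoding continue without reprocessing earlier latents; the shapes, the diagonal transition, and the NumPy setting are illustrative assumptions, not the paper's model.

```python
# Toy linear SSM decoder: decoding latents is a scan, so generation can be
# resumed from a cached hidden state without reprocessing earlier steps.
import numpy as np

rng = np.random.default_rng(0)
d_state, d_latent, d_out = 8, 4, 4
A = np.diag(rng.uniform(0.5, 0.95, size=d_state))   # stable diagonal transition
B = rng.normal(size=(d_state, d_latent))
C = rng.normal(size=(d_out, d_state))

def ssm_decode(z_seq, h0=None):
    """Run h_t = A h_{t-1} + B z_t, y_t = C h_t over a sequence of latents."""
    h = np.zeros(d_state) if h0 is None else h0
    ys = []
    for z in z_seq:
        h = A @ h + B @ z
        ys.append(C @ h)
    return np.stack(ys), h        # returning the final state enables resumption

# Latent variables can be sampled for all positions at once ...
z_all = rng.normal(size=(10, d_latent))

# ... and decoding the first 6 steps, caching the state, then resuming on the
# last 4 matches decoding everything in one pass.
y_full, _ = ssm_decode(z_all)
y_head, h_cache = ssm_decode(z_all[:6])
y_tail, _ = ssm_decode(z_all[6:], h0=h_cache)
assert np.allclose(y_full, np.concatenate([y_head, y_tail]))
```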
- Masked Audio Generation using a Single Non-Autoregressive Transformer [90.11646612273965]
MAGNeT is a masked generative sequence modeling method that operates directly over several streams of audio tokens.
We demonstrate the efficiency of MAGNeT for the task of text-to-music and text-to-audio generation.
We shed light on the importance of each component of MAGNeT and point out the trade-offs between autoregressive and non-autoregressive modeling.
arXiv Detail & Related papers (2024-01-09T14:29:39Z)
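MAGNeT's masked generative decoding belongs to the family of iterative masked prediction; the sketch below shows a generic MaskGIT-style loop (predict all positions in parallel, keep the most confident, re-mask the rest on a cosine schedule). The `model` callable and the single-stream simplification are assumptions; the paper's handling of multiple audio token streams is omitted.

```python
# Generic iterative masked decoding over a token sequence (not MAGNeT itself):
# start fully masked, predict every position in parallel, commit the most
# confident predictions, re-mask the rest, and repeat for a few steps.
import math
import numpy as np

MASK = -1

def masked_generate(model, seq_len, steps=8):
    tokens = np.full(seq_len, MASK)
    for step in range(steps):
        probs = model(tokens)                  # (seq_len, vocab), all positions in parallel
        preds = probs.argmax(-1)
        conf = probs.max(-1)
        committed = tokens != MASK
        preds[committed] = tokens[committed]   # keep tokens committed earlier
        conf[committed] = np.inf               # and never re-mask them
        # Cosine schedule: the number of still-masked positions shrinks to zero.
        n_mask = int(seq_len * math.cos(math.pi / 2 * (step + 1) / steps))
        tokens = preds
        if n_mask > 0:
            tokens[np.argsort(conf)[:n_mask]] = MASK   # re-mask the least confident
    return tokens
```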
- Improving Non-autoregressive Translation Quality with Pretrained Language Model, Embedding Distillation and Upsampling Strategy for CTC [51.34222224728979]
This paper introduces a series of innovative techniques to enhance the translation quality of Non-Autoregressive Translation (NAT) models.
We propose fine-tuning Pretrained Multilingual Language Models (PMLMs) with the CTC loss to train NAT models effectively.
Our model exhibits a remarkable speed improvement of 16.35 times compared to the autoregressive model.
arXiv Detail & Related papers (2023-06-10T05:24:29Z)
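A hedged sketch of how a CTC objective is commonly attached to a non-autoregressive decoder in PyTorch, in the spirit of the CTC training described above; the upsampling factor, dimensions, and the linear `decoder` are illustrative placeholders, not the paper's configuration.

```python
# Sketch of CTC training for a NAT-style decoder in PyTorch (illustrative only).
# The decoder emits an upsampled sequence of label distributions; CTC marginalises
# over all alignments that collapse (repeats merged, blanks removed) to the reference.
import torch
import torch.nn as nn

vocab_size, blank_id, upsample = 1000, 0, 2
batch, src_len, tgt_len, hidden = 8, 20, 15, 256

decoder = nn.Linear(hidden, vocab_size)        # hypothetical stand-in decoder head

src_states = torch.randn(batch, src_len, hidden)
# Upsampling gives CTC more output slots than source positions.
up_states = src_states.repeat_interleave(upsample, dim=1)       # (B, 2*src_len, H)

log_probs = decoder(up_states).log_softmax(-1)                  # (B, T, V)
log_probs = log_probs.transpose(0, 1)                           # CTCLoss expects (T, B, V)

targets = torch.randint(1, vocab_size, (batch, tgt_len))        # index 0 reserved for blank
input_lengths = torch.full((batch,), src_len * upsample, dtype=torch.long)
target_lengths = torch.full((batch,), tgt_len, dtype=torch.long)

ctc = nn.CTCLoss(blank=blank_id, zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```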
- Optimizing Non-Autoregressive Transformers with Contrastive Learning [74.46714706658517]
Non-autoregressive Transformers (NATs) reduce the inference latency of Autoregressive Transformers (ATs) by predicting words all at once rather than in sequential order.
In this paper, we propose to ease the difficulty of modality learning via sampling from the model distribution instead of the data distribution.
arXiv Detail & Related papers (2023-05-23T04:20:13Z)
- Enriching Non-Autoregressive Transformer with Syntactic and Semantic Structures for Neural Machine Translation [54.864148836486166]
We propose to incorporate the explicit syntactic and semantic structures of languages into a non-autoregressive Transformer.
Our model achieves significantly faster decoding while maintaining translation quality compared with several state-of-the-art non-autoregressive models.
arXiv Detail & Related papers (2021-01-22T04:12:17Z)
- Infusing Sequential Information into Conditional Masked Translation Model with Self-Review Mechanism [9.641454891414751]
Non-autoregressive models generate target words in parallel, which achieves faster decoding but sacrifices translation accuracy.
We propose a Self-Review Mechanism to infuse sequential information into a conditional masked translation model.
Our model even surpasses a typical left-to-right Transformer model, while significantly speeding up decoding.
arXiv Detail & Related papers (2020-10-19T03:38:56Z)
- Glancing Transformer for Non-Autoregressive Neural Machine Translation [58.87258329683682]
We propose a method to learn word interdependency for single-pass parallel generation models.
With only single-pass parallel decoding, GLAT is able to generate high-quality translation with 8-15 times speedup.
arXiv Detail & Related papers (2020-08-18T13:04:03Z)
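The single-pass generation above relies on glancing-style training; the schematic below paraphrases that general idea (reveal a number of reference tokens proportional to how far the first parallel pass is from the reference, then train on the remaining positions). The `parallel_decode` and `glancing_loss` methods are hypothetical names, not the authors' code.

```python
# Schematic glancing training step (illustrative, not the GLAT reference code):
# decode once in parallel, measure how far the prediction is from the reference,
# reveal a proportional number of reference tokens as decoder inputs, and train
# the model to predict the remaining positions in a single parallel pass.
import random

def glancing_step(model, src, ref, mask_token, ratio=0.5):
    pred = model.parallel_decode(src)                      # first parallel pass
    distance = sum(p != r for p, r in zip(pred, ref))      # Hamming distance to reference
    n_reveal = int(ratio * distance)                       # reveal more when the pass is worse
    reveal = set(random.sample(range(len(ref)), n_reveal)) if n_reveal else set()
    dec_in = [ref[i] if i in reveal else mask_token for i in range(len(ref))]
    # Second pass: the loss covers only the positions that were not revealed.
    return model.glancing_loss(src, dec_in, ref, ignore=reveal)
```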