Related papers: AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation

AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation

URL: http://arxiv.org/abs/2305.09515v3
Date: Wed, 13 Dec 2023 10:24:00 GMT
Title: AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation
Authors: Tong Wu, Zhihao Fan, Xiao Liu, Yeyun Gong, Yelong Shen, Jian Jiao, Hai-Tao Zheng, Juntao Li, Zhongyu Wei, Jian Guo, Nan Duan, Weizhu Chen
Abstract summary: We introduce Auto-Regressive Diffusion (AR-Diffusion) to account for the inherent sequential characteristic of natural language. AR-Diffusion ensures that the generation of tokens on the right depends on the generated ones on the left, a mechanism achieved through employing a dynamic number of denoising steps. In a series of experiments on various text generation tasks, AR-Diffusion clearly demonstrated its superiority over existing diffusion language models.
Score: 138.98095392584693
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion models have gained significant attention in the realm of image generation due to their exceptional performance. Their success has been recently expanded to text generation via generating all tokens within a sequence concurrently. However, natural language exhibits a far more pronounced sequential dependency in comparison to images, and the majority of existing language models are trained with a left-to-right auto-regressive approach. To account for the inherent sequential characteristic of natural language, we introduce Auto-Regressive Diffusion (AR-Diffusion). AR-Diffusion ensures that the generation of tokens on the right depends on the generated ones on the left, a mechanism achieved through employing a dynamic number of denoising steps that vary based on token position. This results in tokens on the left undergoing fewer denoising steps than those on the right, thereby enabling them to generate earlier and subsequently influence the generation of tokens on the right. In a series of experiments on various text generation tasks, including text summarization, machine translation, and common sense generation, AR-Diffusion clearly demonstrated its superiority over existing diffusion language models and that it can be $100\times\sim600\times$ faster when achieving comparable results. Our code is available at https://github.com/microsoft/ProphetNet/tree/master/AR-diffusion.

Related papers

D-AR: Diffusion via Autoregressive Models [21.03363985989625]
Diffusion via Autoregressive models (D-AR) is a new paradigm recasting the image diffusion process as a vanilla autoregressive procedure.<n>Our method achieves 2.09 FID using a 775M Llama backbone with 256 discrete tokens.
arXiv Detail & Related papers (2025-05-29T17:09:25Z)
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models [92.18057318458528]
Token-Shuffle is a novel method that reduces the number of image tokens in Transformer. Our strategy requires no additional pretrained text-encoder and enables MLLMs to support extremely high-resolution image synthesis. In GenAI-benchmark, our 2.7B model achieves 0.77 overall score on hard prompts, outperforming AR models LlamaGen by 0.18 and diffusion models LDM by 0.15.
arXiv Detail & Related papers (2025-04-24T17:59:56Z)
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens [66.02261367232256]
Multimodal Large Language Models (MLLMs) aim to unify visual comprehension and generation. Existing approaches rely on spatial visual tokens, where image patches are encoded and arranged according to a spatial order. In this paper, we build a proper visual language by reconstructing diffusion timesteps to learn discrete visual tokens.
arXiv Detail & Related papers (2025-04-20T16:14:28Z)
DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation [24.85655658070008]
Diffusion Transformer Autoregressive Modeling (DiTAR) is a patch-based autoregressive framework combining a language model with a diffusion transformer. In zero-shot speech generation, DiTAR achieves state-of-the-art performance in robustness, speaker similarity, and naturalness.
arXiv Detail & Related papers (2025-02-06T10:09:49Z)
LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding [30.630803933771865]
Experimental results demonstrate the efficacy of our method in providing a substantial speed-up over speculative decoding. LANTERN increases speed-ups by $mathbf1.75times$ and $mathbf1.82times$, as compared to greedy decoding and random sampling.
arXiv Detail & Related papers (2024-10-04T12:21:03Z)
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? [10.72249123249003]
We revisit diffusion models, highlighting their capacity for holistic context modeling and parallel decoding. We introduce a novel architecture, LaDiC, which utilizes a split BERT to create a dedicated latent space for captions. LaDiC achieves state-of-the-art performance for diffusion-based methods on the MS dataset with 38.2 BLEU@4 and 126.2 CIDEr.
arXiv Detail & Related papers (2024-04-16T17:47:16Z)
DiffCap: Exploring Continuous Diffusion on Image Captioning [16.572887005727555]
We propose a novel method DiffCap to apply continuous diffusions on image captioning. Our method transforms discrete tokens in a natural way and applies continuous diffusion on them to successfully fuse extracted image features. Our experiments on COCO dataset demonstrate that our method uses a much simpler structure to achieve comparable results to the previous non-autoregressive works.
arXiv Detail & Related papers (2023-05-20T09:02:10Z)
TESS: Text-to-Text Self-Conditioned Simplex Diffusion [56.881170312435444]
Text-to-text Self-conditioned Simplex Diffusion employs a new form of self-conditioning, and applies the diffusion process on the logit simplex space rather than the learned embedding space. We demonstrate that TESS outperforms state-of-the-art non-autoregressive models, requires fewer diffusion steps with minimal drop in performance, and is competitive with pretrained autoregressive sequence-to-sequence models.
arXiv Detail & Related papers (2023-05-15T06:33:45Z)
A Cheaper and Better Diffusion Language Model with Soft-Masked Noise [62.719656543880596]
Masked-Diffuse LM is a novel diffusion model for language modeling, inspired by linguistic features in languages. Specifically, we design a linguistic-informed forward process which adds corruptions to the text through strategically soft-masking to better noise the textual data. We demonstrate that our Masked-Diffuse LM can achieve better generation quality than the state-of-the-art diffusion models with better efficiency.
arXiv Detail & Related papers (2023-04-10T17:58:42Z)
SeqDiffuSeq: Text Diffusion with Encoder-Decoder Transformers [50.90457644954857]
In this work, we apply diffusion models to approach sequence-to-sequence text generation. We propose SeqDiffuSeq, a text diffusion model for sequence-to-sequence generation. Experiment results illustrate the good performance on sequence-to-sequence generation in terms of text quality and inference time.
arXiv Detail & Related papers (2022-12-20T15:16:24Z)
eDiffi: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers [87.52504764677226]
Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis. We train an ensemble of text-to-image diffusion models specialized for different stages synthesis. Our ensemble of diffusion models, called eDiffi, results in improved text alignment while maintaining the same inference cost.
arXiv Detail & Related papers (2022-11-02T17:43:04Z)
ELMER: A Non-Autoregressive Pre-trained Language Model for Efficient and Effective Text Generation [97.64625999380425]
We study the text generation task under the approach of pre-trained language models (PLMs) By leveraging the early exit technique, ELMER enables the token generations at different layers, according to their prediction confidence. Experiments on three text generation tasks show that ELMER significantly outperforms NAR models.
arXiv Detail & Related papers (2022-10-24T14:46:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.