Related papers: Autoregressive Models Rival Diffusion Models at ANY-ORDER Generation

Autoregressive Models Rival Diffusion Models at ANY-ORDER Generation

URL: http://arxiv.org/abs/2601.13228v1
Date: Mon, 19 Jan 2026 17:03:48 GMT
Title: Autoregressive Models Rival Diffusion Models at ANY-ORDER Generation
Authors: Tianqi Du, Lizhe Fang, Weijie Yang, Chenheng Zhang, Zeming Wei, Yifei Wang, Yisen Wang,
Abstract summary: We propose Any-order Any-subset Autoregressive modeling (A3)<n>A3 is a framework that extends the standard AR factorization to arbitrary token groups and generation orders.<n> Experiments on question answering, commonsense reasoning, and story infilling demonstrate that A3 outperforms diffusion-based models.
Score: 35.63237650402896
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diffusion language models enable any-order generation and bidirectional conditioning, offering appealing flexibility for tasks such as infilling, rewriting, and self-correction. However, their formulation-predicting one part of a sequence from another within a single-step dependency-limits modeling depth and often yields lower sample quality and stability than autoregressive (AR) models. To address this, we revisit autoregressive modeling as a foundation and reformulate diffusion-style training into a structured multi-group prediction process. We propose Any-order Any-subset Autoregressive modeling (A3), a generalized framework that extends the standard AR factorization to arbitrary token groups and generation orders. A3 preserves the probabilistic rigor and multi-layer dependency modeling of AR while inheriting diffusion models' flexibility for parallel and bidirectional generation. We implement A3 through a two-stream attention architecture and a progressive adaptation strategy that transitions pretrained AR models toward any-order prediction. Experiments on question answering, commonsense reasoning, and story infilling demonstrate that A3 outperforms diffusion-based models while maintaining flexible decoding. This work offers a unified approach for a flexible, efficient, and novel language modeling paradigm.

Related papers

Auto-Regressive Masked Diffusion Models [9.239507801466322]
Masked diffusion models (MDMs) have emerged as a promising approach for language modeling.<n>They face a performance gap compared to autoregressive models (ARMs) and require more training iterations.<n>We present the Auto-Regressive Masked Diffusion model, which unifies the training efficiency of autoregressive models with the parallel generation capabilities of diffusion-based models.
arXiv Detail & Related papers (2026-01-23T18:42:30Z)
Composition and Alignment of Diffusion Models using Constrained Learning [79.36736636241564]
Diffusion models have become prevalent in generative modeling due to their ability to sample from complex distributions.<n>Two commonly used methods are: (i) alignment, which involves fine-tuning a diffusion model to align it with a reward; and (ii) composition, which combines several pre-trained diffusion models, each emphasizing a desirable attribute in the generated outputs.<n>We propose a constrained optimization framework that unifies alignment and composition of diffusion models by enforcing that the aligned model satisfies reward constraints and/or remains close to (potentially multiple) pre-trained models.
arXiv Detail & Related papers (2025-08-26T15:06:30Z)
Hybrid Autoregressive-Diffusion Model for Real-Time Sign Language Production [0.0]
We develop a hybrid approach that combines autoregressive and diffusion models for Sign Language Production (SLP)<n>To capture fine-grained body movements, we design a Multi-Scale Pose Representation module that separately extracts detailed features from distinct articulators.<n>We introduce a Confidence-Aware Causal Attention mechanism that utilizes joint-level confidence scores to dynamically guide the pose generation process.
arXiv Detail & Related papers (2025-07-12T01:34:50Z)
Discrete Diffusion Models for Language Generation [0.0]
This thesis investigates the feasibility and performance of discrete diffusion models for natural language generation.<n>We use Bits Per Token (BPT), Negative Log-Likelihood (NLL), Perplexity (PPL), and Batch Processing Speed to assess generative performance.<n>The AR model outperforms in compression with a lower mean BPT of 4.59, but D3PM achieves higher processing speed, reaching up to 3.97 batches per sec.
arXiv Detail & Related papers (2025-07-02T23:43:02Z)
Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective [54.77404771454794]
We develop a flexible and robust world model for Multi-Agent Reinforcement Learning (MARL) using diffusion models.<n>Our method, Diffusion-Inspired Multi-Agent world model (DIMA), achieves state-of-the-art performance across multiple multi-agent control benchmarks.
arXiv Detail & Related papers (2025-05-27T09:11:38Z)
Preference-Based Alignment of Discrete Diffusion Models [14.874943508610857]
We introduce Discrete Diffusion DPO (D2-DPO), the first adaptation of Direct Preference Optimization (DPO) to discrete diffusion models formulated as continuous-time Markov chains.<n>Our approach derives a novel loss function that directly fine-tunes the generative process using preference data while preserving fidelity to a reference distribution.<n>Our results highlight that D2-DPO enables controlled fine-tuning without requiring explicit reward models, making it a practical alternative to reinforcement learning-based approaches.
arXiv Detail & Related papers (2025-03-11T11:07:35Z)
Energy-Based Diffusion Language Models for Text Generation [126.23425882687195]
Energy-based Diffusion Language Model (EDLM) is an energy-based model operating at the full sequence level for each diffusion step.<n>Our framework offers a 1.3$times$ sampling speedup over existing diffusion models.
arXiv Detail & Related papers (2024-10-28T17:25:56Z)
Scaling Diffusion Language Models via Adaptation from Autoregressive Models [105.70889434492143]
Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling.<n>We show that we can convert AR models ranging from 127M to 7B parameters into diffusion models DiffuGPT and DiffuLLaMA, using less than 200B tokens for training.<n>Our experimental results reveal that these models outperform earlier DLMs and are competitive with their AR counterparts.
arXiv Detail & Related papers (2024-10-23T14:04:22Z)
EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) shows outstanding performance compared to existing merging methods. EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
Solving Inverse Problems with Model Mismatch using Untrained Neural Networks within Model-based Architectures [14.551812310439004]
We introduce an untrained forward model residual block within the model-based architecture to match the data consistency in the measurement domain for each instance. Our approach offers a unified solution that is less parameter-sensitive, requires no additional data, and enables simultaneous fitting of the forward model and reconstruction in a single pass.
arXiv Detail & Related papers (2024-03-07T19:02:13Z)
DiffusER: Discrete Diffusion via Edit-based Reconstruction [88.62707047517914]
DiffusER is an edit-based generative model for text based on denoising diffusion models. It can rival autoregressive models on several tasks spanning machine translation, summarization, and style transfer. It can also perform other varieties of generation that standard autoregressive models are not well-suited for.
arXiv Detail & Related papers (2022-10-30T16:55:23Z)
Self-Reflective Variational Autoencoder [21.054722609128525]
Variational Autoencoder (VAE) is a powerful framework for learning latent variable generative models. We introduce a solution, which we call self-reflective inference. We empirically demonstrate the clear advantages of matching the variational posterior to the exact posterior.
arXiv Detail & Related papers (2020-07-10T05:05:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.