TiDAR: Think in Diffusion, Talk in Autoregression
- URL: http://arxiv.org/abs/2511.08923v1
- Date: Thu, 13 Nov 2025 01:18:11 GMT
- Title: TiDAR: Think in Diffusion, Talk in Autoregression
- Authors: Jingyu Liu, Xin Dong, Zhifan Ye, Rishabh Mehta, Yonggan Fu, Vartika Singh, Jan Kautz, Ce Zhang, Pavlo Molchanov,
- Abstract summary: TiDAR is a sequence-level hybrid architecture that drafts tokens (Thinking) in Diffusion and samples final outputs (Talking) AutoRegressively. TiDAR is the first architecture to close the quality gap with AR models while delivering 4.71x to 5.91x more tokens per second.
- Score: 59.94106070312094
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion language models hold the promise of fast parallel generation, while autoregressive (AR) models typically excel in quality because their causal structure aligns naturally with language modeling. This raises a fundamental question: can we combine the two to achieve high throughput, higher GPU utilization, and AR-level quality? Existing methods fail to balance these two aspects effectively: they either prioritize AR by using a weaker model for sequential drafting (speculative decoding), which lowers drafting efficiency, or impose some form of left-to-right (AR-like) decoding logic on diffusion, which still suffers from quality degradation and forfeits its potential parallelizability. We introduce TiDAR, a sequence-level hybrid architecture that drafts tokens (Thinking) in Diffusion and samples final outputs (Talking) AutoRegressively, all within a single forward pass using specially designed structured attention masks. This design exploits free GPU compute density, achieving a strong balance between drafting and verification capacity. Moreover, TiDAR is designed to be serving-friendly (low overhead) as a standalone model. We extensively evaluate TiDAR against AR models, speculative decoding, and diffusion variants across generative and likelihood tasks at 1.5B and 8B scales. Thanks to parallel drafting and sampling as well as exact KV cache support, TiDAR outperforms speculative decoding in measured throughput and surpasses diffusion models such as Dream and LLaDA in both efficiency and quality. Most notably, TiDAR is the first architecture to close the quality gap with AR models while delivering 4.71x to 5.91x more tokens per second.
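The single-pass hybrid decoding described above hinges on a structured attention mask. The pure-Python sketch below shows one plausible layout; the exact mask construction is an assumption inspired by the abstract, not the paper's actual design: already-committed tokens attend causally (AR "Talking"), while diffusion draft slots attend to the full committed prefix and to each other bidirectionally ("Thinking"), so drafting and verification can share one forward pass.

```python
def hybrid_attention_mask(n_prefix, n_draft):
    """Boolean attention mask: mask[i][j] is True if position i may attend to j.

    Hypothetical layout inspired by TiDAR's description (an assumption,
    not the paper's exact construction): the first n_prefix positions are
    already-sampled tokens with causal (AR) attention; the last n_draft
    positions are diffusion draft slots that see the whole prefix and
    attend to one another bidirectionally.
    """
    n = n_prefix + n_draft
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i < n_prefix:
                mask[i][j] = j <= i   # causal region: "Talking" tokens
            else:
                mask[i][j] = True     # draft slots: see prefix + all drafts
    return mask

mask = hybrid_attention_mask(n_prefix=4, n_draft=3)
```

In a real implementation such a mask would be passed to the transformer's attention layers (e.g. as an additive or boolean attention mask), letting one forward pass both score the committed prefix autoregressively and denoise the draft block in parallel.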
Related papers
- DFlash: Block Diffusion for Flash Speculative Decoding [11.98141750480807]
Autoregressive large language models (LLMs) deliver strong performance but require inherently sequential decoding. We introduce DFlash, a speculative decoding framework that employs a lightweight block diffusion model for parallel drafting.
arXiv Detail & Related papers (2026-02-05T18:59:30Z)
- MammothModa2: A Unified AR-Diffusion Framework for Multimodal Understanding and Generation [20.14002849273559]
Unified multimodal models aim to integrate understanding and generation within a single framework. We present MammothModa2 (Mammoth2), a unified autoregressive-diffusion (AR-Diffusion) framework. Mammoth2 delivers strong text-to-image and instruction-based editing performance on public benchmarks.
arXiv Detail & Related papers (2025-11-23T03:25:39Z)
- Hawk: Leveraging Spatial Context for Faster Autoregressive Text-to-Image Generation [87.00172597953228]
Speculative decoding has shown promise in accelerating text generation without compromising quality. We introduce Hawk, a new approach that harnesses the spatial structure of images to guide the speculative model toward more accurate and efficient predictions. Experimental results on multiple text-to-image benchmarks demonstrate a 1.71x speedup over standard AR models.
arXiv Detail & Related papers (2025-10-29T17:43:31Z)
- SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation [62.14510717860079]
We propose a Synergistic Diffusion-Autoregression paradigm that unifies the training efficiency of autoregressive models with the parallel inference capability of diffusion. SDAR performs a lightweight paradigm conversion that transforms a well-trained autoregressive (AR) model into a blockwise diffusion model through brief, data-efficient adaptation. Building on this insight, SDAR achieves efficient AR-to-diffusion conversion with minimal cost, preserving AR-level performance while enabling parallel generation.
arXiv Detail & Related papers (2025-10-07T17:29:28Z)
- Next Tokens Denoising for Speech Synthesis [51.320443764269726]
Dragon-FM is a novel text-to-speech (TTS) design that unifies AR and flow-matching. It processes 48 kHz audio tokens in chunks at a compact rate of 12.5 tokens per second. Experiments on podcast datasets demonstrate its capability to efficiently generate high-quality zero-shot podcasts.
arXiv Detail & Related papers (2025-07-30T15:03:36Z)
- Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling [80.30976039119236]
Lumina-mGPT 2.0 is a stand-alone, decoder-only autoregressive model. It is trained entirely from scratch, enabling unrestricted architectural design and licensing freedom. It achieves generation quality on par with state-of-the-art diffusion models.
arXiv Detail & Related papers (2025-07-23T17:42:13Z)
- Anchored Diffusion Language Model [39.17770765212062]
We introduce the Anchored Diffusion Language Model (ADLM), a novel framework that predicts distributions over important tokens via an anchor network. ADLM significantly improves test perplexity on LM1B and OpenWebText, achieving up to 25.4% gains over prior DLMs. It also surpasses AR models in MAUVE score, marking the first time a DLM has generated more human-like text than an AR model.
arXiv Detail & Related papers (2025-05-24T01:34:14Z)
- Complexity Matters: Rethinking the Latent Space for Generative Modeling [65.64763873078114]
In generative modeling, numerous successful approaches leverage a low-dimensional latent space, e.g., Stable Diffusion.
In this study, we aim to shed light on this under-explored topic by rethinking the latent space from the perspective of model complexity.
arXiv Detail & Related papers (2023-07-17T07:12:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.