Fast-ARDiff: An Entropy-informed Acceleration Framework for Continuous Space Autoregressive Generation
- URL: http://arxiv.org/abs/2512.08537v1
- Date: Tue, 09 Dec 2025 12:35:18 GMT
- Title: Fast-ARDiff: An Entropy-informed Acceleration Framework for Continuous Space Autoregressive Generation
- Authors: Zhen Zou, Xiaoxiao Ma, Jie Huang, Zichao Yu, Feng Zhao
- Abstract summary: Autoregressive (AR)-diffusion hybrid paradigms combine AR's structured modeling with diffusion's synthesis. We propose a unified AR-diffusion framework, Fast-ARDiff, that jointly optimizes both components. Fast-ARDiff achieves state-of-the-art acceleration across diverse models.
- Score: 12.384836052394272
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autoregressive (AR)-diffusion hybrid paradigms combine AR's structured modeling with diffusion's photorealistic synthesis, yet suffer from high latency due to sequential AR generation and iterative denoising. In this work, we tackle this bottleneck and propose a unified AR-diffusion framework, Fast-ARDiff, that jointly optimizes both components, accelerating AR speculative decoding while simultaneously facilitating faster diffusion decoding. Specifically: (1) The entropy-informed speculative strategy encourages the draft model to produce higher-entropy representations aligned with the target model's entropy characteristics, mitigating the entropy mismatch and high rejection rates caused by draft overconfidence. (2) For diffusion decoding, rather than treating it as an independent module, we integrate it into the same end-to-end framework using a dynamic scheduler that prioritizes AR optimization to guide the diffusion component in subsequent steps. The diffusion component is optimized through a joint distillation framework combining trajectory and distribution matching, ensuring stable training and high-quality synthesis with extremely few steps. During inference, shallow feature entropy from the AR module is used to pre-filter low-entropy drafts, avoiding redundant computation and improving latency. Fast-ARDiff achieves state-of-the-art acceleration across diverse models: on ImageNet 256$\times$256, TransDiff attains a 4.3$\times$ lossless speedup, and NextStep-1 achieves 3$\times$ acceleration on text-conditioned generation. Code will be available at https://github.com/aSleepyTree/Fast-ARDiff.
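As a rough illustration of the entropy pre-filtering idea in the abstract (this is not the authors' implementation; the distributions, the threshold value, and the function names are hypothetical), one can drop overconfident, low-entropy drafts before the expensive target-model verification step:

```python
import math

def entropy(probs):
    """Shannon entropy of a discrete distribution, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def prefilter_drafts(draft_dists, threshold):
    """Return indices of drafts whose predictive entropy meets `threshold`.

    Low-entropy (overconfident) drafts are discarded before target-model
    verification, sketching the paper's idea of using entropy to skip
    redundant computation.
    """
    return [i for i, dist in enumerate(draft_dists) if entropy(dist) >= threshold]

# Example: three draft token distributions over a 4-way vocabulary.
drafts = [
    [0.97, 0.01, 0.01, 0.01],  # overconfident -> filtered out
    [0.40, 0.30, 0.20, 0.10],  # higher entropy -> kept
    [0.25, 0.25, 0.25, 0.25],  # maximum entropy -> kept
]
kept = prefilter_drafts(drafts, threshold=0.5)
print(kept)  # [1, 2]: only these drafts reach target-model verification
```

The threshold here is arbitrary; in the paper it is informed by the target model's entropy characteristics rather than fixed by hand.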
Related papers
- D$^2$-VR: Degradation-Robust and Distilled Video Restoration with Synergistic Optimization Strategy [7.553742541566094]
Integrating diffusion priors with temporal alignment has emerged as a transformative paradigm for video restoration, delivering strong perceptual quality. We propose D$^2$-VR, a single-image diffusion-based video-restoration framework with low-step inference.
arXiv Detail & Related papers (2026-02-09T08:52:51Z) - DiffBench Meets DiffAgent: End-to-End LLM-Driven Diffusion Acceleration Code Generation [25.165655684862074]
We introduce a framework driven by large language models (LLMs) for automated acceleration code generation and evaluation. First, we present DiffBench, a comprehensive benchmark that implements a three-stage automated evaluation pipeline. Second, we propose DiffAgent, an agent that generates optimal acceleration strategies and code for arbitrary diffusion models.
arXiv Detail & Related papers (2026-01-06T16:55:55Z) - Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching [16.927491376135134]
We present Fast-FoundationStereo, a family of architectures that achieve, for the first time, strong zero-shot generalization at real-time frame rates. We employ a divide-and-conquer acceleration strategy with three components: knowledge distillation, blockwise neural architecture search, and structured pruning. The resulting model runs over 10x faster than FoundationStereo while closely matching its zero-shot accuracy.
arXiv Detail & Related papers (2025-12-11T21:36:29Z) - VVS: Accelerating Speculative Decoding for Visual Autoregressive Generation via Partial Verification Skipping [52.58270801983525]
Speculative decoding (SD) has been proven effective for accelerating visual AR models. We propose VVS, a novel framework that accelerates visual AR generation via partial verification skipping.
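A minimal sketch of the partial-verification-skipping idea (the interface is hypothetical: `verify` stands in for a target-model check, and the confidence threshold is illustrative, not taken from the paper):

```python
def decode_with_partial_verification(draft_tokens, confidences, skip_threshold, verify):
    """Accept high-confidence draft tokens without target-model verification;
    verify only the uncertain ones, stopping at the first rejection."""
    accepted = []
    for token, conf in zip(draft_tokens, confidences):
        if conf >= skip_threshold:
            accepted.append(token)   # confident draft: verification skipped
        elif verify(token):
            accepted.append(token)   # uncertain draft: verified and accepted
        else:
            break                    # rejected: discard the rest of the draft run
    return accepted

# Toy run with a stub verifier that rejects token 4.
result = decode_with_partial_verification(
    draft_tokens=[1, 2, 3, 4],
    confidences=[0.90, 0.55, 0.95, 0.40],
    skip_threshold=0.80,
    verify=lambda t: t != 4,
)
print(result)  # [1, 2, 3]
```

Skipping verification for confident tokens trades a small risk of accepting unverified drafts for fewer target-model forward passes, which is where the speedup comes from.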
arXiv Detail & Related papers (2025-11-17T16:50:58Z) - Rethinking Autoregressive Models for Lossless Image Compression via Hierarchical Parallelism and Progressive Adaptation [75.58269386927076]
Autoregressive (AR) models are often dismissed as impractical due to prohibitive computational cost. This work rethinks this paradigm, introducing a framework built on hierarchical parallelism and progressive adaptation. Experiments on diverse datasets (natural, satellite, medical) validate that the method achieves new state-of-the-art compression.
arXiv Detail & Related papers (2025-11-14T06:27:58Z) - SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation [62.14510717860079]
We propose a Synergistic Diffusion-Autoregression paradigm (SDAR) that unifies the training efficiency of autoregressive models with the parallel inference capability of diffusion. SDAR performs a lightweight paradigm conversion that transforms a well-trained autoregressive (AR) model into a blockwise diffusion model through brief, data-efficient adaptation. Building on this insight, SDAR achieves efficient AR-to-diffusion conversion at minimal cost, preserving AR-level performance while enabling parallel generation.
arXiv Detail & Related papers (2025-10-07T17:29:28Z) - RAPID^3: Tri-Level Reinforced Acceleration Policies for Diffusion Transformer [86.57077884971478]
Diffusion Transformers (DiTs) excel at visual generation yet remain hampered by slow sampling. We introduce RAPID$^3$: Tri-Level Reinforced Acceleration Policies for Diffusion Transformers. It delivers image-wise acceleration with zero updates to the base generator, achieving nearly 3x faster sampling with competitive generation quality.
arXiv Detail & Related papers (2025-09-26T13:20:52Z) - Improving Progressive Generation with Decomposable Flow Matching [50.63174319509629]
Decomposable Flow Matching (DFM) is a simple and effective framework for the progressive generation of visual media. On ImageNet-1k 512px, DFM achieves a 35.2% improvement in FDD scores over the base architecture and 26.4% over the best-performing baseline.
arXiv Detail & Related papers (2025-06-24T17:58:02Z) - DDT: Decoupled Diffusion Transformer [51.84206763079382]
Diffusion transformers encode noisy inputs to extract the semantic component and decode higher frequencies with identical modules. The paper proposes the Decoupled Diffusion Transformer (DDT) to separate these two roles.
arXiv Detail & Related papers (2025-04-08T07:17:45Z) - High-Dimensional Sparse Data Low-rank Representation via Accelerated Asynchronous Parallel Stochastic Gradient Descent [2.2083091880368855]
Low-rank (LR) representation can map high-dimensional sparse (HDS) data to low-dimensional feature spaces.
Existing optimization algorithms for LR models are computationally inefficient and converge slowly on large-scale datasets.
A2PSGD outperforms existing optimization algorithms for HDS data LR in both accuracy and training time.
arXiv Detail & Related papers (2024-08-29T14:55:33Z) - FastRE: Towards Fast Relation Extraction with Convolutional Encoder and Improved Cascade Binary Tagging Framework [13.4666880421568]
We propose a fast relation extraction model (FastRE) based on a convolutional encoder and an improved cascade binary tagging framework.
FastRE achieves 3-10x faster training, 7-15x faster inference, and 1/100 the parameters compared to state-of-the-art models.
arXiv Detail & Related papers (2022-05-05T07:59:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.