StageVAR: Stage-Aware Acceleration for Visual Autoregressive Models
- URL: http://arxiv.org/abs/2512.16483v1
- Date: Thu, 18 Dec 2025 12:51:19 GMT
- Title: StageVAR: Stage-Aware Acceleration for Visual Autoregressive Models
- Authors: Senmao Li, Kai Wang, Salman Khan, Fahad Shahbaz Khan, Jian Yang, Yaxing Wang
- Abstract summary: Visual Autoregressive (VAR) modeling departs from the next-token prediction paradigm of traditional Autoregressive (AR) models through next-scale prediction. Existing acceleration methods reduce runtime for large-scale steps, but rely on manual step selection and overlook the varying importance of different stages in the generation process. We present StageVAR, a systematic study and stage-aware acceleration framework for VAR models.
- Score: 69.07782637329315
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Autoregressive (VAR) modeling departs from the next-token prediction paradigm of traditional Autoregressive (AR) models through next-scale prediction, enabling high-quality image generation. However, the VAR paradigm suffers from sharply increased computational complexity and running time at large-scale steps. Existing acceleration methods reduce runtime for large-scale steps, but they rely on manual step selection and overlook the varying importance of different stages in the generation process. To address this challenge, we present StageVAR, a systematic study and stage-aware acceleration framework for VAR models. Our analysis shows that early steps are critical for preserving semantic and structural consistency and should remain intact, while later steps mainly refine details and can be pruned or approximated for acceleration. Building on these insights, StageVAR introduces a plug-and-play acceleration strategy that exploits semantic irrelevance and low-rank properties in late-stage computations, without requiring additional training. Our proposed StageVAR achieves up to 3.4x speedup with only a 0.01 drop on GenEval and a 0.26 decrease on DPG, consistently outperforming existing acceleration baselines. These results highlight stage-aware design as a powerful principle for efficient visual autoregressive image generation.
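The paper's code is not reproduced here, but the stated principle, keep early scale steps exact and replace heavy late-stage computation with a low-rank approximation, can be illustrated in a few lines. The following is a minimal sketch under assumed settings: the scale schedule, the `keep_exact` boundary, the rank, and the single linear layer are all illustrative, not the authors' implementation.

```python
import numpy as np

# Hypothetical illustration of stage-aware acceleration (not the authors'
# code): early scale steps run the full computation, late steps replace a
# heavy linear map with a rank-r factorization precomputed via truncated SVD.

rng = np.random.default_rng(0)
d = 512                                        # hidden width (illustrative)
W = rng.standard_normal((d, d)) / np.sqrt(d)   # a heavy linear layer

# Precompute a rank-r factorization W ~= U @ V once, offline.
r = 64
U_, S_, Vt_ = np.linalg.svd(W, full_matrices=False)
U = U_[:, :r] * S_[:r]       # (d, r)
V = Vt_[:r, :]               # (r, d)

def full_step(x):            # exact computation: O(d^2) per token
    return x @ W.T

def lowrank_step(x):         # approximate computation: O(d*r) per token
    return (x @ V.T) @ U.T

scales = [1, 2, 3, 4, 6, 9, 13, 18, 24, 32]    # VAR-style token-map side lengths
keep_exact = 6               # early stages preserved, late stages approximated

for i, s in enumerate(scales):
    x = rng.standard_normal((s * s, d))        # tokens at this scale
    y = full_step(x) if i < keep_exact else lowrank_step(x)
```

The point of the sketch is the asymmetry: the rank-r path costs O(d·r) per token instead of O(d²), and it is applied only at the late, high-resolution scales where token counts, and therefore savings, are largest.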
Related papers
- Diff-ES: Stage-wise Structural Diffusion Pruning via Evolutionary Search [40.67449277026597]
We introduce Diff-ES, a stage-wise structural diffusion pruning framework via evolutionary search. Our framework naturally integrates with existing structured pruning methods for diffusion models, including depth and width pruning. Experiments on DiT and SDXL demonstrate that Diff-ES consistently achieves wall-clock speedups while incurring minimal degradation in generation quality. A rough code sketch of the search loop follows this entry.
arXiv Detail & Related papers (2026-03-05T12:18:40Z)
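The abstract describes the search but not its internals; a generic evolutionary loop over stage-wise pruning masks, with an invented `proxy_quality` fitness and made-up hyperparameters, might look like this sketch:

```python
import numpy as np

# A generic evolutionary-search loop over stage-wise pruning decisions,
# written from the abstract alone. `proxy_quality` stands in for whatever
# fitness signal the paper actually uses and is purely illustrative.

rng = np.random.default_rng(0)
n_stages = 12                          # prunable stages/blocks (assumed)

def proxy_quality(mask):
    # Hypothetical fitness: reward pruning (speed) but penalize removing
    # early stages more heavily than late ones.
    cost_saved = (1 - mask).sum()
    damage = ((1 - mask) * np.linspace(2.0, 0.2, n_stages)).sum()
    return cost_saved - damage

def mutate(mask, p=0.15):
    flips = rng.random(n_stages) < p
    return np.where(flips, 1 - mask, mask)

pop = [rng.integers(0, 2, n_stages) for _ in range(16)]
for gen in range(50):
    pop.sort(key=proxy_quality, reverse=True)
    parents = pop[:4]                  # truncation selection
    pop = parents + [mutate(parent) for parent in parents for _ in range(3)]

best = max(pop, key=proxy_quality)
print("kept stages:", np.flatnonzero(best))
```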
- ToProVAR: Efficient Visual Autoregressive Modeling via Tri-Dimensional Entropy-Aware Semantic Analysis and Sparsity Optimization [13.916180996567128]
Visual Autoregressive (VAR) models enhance generation quality but face a critical efficiency bottleneck in later stages. We present a novel optimization framework for VAR models that fundamentally differs from prior approaches. Our approach aggressively accelerates the generation process while largely preserving semantic fidelity and fine details.
arXiv Detail & Related papers (2026-02-26T12:36:56Z)
- LESA: Learnable Stage-Aware Predictors for Diffusion Model Acceleration [12.183601881545039]
Diffusion models have achieved remarkable success in image and video generation tasks. However, the high computational demands of Diffusion Transformers pose a significant challenge to their practical deployment. We propose a LEarnable Stage-Aware (LESA) predictor framework based on two-stage training.
arXiv Detail & Related papers (2026-02-24T02:53:28Z)
- SparVAR: Exploring Sparsity in Visual AutoRegressive Modeling for Training-Free Acceleration [23.86429472943524]
We present a training-free acceleration framework that exploits three properties of Visual AutoRegressive attention: strong attention sinks, cross-scale activation similarity, and pronounced locality. Specifically, we dynamically predict the sparse attention pattern of later high-resolution scales from a sparse decision scale, and construct scale self-similar sparse attention via an efficient index-mapping mechanism. Our method achieves a 1.57× speed-up while preserving almost all high-frequency details. A rough code sketch of the index-mapping idea follows this entry.
arXiv Detail & Related papers (2026-02-04T09:34:06Z)
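As a rough reconstruction of the decision-scale idea, the sketch below computes a top-k attention mask once at a small scale and maps it to a larger scale through spatial index correspondence. The shapes, the top-k budget, and the nearest-parent mapping are assumptions; the paper's actual mechanism may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
d, s0, s1 = 64, 8, 32                  # head dim, decision scale, target scale

def coarse_index(side_fine, side_coarse):
    # Map each token on the fine grid to its parent token on the coarse grid.
    ys, xs = np.divmod(np.arange(side_fine * side_fine), side_fine)
    ratio = side_fine // side_coarse
    return (ys // ratio) * side_coarse + (xs // ratio)

# 1) Dense attention scores at the small decision scale (cheap: 64x64 here).
q0 = rng.standard_normal((s0 * s0, d))
k0 = rng.standard_normal((s0 * s0, d))
scores0 = q0 @ k0.T
topk = 8
keep0 = np.zeros_like(scores0, dtype=bool)
rows = np.arange(scores0.shape[0])[:, None]
keep0[rows, np.argpartition(-scores0, topk, axis=1)[:, :topk]] = True

# 2) Index-map the coarse mask to the high-resolution scale: a 1024x1024 mask
#    obtained without computing any high-resolution attention scores.
idx = coarse_index(s1, s0)
keep1 = keep0[np.ix_(idx, idx)]

# 3) High-resolution attention only where the mask allows it.
q1 = rng.standard_normal((s1 * s1, d))
k1 = rng.standard_normal((s1 * s1, d))
scores1 = np.where(keep1, q1 @ k1.T, -np.inf)   # sketch: dense compute, sparse use
attn1 = np.exp(scores1 - scores1.max(axis=1, keepdims=True))
attn1 /= attn1.sum(axis=1, keepdims=True)
```

A real implementation would gather only the kept key/value pairs instead of masking a dense score matrix; the masking here just keeps the sketch short.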
- Adaptive Visual Autoregressive Acceleration via Dual-Linkage Entropy Analysis [50.48301331112126]
We propose NOVA, a training-free token reduction acceleration framework for Visual AutoRegressive modeling. NOVA adaptively determines the acceleration activation scale during inference by identifying the inflection point of scale entropy growth online. Experiments and analyses validate NOVA as a simple yet effective training-free acceleration framework. A rough code sketch of the inflection-point rule follows this entry.
arXiv Detail & Related papers (2026-02-01T17:29:42Z)
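A bare-bones reading of the entropy-inflection rule: track per-scale entropy online and switch on acceleration once the growth curve bends. The detection criterion below (first negative discrete second difference) and the dummy logits generator are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def scale_entropy(logits):
    # Mean Shannon entropy of the token distributions at one scale.
    z = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())

def find_inflection(entropies):
    # Assumed rule: first index where the discrete second difference of the
    # entropy curve turns negative, i.e. entropy growth starts to decelerate.
    h = np.asarray(entropies)
    if len(h) < 3:
        return None
    neg = np.flatnonzero(np.diff(h, n=2) < 0)
    return int(neg[0]) + 1 if len(neg) else None

def dummy_scale_logits():
    # Stand-in for the VAR decoder's per-scale logits during inference.
    for s in [1, 2, 3, 4, 6, 9, 13, 18]:
        yield rng.standard_normal((s * s, 4096)) / (1.0 + s)

entropies, accelerate_from = [], None
for step, logits in enumerate(dummy_scale_logits()):
    entropies.append(scale_entropy(logits))
    if accelerate_from is None:
        accelerate_from = find_inflection(entropies)
    cheap = accelerate_from is not None and step >= accelerate_from
    print(f"step {step}: entropy={entropies[-1]:.3f}, accelerated={cheap}")
```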
- VVS: Accelerating Speculative Decoding for Visual Autoregressive Generation via Partial Verification Skipping [52.58270801983525]
Speculative decoding (SD) has proven effective for accelerating visual AR models. We propose VVS, a novel framework that accelerates visual AR generation via partial verification skipping. A rough code sketch of the skipping idea follows this entry.
arXiv Detail & Related papers (2025-11-17T16:50:58Z)
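Speculative decoding with some verification rounds skipped can be shown with a toy greedy variant. The fixed skip-every-third-round schedule and the hash-based stand-in models below are purely illustrative; the paper's criterion for when to skip verification is not reproduced here.

```python
# Greedy speculative decoding with verification skipped on some rounds,
# loosely reconstructing "partial verification skipping" from the abstract.

VOCAB, GAMMA = 64, 4                   # vocab size, draft length per round

def draft_next(ctx):                   # cheap proposal model (toy)
    return (hash(tuple(ctx)) + 1) % VOCAB

def target_next(ctx):                  # expensive reference model (toy)
    return (hash(tuple(ctx)) + (len(ctx) % 3 == 0)) % VOCAB

seq, round_id = [1], 0
while len(seq) < 32:
    drafts = []
    for _ in range(GAMMA):             # 1) draft GAMMA tokens autoregressively
        drafts.append(draft_next(seq + drafts))
    round_id += 1
    if round_id % 3 == 0:              # 2) on skip rounds, trust the draft
        seq += drafts
        continue
    for t in drafts:                   # 3) otherwise verify greedily
        if target_next(seq) == t:
            seq.append(t)              # accepted draft token
        else:
            seq.append(target_next(seq))  # reject: take the target's token
            break
print(seq)
```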
- OmniSAT: Compact Action Token, Faster Auto Regression [70.70037017501357]
We introduce an Omni Swift Action Tokenizer, which learns a compact, transferable action representation. The resulting discrete tokenization shortens the training sequence by 6.8× and lowers the target entropy.
arXiv Detail & Related papers (2025-10-08T03:55:24Z)
- SkipVAR: Accelerating Visual Autoregressive Modeling via Adaptive Frequency-Aware Skipping [30.85025293160079]
High-frequency components, i.e., the later steps of the generation process, contribute disproportionately to inference latency. We identify two primary sources of inefficiency: step redundancy and unconditional branch redundancy. We propose an automatic step-skipping strategy that selectively omits unnecessary generation steps to improve efficiency. A rough code sketch of a frequency-aware skip rule follows this entry.
arXiv Detail & Related papers (2025-06-10T15:35:29Z)
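One way to make "frequency-aware skipping" concrete: measure how much high-frequency spectral energy each refinement step adds and stop refining once the gain stalls. The energy measure and the 0.01 threshold below are assumptions, not the paper's decision rule.

```python
import numpy as np

def highfreq_energy(img):
    # Fraction of spectral energy outside the lowest-frequency quarter.
    f = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    h, w = f.shape
    lo = f[h // 4: 3 * h // 4, w // 4: 3 * w // 4].sum()
    return float((f.sum() - lo) / (f.sum() + 1e-12))

rng = np.random.default_rng(0)
img, prev_e = rng.standard_normal((8, 8)), None
for step, size in enumerate([16, 32, 64, 128, 256]):
    # Stand-in for one VAR refinement step: upsample + add fine detail.
    up = np.kron(img, np.ones((size // img.shape[0],) * 2))
    img = up + rng.standard_normal(up.shape) * 0.5 ** step
    e = highfreq_energy(img)
    if prev_e is not None and abs(e - prev_e) < 0.01:
        print(f"skipping remaining steps after size {size}")
        break
    prev_e = e
```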
- RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration [51.77917733024544]
Latent diffusion models (LDMs) have improved the perceptual quality of All-in-One image Restoration (AiOR) methods. However, LDMs suffer from slow inference due to their iterative denoising process, rendering them impractical for time-sensitive applications. Visual autoregressive modeling (VAR) performs scale-space autoregression and achieves performance comparable to that of state-of-the-art diffusion transformers.
arXiv Detail & Related papers (2025-05-23T15:52:26Z)
- Fast Autoregressive Models for Continuous Latent Generation [49.079819389916764]
Autoregressive models have demonstrated remarkable success in sequential data generation, particularly in NLP. A recent model, the masked autoregressive model (MAR), bypasses quantization by modeling per-token distributions in continuous space using a diffusion head. We propose the Fast AutoRegressive model (FAR), a novel framework that replaces MAR's diffusion head with a lightweight shortcut head.
arXiv Detail & Related papers (2025-04-24T13:57:08Z)
- Model-Agnostic AI Framework with Explicit Time Integration for Long-Term Fluid Dynamics Prediction [7.740582267221137]
We introduce the first implementation of the two-step Adams-Bashforth method specifically tailored for data-driven AR prediction. We develop three novel adaptive weighting strategies that dynamically adjust the importance of different future time steps. Our framework accurately predicts 350 future steps, reducing mean squared error from 0.125 to 0.002. The Adams-Bashforth update itself is sketched after this entry.
arXiv Detail & Related papers (2024-12-07T14:02:57Z)
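The two-step Adams-Bashforth update itself is standard, u_{n+1} = u_n + Δt(3/2·f(u_n) − 1/2·f(u_{n−1})); only the surrogate f below is a toy stand-in for the paper's learned model.

```python
import numpy as np

def f(u):
    # Toy stand-in for the paper's learned surrogate of du/dt.
    return -0.5 * u

def ab2_rollout(u0, dt, steps):
    # Two-step Adams-Bashforth: u_{n+1} = u_n + dt*(3/2 f(u_n) - 1/2 f(u_{n-1})),
    # bootstrapped with one forward-Euler step.
    f_prev = f(u0)
    u = u0 + dt * f_prev
    traj = [u0, u]
    for _ in range(steps - 1):
        f_cur = f(u)
        u = u + dt * (1.5 * f_cur - 0.5 * f_prev)
        f_prev = f_cur
        traj.append(u)
    return np.array(traj)

traj = ab2_rollout(np.ones(4), dt=0.1, steps=350)   # 350 steps, as in the abstract
print(traj[-1])   # decays toward zero, tracking exp(-0.5 * t)
```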
- A-SDM: Accelerating Stable Diffusion through Redundancy Removal and Performance Optimization [54.113083217869516]
In this work, we first explore the computationally redundant parts of the network. We then prune the redundant blocks of the model while maintaining network performance. Thirdly, we propose a global-regional interactive (GRI) attention to speed up the computationally intensive attention part. A rough code sketch of a global-regional attention split follows this entry.
arXiv Detail & Related papers (2023-12-24T15:37:47Z)
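The GRI module's details are not in the abstract; the sketch below shows one plausible global-regional split, windowed attention for local detail plus attention to a pooled global summary, with the window size, pooling stride, and additive merge all assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, win, pool = 256, 64, 16, 4       # tokens, dim, window size, pool stride

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

x = rng.standard_normal((n, d))

# Regional branch: attention restricted to non-overlapping windows, O(n*win*d).
regional = np.zeros_like(x)
for i in range(0, n, win):
    blk = x[i:i + win]
    regional[i:i + win] = softmax(blk @ blk.T / np.sqrt(d)) @ blk

# Global branch: every token attends to a pooled summary, O(n*(n/pool)*d).
summary = x.reshape(n // pool, pool, d).mean(axis=1)
global_ = softmax(x @ summary.T / np.sqrt(d)) @ summary

out = regional + global_               # merge branches (illustrative choice)
```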