ToProVAR: Efficient Visual Autoregressive Modeling via Tri-Dimensional Entropy-Aware Semantic Analysis and Sparsity Optimization
- URL: http://arxiv.org/abs/2602.22948v2
- Date: Mon, 02 Mar 2026 16:58:25 GMT
- Title: ToProVAR: Efficient Visual Autoregressive Modeling via Tri-Dimensional Entropy-Aware Semantic Analysis and Sparsity Optimization
- Authors: Jiayu Chen, Ruoyu Lin, Zihao Zheng, Jingxin Li, Maoliang Li, Guojie Luo, Xiang Chen,
- Abstract summary: Visual Autoregressive( VAR) models enhance generation quality but face a critical efficiency bottleneck in later stages.<n>We present a novel optimization framework for VAR models that fundamentally differs from prior approaches.<n>Our approach achieves aggressive acceleration of the generation process while significantly preserving semantic fidelity and fine details.
- Score: 13.916180996567128
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Autoregressive(VAR) models enhance generation quality but face a critical efficiency bottleneck in later stages. In this paper, we present a novel optimization framework for VAR models that fundamentally differs from prior approaches such as FastVAR and SkipVAR. Instead of relying on heuristic skipping strategies, our method leverages attention entropy to characterize the semantic projections across different dimensions of the model architecture. This enables precise identification of parameter dynamics under varying token granularity levels, semantic scopes, and generation scales. Building on this analysis, we further uncover sparsity patterns along three critical dimensions-token, layer, and scale-and propose a set of fine-grained optimization strategies tailored to these patterns. Extensive evaluation demonstrates that our approach achieves aggressive acceleration of the generation process while significantly preserving semantic fidelity and fine details, outperforming traditional methods in both efficiency and quality. Experiments on Infinity-2B and Infinity-8B models demonstrate that ToProVAR achieves up to 3.4x acceleration with minimal quality loss, effectively mitigating the issues found in prior work. Our code will be made publicly available.
Related papers
- StepVAR: Structure-Texture Guided Pruning for Visual Autoregressive Models [98.72926158261937]
We propose a training-free token pruning framework for Visual AutoRegressive models.<n>We employ a lightweight high-pass filter to capture local texture details, while leveraging Principal Component Analysis (PCA) to preserve global structural information.<n>To maintain valid next-scale prediction under sparse tokens, we introduce a nearest neighbor feature propagation strategy.
arXiv Detail & Related papers (2026-03-02T11:35:05Z) - Visual Autoregressive Modelling for Monocular Depth Estimation [69.01449528371916]
We propose a monocular depth estimation method based on visual autoregressive ( VAR) priors.<n>Our method adapts a large-scale text-to-image VAR model and introduces a scale-wise conditional upsampling mechanism.<n>We report state-of-the-art performance in indoor benchmarks under constrained training conditions, and strong performance when applied to outdoor datasets.
arXiv Detail & Related papers (2025-12-27T17:08:03Z) - SparseWorld-TC: Trajectory-Conditioned Sparse Occupancy World Model [27.54931639768958]
This paper introduces a novel architecture for trajectory-conditioned forecasting of future 3D scene occupancy.<n>Inspired by attention-based transformer architectures in foundational vision and language models such as GPT and VGGT, we employ a sparse occupancy representation that bypasses the intermediate bird's eye view (BEV) projection and its explicit geometric priors.<n>By avoiding both the finite-capacity constraint of discrete tokenization and the structural limitations of BEV representations, our method achieves state-of-the-art performance on the nuScenes benchmark for 1-3 second occupancy forecasting.
arXiv Detail & Related papers (2025-11-27T02:48:45Z) - Automated Modeling Method for Pathloss Model Discovery [1.7373039830910548]
This paper proposes a novel approach that accelerates the discovery of path loss models while maintaining interpretability.<n>We examine two techniques: one based on Deep Symbolic Regression, offering full interpretability, and the second based on Kolmogorov-Arnold Networks, providing two levels of interpretability.<n>Our results show that Kolmogorov-Arnold Networks achieve the coefficient of determination value R2 close to 1 with minimal prediction error, while Deep Symbolic Regression generates compact models with moderate accuracy.
arXiv Detail & Related papers (2025-05-29T12:04:07Z) - Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step [86.69947123512836]
Chain-of-Thought (CoT) reasoning has been extensively explored in large models to tackle complex understanding tasks.<n>We provide the first comprehensive investigation of the potential of CoT reasoning to enhance autoregressive image generation.<n>We propose the Potential Assessment Reward Model (PARM) and PARM++, specialized for autoregressive image generation.
arXiv Detail & Related papers (2025-01-23T18:59:43Z) - An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training [50.71892161377806]
DFIT-OccWorld is an efficient 3D occupancy world model that leverages decoupled dynamic flow and image-assisted training strategy.<n>Our model forecasts future dynamic voxels by warping existing observations using voxel flow, whereas static voxels are easily obtained through pose transformation.
arXiv Detail & Related papers (2024-12-18T12:10:33Z) - SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation [52.6922833948127]
In this work, we investigate the importance of parameters in pre-trained diffusion models.<n>We propose a novel model fine-tuning method to make full use of these ineffective parameters.<n>Our method enhances the generative capabilities of pre-trained models in downstream applications.
arXiv Detail & Related papers (2024-09-10T16:44:47Z) - A-SDM: Accelerating Stable Diffusion through Model Assembly and Feature Inheritance Strategies [51.7643024367548]
Stable Diffusion Model is a prevalent and effective model for text-to-image (T2I) and image-to-image (I2I) generation.
This study focuses on reducing redundant computation in SDM and optimizing the model through both tuning and tuning-free methods.
arXiv Detail & Related papers (2024-05-31T21:47:05Z) - Break a Lag: Triple Exponential Moving Average for Enhanced Optimization [2.0199251985015434]
We introduce Fast Adaptive Moment Estimation (FAME), a novel optimization technique that leverages the power of Triple Exponential Moving Average.<n>FAME enhances responsiveness to data dynamics, mitigates trend identification lag, and optimize learning efficiency.<n>Our comprehensive evaluation encompasses different computer vision tasks including image classification, object detection, and semantic segmentation, integrating FAME into 30 distinct architectures.
arXiv Detail & Related papers (2023-06-02T10:29:33Z) - Aligning Optimization Trajectories with Diffusion Models for Constrained
Design Generation [17.164961143132473]
We introduce a learning framework that demonstrates the efficacy of aligning the sampling trajectory of diffusion models with the optimization trajectory derived from traditional physics-based methods.
Our method allows for generating feasible and high-performance designs in as few as two steps without the need for expensive preprocessing, external surrogate models, or additional labeled data.
Our results demonstrate that TA outperforms state-of-the-art deep generative models on in-distribution configurations and halves the inference computational cost.
arXiv Detail & Related papers (2023-05-29T09:16:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.