Diversity Has Always Been There in Your Visual Autoregressive Models
- URL: http://arxiv.org/abs/2511.17074v1
- Date: Fri, 21 Nov 2025 09:24:09 GMT
- Title: Diversity Has Always Been There in Your Visual Autoregressive Models
- Authors: Tong Wang, Guanyu Yang, Nian Liu, Kai Wang, Yaxing Wang, Abdelrahman M Shaker, Salman Khan, Fahad Shahbaz Khan, Senmao Li
- Abstract summary: Visual Autoregressive (VAR) models have recently garnered significant attention for their innovative next-scale prediction paradigm. Despite their efficiency, VAR models often suffer from diversity collapse, analogous to that observed in few-step distilled diffusion models. We introduce DiverseVAR, a simple yet effective approach that restores the generative diversity of VAR models without requiring any additional training.
- Score: 78.27363151940996
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Autoregressive (VAR) models have recently garnered significant attention for their innovative next-scale prediction paradigm, offering notable advantages in both inference efficiency and image quality compared to traditional multi-step autoregressive (AR) and diffusion models. However, despite their efficiency, VAR models often suffer from diversity collapse, i.e., a reduction in output variability, analogous to that observed in few-step distilled diffusion models. In this paper, we introduce DiverseVAR, a simple yet effective approach that restores the generative diversity of VAR models without requiring any additional training. Our analysis reveals the pivotal component of the feature map as a key factor governing diversity formation at early scales. By suppressing the pivotal component in the model input and amplifying it in the model output, DiverseVAR effectively unlocks the inherent generative potential of VAR models while preserving high-fidelity synthesis. Empirical results demonstrate that our approach substantially enhances generative diversity with only a negligible impact on performance. Our code will be publicly released at https://github.com/wangtong627/DiverseVAR.
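To make the mechanism described in the abstract concrete, the following is a minimal sketch of the suppress-in-input / amplify-in-output idea, assuming the "pivotal component" is the leading singular (rank-1) component of the token feature map. The helper names, the SVD choice, the affected scales, and the scale factors are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of the DiverseVAR idea: suppress the pivotal component of the
# feature map in the model input and amplify it in the model output at early scales.
import torch


def split_pivotal_component(feat: torch.Tensor):
    """Split a (num_tokens, dim) feature map into its leading rank-1 part and the rest."""
    u, s, vh = torch.linalg.svd(feat, full_matrices=False)
    pivotal = s[0] * torch.outer(u[:, 0], vh[0])  # dominant (rank-1) component
    return pivotal, feat - pivotal


def rescale_pivotal(feat: torch.Tensor, factor: float) -> torch.Tensor:
    """Scale only the pivotal component of the feature map by `factor`."""
    pivotal, residual = split_pivotal_component(feat)
    return factor * pivotal + residual


@torch.no_grad()
def diverse_var_step(var_model, token_feat, scale_idx, early_scales=3,
                     suppress=0.5, amplify=1.5):
    """One next-scale prediction step with pivotal-component manipulation.

    `var_model` is assumed to map a token feature map to the feature map of the
    next scale; only the first `early_scales` scales are modified, since the
    abstract reports that diversity is formed at early scales.
    """
    if scale_idx < early_scales:
        token_feat = rescale_pivotal(token_feat, suppress)   # suppress in the input
    out = var_model(token_feat)
    if scale_idx < early_scales:
        out = rescale_pivotal(out, amplify)                  # amplify in the output
    return out
```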
Related papers
- DreamVAR: Taming Reinforced Visual Autoregressive Model for High-Fidelity Subject-Driven Image Generation [108.71044040025374]
We present a novel framework for subject-driven image synthesis built upon a Visual Autoregressive model that employs next-scale prediction. We show that DreamVAR achieves superior appearance preservation compared to leading diffusion-based methods.
arXiv Detail & Related papers (2026-01-30T03:32:29Z) - Epistemic diversity across language models mitigates knowledge collapse [0.4941630596191806]
Inspired by ecology, we ask whether AI ecosystem diversity, that is, diversity among models, can mitigate such a collapse. To study the effect of diversity on model performance, we segment the training data across language models and evaluate the resulting ecosystems over ten self-training iterations. Our results suggest that an ecosystem containing only a few diverse models fails to express the rich mixture of the full, true distribution, resulting in rapid performance decay.
arXiv Detail & Related papers (2025-12-17T02:03:28Z) - DiverseVAR: Balancing Diversity and Quality of Next-Scale Visual Autoregressive Models [23.12099227251494]
We introduce DiverseVAR, a framework that enhances the diversity of text-conditioned visual autoregressive models (VAR) at test time. VAR models have emerged as strong competitors to diffusion and flow models for image generation, but they suffer from a critical limitation in diversity, often producing nearly identical images even for simple prompts.
arXiv Detail & Related papers (2025-11-26T14:06:52Z) - Your VAR Model is Secretly an Efficient and Explainable Generative Classifier [19.629406299980463]
We propose a novel generative model built on recent advances in visual autoregressive modeling. We show that the VAR-based method has fundamentally different properties from diffusion-based methods. In particular, due to its tractable likelihood, the VAR-based classifier enables visual explainability via tokenwise mutual information (a minimal sketch of likelihood-based classification appears after this list).
arXiv Detail & Related papers (2025-10-14T01:59:01Z) - One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step. To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration. Our method achieves strong performance on both full and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z) - Autoregressive Video Generation without Vector Quantization [90.87907377618747]
We reformulate the video generation problem as non-quantized autoregressive modeling of temporal frame-by-frame prediction. With the proposed approach, we train a novel video autoregressive model without vector quantization, termed NOVA. Our results demonstrate that NOVA surpasses prior autoregressive video models in data efficiency, inference speed, visual fidelity, and video fluency, even with a much smaller model capacity.
arXiv Detail & Related papers (2024-12-18T18:59:53Z) - ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer [95.80384464922147]
ACDiT is a blockwise Conditional Diffusion Transformer. It offers a flexible interpolation between token-wise autoregression and full-sequence diffusion. We show that ACDiT performs best among all autoregressive baselines on image and video generation tasks.
arXiv Detail & Related papers (2024-12-10T18:13:20Z) - Ensembling Diffusion Models via Adaptive Feature Aggregation [14.663257610094625]
Leveraging multiple high-quality models to produce stronger generation ability is valuable, but has not been extensively studied. Existing methods primarily adopt parameter merging strategies to produce a new static model. We propose Adaptive Feature Aggregation (AFA), which dynamically adjusts the contributions of multiple models at the feature level according to various states.
arXiv Detail & Related papers (2024-05-27T11:55:35Z) - Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI).
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion).
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z) - Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z) - Incorporating Reinforced Adversarial Learning in Autoregressive Image Generation [39.55651747758391]
We propose to use Reinforced Adversarial Learning (RAL) based on policy gradient optimization for autoregressive models.
RAL also empowers the collaboration between different modules of the VQ-VAE framework.
The proposed method achieves state-of-the-art results on CelebA at 64 $\times$ 64 image resolution.
arXiv Detail & Related papers (2020-07-20T08:10:07Z)
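As referenced in the entry on the VAR-based generative classifier above, the following is a minimal sketch of the general idea of classifying with an autoregressive model's tractable likelihood: pick the class label whose conditional log-likelihood of the image tokens is highest, and use per-token log-likelihood gaps as a rough explanation signal. The `log_probs_per_token` interface is an assumed helper, not that paper's API, and the explanation score is a crude stand-in for the tokenwise mutual information it describes.

```python
# Hypothetical sketch of likelihood-based generative classification with an
# autoregressive image model (not the paper's released implementation).
import torch


@torch.no_grad()
def classify_by_likelihood(model, image_tokens, class_labels):
    """Return the label maximizing sum_t log p(token_t | tokens_<t, label)."""
    scores = []
    for label in class_labels:
        # assumed helper: per-token conditional log-probabilities under the model
        token_logp = model.log_probs_per_token(image_tokens, label)  # (num_tokens,)
        scores.append(token_logp.sum())
    scores = torch.stack(scores)
    return class_labels[int(scores.argmax())], scores


@torch.no_grad()
def tokenwise_evidence(model, image_tokens, label, class_labels):
    """Per-token contribution of `label` relative to the average over all classes,
    used here as a simple explainability signal."""
    logp_label = model.log_probs_per_token(image_tokens, label)
    logp_all = torch.stack(
        [model.log_probs_per_token(image_tokens, c) for c in class_labels]
    )
    return logp_label - logp_all.mean(dim=0)
```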
This list is automatically generated from the titles and abstracts of the papers in this site.