Related papers: Guiding Visual Autoregressive Models through Spectrum Weakening

URL: http://arxiv.org/abs/2511.22991v1
Date: Fri, 28 Nov 2025 08:52:50 GMT
Title: Guiding Visual Autoregressive Models through Spectrum Weakening
Authors: Chaoyang Wang, Tianmeng Yang, Jingdong Wang, Yunhai Tong
Abstract summary: We propose a spectrum-weakening framework for visual autoregressive (AR) models. It achieves this by constructing a controllable weak model in the spectral domain. Our method enables high-quality unconditional generation while maintaining strong prompt alignment for conditional generation.
Score: 44.26047250249648
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Classifier-free guidance (CFG) has become a widely adopted and practical approach for enhancing generation quality and improving condition alignment. Recent studies have explored guidance mechanisms for unconditional generation, yet these approaches remain fundamentally tied to assumptions specific to diffusion models. In this work, we propose a spectrum-weakening framework for visual autoregressive (AR) models. This method works without the need for re-training, specific conditions, or any architectural modifications. It achieves this by constructing a controllable weak model in the spectral domain. We theoretically show that invertible spectral transformations preserve information, while selectively retaining only a subset of the spectrum introduces controlled information reduction. Based on this insight, we perform spectrum selection along the channel dimension of internal representations, which avoids the structural constraints imposed by diffusion models. We further introduce two spectrum renormalization strategies that ensure numerical stability during the weakening process. Extensive experiments were conducted on both discrete and continuous AR models, with text or class conditioning. The results demonstrate that our method enables high-quality unconditional generation while maintaining strong prompt alignment for conditional generation.
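Illustration: the abstract's core idea (an invertible spectral transform along the channel dimension, retention of a subset of the spectrum, renormalization, then guidance against the resulting weak model) can be sketched roughly as follows. This is not the authors' implementation. The choice of an orthonormal DCT, the rule of keeping the first `keep` coefficients, the energy-matching renormalization, and the names `weaken` and `guide` are all assumptions made for this example; the abstract does not specify the transform, the selection rule, or either of its two renormalization strategies.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis (assumed transform); its inverse is its transpose."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)])
    return rows


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def weaken(features, keep, renorm=True):
    """Hypothetical weakening of one channel vector.

    Transforms along the channel dimension, zeroes every coefficient with
    index >= keep, optionally rescales the survivors to the original
    spectral energy (an assumed renormalization), then inverts the transform.
    """
    basis = dct_matrix(len(features))
    spectrum = matvec(basis, features)
    kept = [c if k < keep else 0.0 for k, c in enumerate(spectrum)]
    if renorm:
        full_energy = math.sqrt(sum(c * c for c in spectrum))
        kept_energy = math.sqrt(sum(c * c for c in kept))
        if kept_energy > 0.0:
            kept = [c * full_energy / kept_energy for c in kept]
    return matvec(transpose(basis), kept)


def guide(strong, weak, scale):
    """CFG-style extrapolation away from the weak prediction."""
    return [w + scale * (s - w) for s, w in zip(strong, weak)]


channels = [1.0, 2.0, 3.0, 4.0]          # toy channel vector, not real model features
weak = weaken(channels, keep=2)           # half of the spectrum retained
guided = guide(channels, weak, scale=1.5)
```

With `keep` equal to the number of channels and `renorm=False`, `weaken` returns its input up to rounding, which mirrors the abstract's claim that an invertible spectral transform preserves information. In a real model the guidance step would presumably combine the outputs of the original and weakened forward passes rather than raw feature vectors; the toy above applies it to features only to keep the example self-contained.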

Related papers

Spectral Regularization for Diffusion Models [14.919876123456747]
We propose a loss-level spectral regularization framework that augments standard diffusion training with differentiable Fourier- and wavelet-domain losses. Our approach is compatible with DDPM, DDIM, and EDM formulations and introduces negligible computational overhead.
arXiv Detail & Related papers (2026-03-02T22:39:02Z)
Cross-Domain Transfer with Self-Supervised Spectral-Spatial Modeling for Hyperspectral Image Classification [5.784164305429653]
This paper proposes a self-supervised cross-domain transfer framework. It learns transferable spectral-spatial joint representations without source labels. Experimental results demonstrate stable classification performance and strong cross-domain adaptability.
arXiv Detail & Related papers (2026-01-26T02:52:35Z)
SIGMA: Scalable Spectral Insights for LLM Collapse [51.863164847253366]
We introduce SIGMA (Spectral Inequalities for Gram Matrix Analysis), a unified framework for model collapse. By deriving deterministic bounds on the matrix's spectrum, SIGMA provides a mathematically grounded metric to track the contraction of the representation space. We demonstrate that SIGMA effectively captures the transition towards states, offering theoretical insights into the mechanics of collapse.
arXiv Detail & Related papers (2026-01-06T19:47:11Z)
Structured Spectral Reasoning for Frequency-Adaptive Multimodal Recommendation [13.886659472425393]
Multimodal recommendation aims to integrate collaborative signals with heterogeneous content such as visual and textual information. These issues are often exacerbated by naive fusion or shallow modeling strategies, leading to degraded generalization and poor robustness. We propose a Structured Spectral Reasoning framework for frequency-aware multimodal recommendation.
arXiv Detail & Related papers (2025-12-01T07:39:28Z)
ScaleWeaver: Weaving Efficient Controllable T2I Generation with Multi-Scale Reference Attention [86.93601565563954]
ScaleWeaver is a framework designed to achieve high-fidelity, controllable generation upon advanced visual autoregressive (VAR) models. The proposed Reference Attention module discards the unnecessary attention from image$\rightarrow$condition, reducing computational cost. Experiments show that ScaleWeaver delivers high-quality generation and precise control while attaining superior efficiency over diffusion-based methods.
arXiv Detail & Related papers (2025-10-16T17:00:59Z)
SpectrumFM: Redefining Spectrum Cognition via Foundation Modeling [65.65474629224558]
We propose a spectrum foundation model, termed SpectrumFM, which provides a new paradigm for spectrum cognition. An innovative spectrum encoder that exploits convolutional neural networks is proposed to effectively capture both fine-grained local signal structures and high-level global dependencies in the spectrum data. Two novel self-supervised learning tasks, namely masked reconstruction and next-slot signal prediction, are developed for pre-training SpectrumFM, enabling the model to learn rich and transferable representations.
arXiv Detail & Related papers (2025-08-02T14:40:50Z)
RichControl: Structure- and Appearance-Rich Training-Free Spatial Control for Text-to-Image Generation [10.956556608715035]
Text-to-image (T2I) diffusion models have shown remarkable success in generating high-quality images from text prompts. We propose a flexible training-free framework that decouples the sampling schedule of condition features from the denoising process. We further enhance the sampling process by introducing a restart refinement schedule, and improve the visual quality with an appearance-rich prompting strategy.
arXiv Detail & Related papers (2025-07-03T16:56:15Z)
FreSca: Scaling in Frequency Space Enhances Diffusion Models [55.75504192166779]
This paper explores frequency-based control within latent diffusion models. We introduce FreSca, a novel framework that decomposes noise difference into low- and high-frequency components. FreSca operates without any model retraining or architectural change, offering model- and task-agnostic control.
arXiv Detail & Related papers (2025-04-02T22:03:11Z)
Constrained Discrete Diffusion [61.81569616239755]
This paper introduces Constrained Discrete Diffusion (CDD), a novel integration of differentiable constraint optimization within the diffusion process. CDD directly imposes constraints into the discrete diffusion sampling process, resulting in a training-free and effective approach.
arXiv Detail & Related papers (2025-03-12T19:48:12Z)
Simple Guidance Mechanisms for Discrete Diffusion Models [44.377206440698586]
We develop a new class of diffusion models that leverage uniform noise and that are more guidable because they can continuously edit their outputs. We improve the quality of these models with a novel continuous-time variational lower bound that yields state-of-the-art performance.
arXiv Detail & Related papers (2024-12-13T15:08:30Z)
Classification of High-dimensional Time Series in Spectral Domain using Explainable Features [8.656881800897661]
We propose a model-based approach for classifying high-dimensional stationary time series. Our approach emphasizes the interpretability of model parameters, making it especially suitable for fields like neuroscience. The novelty of our method lies in the interpretability of the model parameters, addressing critical needs in neuroscience.
arXiv Detail & Related papers (2024-08-15T19:10:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.