Inference-Time Scaling for Visual AutoRegressive modeling by Searching Representative Samples
- URL: http://arxiv.org/abs/2601.07293v1
- Date: Mon, 12 Jan 2026 08:00:55 GMT
- Title: Inference-Time Scaling for Visual AutoRegressive modeling by Searching Representative Samples
- Authors: Weidong Tang, Xinyan Wan, Siyu Li, Xiumei Wang
- Abstract summary: We introduce VAR-Scaling, the first general framework for inference-time scaling in visual autoregressive modeling (VAR). We map sampling spaces to quasi-continuous feature spaces via kernel density estimation (KDE), where high-density samples approximate stable, high-quality solutions. Experiments on class-conditional and text-to-image evaluations demonstrate significant improvements in the inference process.
- Score: 8.364449021192016
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While inference-time scaling has significantly enhanced generative quality in large language and diffusion models, its application to vector-quantized (VQ) visual autoregressive modeling (VAR) remains unexplored. We introduce VAR-Scaling, the first general framework for inference-time scaling in VAR, addressing the critical challenge of discrete latent spaces that prohibit continuous path search. We find that VAR scales exhibit two distinct pattern types: general patterns and specific patterns, where later-stage specific patterns conditionally optimize early-stage general patterns. To overcome the discrete latent space barrier in VQ models, we map sampling spaces to quasi-continuous feature spaces via kernel density estimation (KDE), where high-density samples approximate stable, high-quality solutions. This transformation enables effective navigation of sampling distributions. We propose a density-adaptive hybrid sampling strategy: Top-k sampling focuses on high-density regions to preserve quality near distribution modes, while Random-k sampling explores low-density areas to maintain diversity and prevent premature convergence. Consequently, VAR-Scaling optimizes sample fidelity at critical scales to enhance output quality. Experiments on class-conditional and text-to-image evaluations demonstrate significant improvements in the inference process. The code is available at https://github.com/WD7ang/VAR-Scaling.
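To make the strategy concrete, below is a minimal sketch of the density-adaptive hybrid selection step, assuming candidate samples have already been embedded as feature vectors. All names (kde_hybrid_select, top_k, rand_k) are hypothetical, and the paper's scale-wise integration into VAR decoding is not shown.

```python
# A minimal sketch of density-adaptive hybrid selection in the spirit of
# VAR-Scaling; the function and parameter names are invented for illustration.
import numpy as np
from scipy.stats import gaussian_kde

def kde_hybrid_select(features, top_k=4, rand_k=2, seed=0):
    """Score candidates by KDE density over their features, keep the top_k
    densest (exploit modes) plus rand_k random picks from the low-density
    remainder (preserve diversity)."""
    rng = np.random.default_rng(seed)
    # KDE expects shape (dim, n_samples); density approximates how "typical"
    # (stable, high-quality) each candidate is under the sampling distribution.
    density = gaussian_kde(features.T)(features.T)
    order = np.argsort(density)[::-1]          # densest first
    dense_idx = order[:top_k]                  # Top-k: high-density region
    sparse_pool = order[top_k:]                # Random-k: low-density region
    rand_idx = rng.choice(sparse_pool, size=min(rand_k, len(sparse_pool)),
                          replace=False)
    return np.concatenate([dense_idx, rand_idx])

# Toy usage: 32 candidates with 8-dim features (stand-ins for VAR features).
feats = np.random.default_rng(1).normal(size=(32, 8))
print(kde_hybrid_select(feats))
```

The split between exploitation (Top-k near distribution modes) and exploration (Random-k in sparse regions) is the design point the abstract emphasizes.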
Related papers
- Synergizing Transport-Based Generative Models and Latent Geometry for Stochastic Closure Modeling [1.665466637453776]
We show that flow matching in a lower-dimensional latent space is suited for fast sampling of closure models. We control the latent space distortion and thus ensure the physical fidelity of the sampled closure term.
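A minimal sketch of what sampling such a latent flow-matching model could look like; the velocity field and decoder below are toy stand-ins for the paper's trained networks, which are assumed rather than shown.

```python
# Euler integration of a (toy) velocity field in latent space, then decode.
import numpy as np

def velocity(z, t):
    # Hypothetical learned velocity field v_theta(z, t); a linear toy here.
    return -z * (1.0 - t)

def decode(z):
    # Hypothetical decoder mapping latents back to the closure-term space.
    return np.tanh(z)

def sample_latent_flow(dim=4, steps=50, seed=0):
    """Integrate dz/dt = v(z, t) from t=0 (noise) to t=1 with Euler steps."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=dim)       # start from Gaussian noise in latent space
    dt = 1.0 / steps
    for i in range(steps):
        z = z + dt * velocity(z, i * dt)
    return decode(z)

print(sample_latent_flow())
```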
arXiv Detail & Related papers (2026-02-19T05:24:00Z) - Formalizing the Sampling Design Space of Diffusion-Based Generative Models via Adaptive Solvers and Wasserstein-Bounded Timesteps [4.397130429878499]
Diffusion-based generative models have achieved remarkable performance across various domains, yet their practical deployment is often limited by high sampling costs. We propose SDM, a principled framework that aligns the numerical solver with the intrinsic properties of the diffusion trajectory. By analyzing the ODE dynamics, we show that efficient low-order solvers suffice in early high-noise stages, while higher-order solvers can be progressively deployed to handle the increasing non-linearity of later stages.
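A rough sketch of the order-scheduling idea, with a toy ODE standing in for the probability-flow trajectory; SDM's actual solver selection and Wasserstein-bounded timestep rule are not reproduced.

```python
# Cheap first-order (Euler) steps in the early high-noise regime, second-order
# (Heun) steps later where the trajectory is more non-linear.
import numpy as np

def f(x, t):
    return -x / (t + 0.1)   # toy drift standing in for the PF-ODE

def sdm_like_sampler(x0, t_grid, switch_frac=0.5):
    x = np.asarray(x0, dtype=float)
    n = len(t_grid) - 1
    for i in range(n):
        t, t_next = t_grid[i], t_grid[i + 1]
        h = t_next - t
        if i < int(switch_frac * n):          # early, high-noise: Euler
            x = x + h * f(x, t)
        else:                                  # later, non-linear: Heun
            d1 = f(x, t)
            x_pred = x + h * d1
            x = x + h * 0.5 * (d1 + f(x_pred, t_next))
    return x

print(sdm_like_sampler([1.0, -2.0], np.linspace(1.0, 0.0, 21)))
```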
arXiv Detail & Related papers (2026-02-13T05:02:07Z) - Understanding Sampler Stochasticity in Training Diffusion Models for RLHF [11.537564997052606]
This paper theoretically characterizes the reward gap and provides non-vacuous bounds for general diffusion models. Empirically, large-scale experiments on text-to-image models confirm that reward gaps consistently narrow over training.
arXiv Detail & Related papers (2025-10-12T19:08:38Z) - G$^2$RPO: Granular GRPO for Precise Reward in Flow Models [74.21206048155669]
We propose a novel Granular-GRPO (G$^2$RPO) framework that achieves precise and comprehensive reward assessments of sampling directions. We introduce a Multi-Granularity Advantage Integration module that aggregates advantages computed at multiple diffusion scales. Our G$^2$RPO significantly outperforms existing flow-based GRPO baselines.
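A hedged sketch of what multi-granularity advantage integration could look like on top of standard GRPO group normalization; the reward values, weights, and shapes below are illustrative assumptions, not the paper's reward model.

```python
# GRPO-style group-normalized advantages at several diffusion scales, then
# aggregated with a weighted mean across scales.
import numpy as np

def grpo_advantage(rewards, eps=1e-8):
    """Standard GRPO group normalization: (r - mean) / std within a group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

def integrate_advantages(rewards_per_scale, weights=None):
    """Aggregate per-scale advantages over a group of G samples.
    rewards_per_scale: array of shape (num_scales, G)."""
    adv = np.stack([grpo_advantage(r) for r in rewards_per_scale])
    w = np.ones(len(adv)) / len(adv) if weights is None else np.asarray(weights)
    return (w[:, None] * adv).sum(axis=0)   # weighted mean across scales

# Toy: 3 diffusion scales, group of 4 samples.
rewards = np.array([[0.2, 0.8, 0.5, 0.4],
                    [0.3, 0.7, 0.6, 0.5],
                    [0.1, 0.9, 0.4, 0.6]])
print(integrate_advantages(rewards))
```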
arXiv Detail & Related papers (2025-10-02T12:57:12Z) - Advancing Reliable Test-Time Adaptation of Vision-Language Models under Visual Variations [67.35596444651037]
Vision-language models (VLMs) exhibit remarkable zero-shot capabilities but struggle with distribution shifts in downstream tasks when labeled data is unavailable. We propose a Reliable Test-time Adaptation (ReTA) method that enhances reliability from two perspectives.
arXiv Detail & Related papers (2025-07-13T05:37:33Z) - Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling [70.8832906871441]
We study how to steer generation toward desired rewards without retraining the models. Prior methods typically resample or filter within a single denoising trajectory, optimizing rewards step-by-step without trajectory-level refinement. We introduce particle Gibbs sampling for diffusion language models (PG-DLM), a novel inference-time algorithm enabling trajectory-level refinement while preserving generation perplexity.
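A heavily simplified sketch of the conditional-particle idea behind particle Gibbs: each sweep runs a particle filter in which one slot is pinned to the retained reference trajectory, particles are reweighted by reward and resampled, and a new reference is drawn. The transition and reward functions are toy stand-ins; PG-DLM's diffusion LM transitions are assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(x, t):
    return x + rng.normal(scale=0.1, size=x.shape)   # toy transition

def reward(x):
    return -np.abs(x).sum()                          # toy reward: pull toward 0

def particle_gibbs(steps=10, n_particles=8, sweeps=3, dim=4):
    # initial reference trajectory (initial state + one state per step)
    ref = [rng.normal(size=dim) for _ in range(steps + 1)]
    for _ in range(sweeps):
        # each particle keeps its full trajectory so resampling copies ancestry
        trajs = [[rng.normal(size=dim)] for _ in range(n_particles)]
        for t in range(steps):
            for tr in trajs:
                tr.append(denoise_step(tr[-1], t))
            trajs[0] = [s.copy() for s in ref[:t + 2]]   # pin reference path
            w = np.exp([reward(tr[-1]) for tr in trajs])
            w = w / w.sum()
            idx = rng.choice(n_particles, size=n_particles, p=w)
            trajs = [[s.copy() for s in trajs[i]] for i in idx]
        ref = trajs[rng.integers(n_particles)]           # draw new reference
    return ref[-1]

print(particle_gibbs())
```

The trajectory-level refinement the summary highlights comes from resampling whole paths across sweeps rather than filtering step-by-step within one trajectory.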
arXiv Detail & Related papers (2025-07-11T08:00:47Z) - Ctrl-Z Sampling: Diffusion Sampling with Controlled Random Zigzag Explorations [17.357140159249496]
We propose a novel sampling strategy that adaptively detects and escapes steep local maxima. We show that Ctrl-Z Sampling substantially improves generation quality while requiring only about 7.72 times the NFEs of the original.
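A toy sketch of the zigzag behavior: step forward, score the candidate, and re-noise and retry when quality degrades, which suggests a steep local maximum. The scoring function and noise scales are assumptions, not the paper's actual criteria.

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.array([1.0, -1.0, 0.5])

def denoise(x, t):
    return x + 0.2 * (target - x) + rng.normal(scale=0.05, size=x.shape)

def score(x):
    return -np.linalg.norm(x - target)    # higher is better (toy quality proxy)

def ctrl_z_sample(steps=30, max_retries=3, tol=0.05):
    x = rng.normal(size=3)
    for t in range(steps):
        for attempt in range(max_retries):
            cand = denoise(x, t)
            if score(cand) >= score(x) - tol:   # accept non-degrading step
                x = cand
                break
            # zigzag: inject noise to escape the suspected local maximum
            x = x + rng.normal(scale=0.1, size=x.shape)
        else:
            x = cand                             # accept once retries exhausted
    return x

print(ctrl_z_sample())
```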
arXiv Detail & Related papers (2025-06-25T10:01:00Z) - Diffusion Sampling Path Tells More: An Efficient Plug-and-Play Strategy for Sample Filtering [18.543769006014383]
Diffusion models often exhibit inconsistent sample quality due to variations inherent in their sampling trajectories. We introduce CFG-Rejection, an efficient, plug-and-play strategy that filters low-quality samples at an early stage of the denoising process. We validate the effectiveness of CFG-Rejection in image generation through extensive experiments.
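A hedged sketch of early-stage filtering driven by a CFG-derived signal (the gap between conditional and unconditional predictions); the toy predictors and the quantile threshold are assumptions, not the paper's exact rejection statistic.

```python
import numpy as np

rng = np.random.default_rng(0)

def eps_cond(x, t):
    return 0.8 * x + 0.2 * np.tanh(x)   # toy conditional noise prediction

def eps_uncond(x, t):
    return 0.8 * x                      # toy unconditional noise prediction

def cfg_signal(x, early_steps=5):
    """Accumulate the cond-uncond gap over a few early denoising steps."""
    s = 0.0
    for t in range(early_steps):
        s += np.linalg.norm(eps_cond(x, t) - eps_uncond(x, t))
        x = x - 0.1 * eps_cond(x, t)    # cheap partial denoising
    return s

def filter_batch(batch, keep_frac=0.5):
    """Reject samples whose early CFG signal falls below a quantile cutoff."""
    scores = np.array([cfg_signal(x) for x in batch])
    cutoff = np.quantile(scores, 1.0 - keep_frac)
    return [x for x, s in zip(batch, scores) if s >= cutoff]

batch = [rng.normal(size=4) for _ in range(8)]
print(len(filter_batch(batch)))
```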
arXiv Detail & Related papers (2025-05-29T11:08:24Z) - Arbitrary-steps Image Super-resolution via Diffusion Inversion [68.78628844966019]
This study presents a new image super-resolution (SR) technique based on diffusion inversion, aiming at harnessing the rich image priors encapsulated in large pre-trained diffusion models to improve SR performance. We design a Partial noise Prediction strategy to construct an intermediate state of the diffusion model, which serves as the starting sampling point. Once trained, this noise predictor can be used to initialize the sampling process partially along the diffusion trajectory, generating the desirable high-resolution result.
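A rough sketch of the partial-inversion idea: a noise predictor maps the degraded input to an intermediate diffusion state, and sampling runs only over the remaining steps. The schedule, predictor, and denoiser below are all toy stand-ins for the paper's trained components.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_noise(lr_img):
    return 0.1 * rng.standard_normal(lr_img.shape)   # stand-in noise predictor

def denoise(x, t):
    return 0.95 * x                                  # stand-in denoiser step

def partial_inversion_sr(lr_img, t_start=0.4, total_steps=50):
    """Start sampling from an intermediate state instead of pure noise."""
    alpha = 1.0 - t_start                 # toy schedule: signal level at t_start
    eps = predict_noise(lr_img)
    x = np.sqrt(alpha) * lr_img + np.sqrt(1 - alpha) * eps  # intermediate state
    for t in range(int(t_start * total_steps)):             # remaining steps only
        x = denoise(x, t)
    return x

lr = rng.normal(size=(8, 8))   # stand-in for an upscaled low-res image
print(partial_inversion_sr(lr).shape)
```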
arXiv Detail & Related papers (2024-12-12T07:24:13Z) - DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models [58.450152413700586]
We introduce a soft absorbing state that helps the diffusion model learn to reconstruct discrete mutations from the underlying Gaussian space.
We employ state-of-the-art ODE solvers within the continuous space to expedite the sampling process.
Our proposed method effectively accelerates the training convergence by 4x and generates samples of similar quality 800x faster.
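A speculative sketch of what a soft absorbing state for discrete tokens could look like: embeddings are interpolated toward a shared absorbing embedding (a soft, differentiable analogue of masking) on top of Gaussian noising, and the model would be trained to reconstruct the originals. Dimensions and the mixing schedule are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 100, 16
embed = rng.normal(size=(vocab, dim))       # token embedding table
absorb = rng.normal(size=dim)               # learned soft absorbing embedding

def soft_absorb_corrupt(token_ids, t):
    """Mix each token embedding with the absorbing state and add noise.
    t in [0, 1]: 0 = clean, 1 = fully absorbed/noised."""
    x = embed[token_ids]
    mixed = (1 - t) * x + t * absorb        # soft absorption (not a hard mask)
    return mixed + np.sqrt(t) * rng.normal(scale=0.1, size=mixed.shape)

tokens = rng.integers(vocab, size=12)
print(soft_absorb_corrupt(tokens, t=0.5).shape)
```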
arXiv Detail & Related papers (2023-10-09T15:29:10Z) - Oops I Took A Gradient: Scalable Sampling for Discrete Distributions [53.3142984019796]
We show that this approach outperforms generic samplers in a number of difficult settings.
We also demonstrate the use of our improved sampler for training deep energy-based models on high dimensional discrete data.
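A compact sketch of the gradient-informed proposal this line of work is known for: the gradient of the log-probability estimates the effect of flipping each bit, a flip site is sampled from the resulting softmax, and the move gets a Metropolis-Hastings correction. The quadratic log-probability below is a toy model chosen so gradients are analytic; the paper's energy-based models are assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20
W = rng.normal(scale=0.3, size=(d, d))
W = (W + W.T) / 2                 # symmetric toy coupling matrix
np.fill_diagonal(W, 0.0)

def logp(x):                      # toy unnormalized log-probability
    return x @ W @ x

def grad_logp(x):
    return 2 * W @ x

def gwg_step(x):
    # estimated change in logp from flipping each bit (first-order Taylor)
    d_tilde = -(2 * x - 1) * grad_logp(x)
    q = np.exp(d_tilde / 2)
    q /= q.sum()
    i = rng.choice(d, p=q)        # sample a flip site from the softmax
    x_new = x.copy()
    x_new[i] = 1 - x_new[i]
    # reverse-proposal probability for the MH correction
    d_rev = -(2 * x_new - 1) * grad_logp(x_new)
    q_rev = np.exp(d_rev / 2)
    q_rev /= q_rev.sum()
    accept = np.exp(logp(x_new) - logp(x)) * q_rev[i] / q[i]
    return x_new if rng.random() < accept else x

x = rng.integers(0, 2, size=d).astype(float)
for _ in range(100):
    x = gwg_step(x)
print(x)
```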
arXiv Detail & Related papers (2021-02-08T20:08:50Z)