Lookahead Sample Reward Guidance for Test-Time Scaling of Diffusion Models
- URL: http://arxiv.org/abs/2602.03211v1
- Date: Tue, 03 Feb 2026 07:27:27 GMT
- Title: Lookahead Sample Reward Guidance for Test-Time Scaling of Diffusion Models
- Authors: Yeongmin Kim, Donghyeok Shin, Byeonghu Na, Minsang Park, Richard Lee Kim, Il-Chul Moon
- Abstract summary: Diffusion models have demonstrated strong generative performance; however, generated samples often fail to fully align with human intent. This paper studies a test-time scaling method that enables sampling from regions with higher human-aligned reward values.
- Score: 28.29554194279748
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have demonstrated strong generative performance; however, generated samples often fail to fully align with human intent. This paper studies a test-time scaling method that enables sampling from regions with higher human-aligned reward values. Existing gradient guidance methods approximate the expected future reward (EFR) at an intermediate particle $\mathbf{x}_t$ using a Taylor approximation, but this approximation at each time step incurs high computational cost due to sequential neural backpropagation. We show that the EFR at any $\mathbf{x}_t$ can be computed using only marginal samples from a pre-trained diffusion model. The proposed EFR formulation detaches the neural dependency between $\mathbf{x}_t$ and the EFR, enabling closed-form guidance computation without neural backpropagation. To further improve efficiency, we introduce lookahead sampling to collect marginal samples. For final sample generation, we use an accurate solver that guides particles toward high-reward lookahead samples. We refer to this sampling scheme as LiDAR sampling. LiDAR achieves substantial performance improvements using only three samples with a 3-step lookahead solver, exhibiting steep performance gains as lookahead accuracy and sample count increase; notably, it reaches the same GenEval performance as the latest gradient guidance method for SDXL with a 9.5x speedup.
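As a rough illustration of the lookahead idea (not the paper's actual SDXL implementation), the toy sketch below uses hypothetical stand-ins for the denoiser, reward, and schedule: at each step a few short-horizon lookahead samples are drawn with a cheap 3-step solver, scored by the reward, and the particle is nudged toward the best-scoring one, with no backpropagation through the denoiser.

```python
import numpy as np

rng = np.random.default_rng(0)

MU = np.array([0.5, 0.5])      # stand-in "data mean" of the pretrained model
TARGET = np.array([1.0, 1.0])  # hypothetical human-aligned preference

def denoise_step(x, dt=0.05):
    """One cheap deterministic solver step (stand-in for a diffusion denoiser)."""
    return x + dt * (MU - x)

def lookahead_sample(x, steps=3):
    """Run a short k-step solver from the current particle toward an
    approximate clean sample x_0 (the "lookahead sample")."""
    for _ in range(steps):
        x = denoise_step(x)
    return x

def reward(x0):
    """Hypothetical reward: negative squared distance to the preferred target."""
    return -float(np.sum((x0 - TARGET) ** 2))

def guided_step(x, n_lookahead=3, guidance=0.2):
    """Score a few noisy lookahead candidates with the reward and nudge the
    particle toward the best one; no gradient of the denoiser is needed."""
    candidates = [lookahead_sample(x + 0.05 * rng.standard_normal(x.shape))
                  for _ in range(n_lookahead)]
    best = max(candidates, key=reward)
    x = denoise_step(x)                  # ordinary solver update
    return x + guidance * (best - x)     # reward guidance toward the lookahead

x = rng.standard_normal(2)               # initial noisy particle
for _ in range(20):
    x = guided_step(x)
```

The key property mirrored here is that the guidance direction comes from comparing concrete lookahead samples, so the reward model is only ever evaluated forward, never differentiated through the network.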
Related papers
- Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling [70.8832906871441]
We study how to steer generation toward desired rewards without retraining the models. Prior methods typically resample or filter within a single denoising trajectory, optimizing rewards step-by-step without trajectory-level refinement. We introduce particle Gibbs sampling for diffusion language models (PG-DLM), a novel inference-time algorithm enabling trajectory-level refinement while preserving generation perplexity.
arXiv Detail & Related papers (2025-07-11T08:00:47Z) - Diffusion Tree Sampling: Scalable inference-time alignment of diffusion models [13.312007032203857]
Adapting a pretrained diffusion model to new objectives at inference time remains an open problem in generative modeling. We introduce a tree-based approach that samples from the reward-aligned target density by propagating terminal rewards back through the diffusion chain. By reusing information from previous generations, we get an anytime algorithm that turns additional compute into steadily better samples.
arXiv Detail & Related papers (2025-06-25T17:59:10Z) - Progressive Tempering Sampler with Diffusion [50.06039228068602]
We propose a neural sampler that trains diffusion models sequentially across temperatures. We also introduce a novel method to combine high-temperature diffusion models to generate approximate lower-temperature samples. Our method significantly improves target evaluation efficiency, outperforming diffusion-based neural samplers.
arXiv Detail & Related papers (2025-06-05T16:46:04Z) - Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts [64.34482582690927]
We provide an efficient and principled method for sampling from a sequence of annealed, geometric-averaged, or product distributions derived from pretrained score-based models. We propose Sequential Monte Carlo (SMC) resampling algorithms that leverage inference-time scaling to improve sampling quality.
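The SMC resampling primitive such correctors build on can be sketched in a few lines (the particles and log-potential below are illustrative stand-ins, not the paper's score-based setup): high-weight particles are duplicated, low-weight ones dropped, and weights reset to uniform.

```python
import numpy as np

rng = np.random.default_rng(2)

def smc_resample(particles, log_weights):
    """Multinomial resampling: duplicate high-weight particles, drop
    low-weight ones, and implicitly reset all weights to uniform."""
    w = np.exp(log_weights - log_weights.max())  # stabilize before exponentiating
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]

# Toy example: particles from N(0, 1), reweighted by a hypothetical
# log-potential that prefers values near 3.0.
particles = rng.standard_normal(1000)
log_w = -0.5 * (particles - 3.0) ** 2
resampled = smc_resample(particles, log_w)
```

After resampling, the empirical distribution of `resampled` concentrates between the prior mean (0) and the potential's peak (3), which is the reweighting effect the correctors exploit at inference time.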
arXiv Detail & Related papers (2025-03-04T17:46:51Z) - Distributional Diffusion Models with Scoring Rules [83.38210785728994]
Diffusion models generate high-quality synthetic data, but generating high-quality outputs requires many discretization steps. We propose to accomplish sample generation by learning the posterior distribution of clean data samples.
arXiv Detail & Related papers (2025-02-04T16:59:03Z) - Self-Refining Diffusion Samplers: Enabling Parallelization via Parareal Iterations [53.180374639531145]
Self-Refining Diffusion Samplers (SRDS) retain sample quality and can improve latency at the cost of additional parallel compute. We take inspiration from the Parareal algorithm, a popular numerical method for parallel-in-time integration of differential equations.
arXiv Detail & Related papers (2024-12-11T11:08:09Z) - Diffusion Rejection Sampling [13.945372555871414]
Diffusion Rejection Sampling (DiffRS) is a rejection sampling scheme that aligns the sampling transition kernels with the true ones at each timestep.
The proposed method can be viewed as a mechanism that evaluates the quality of samples at each intermediate timestep and refines them with varying effort depending on the sample.
Empirical results demonstrate the state-of-the-art performance of DiffRS on the benchmark datasets and the effectiveness of DiffRS for fast diffusion samplers and large-scale text-to-image diffusion models.
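The classical rejection-sampling principle that DiffRS applies per timestep can be illustrated with a simple one-dimensional example (the target and proposal densities here are hypothetical, chosen so the acceptance bound is exact): draw from an easy proposal q and accept with probability p(x) / (M q(x)).

```python
import numpy as np

rng = np.random.default_rng(1)

def target_pdf(x):
    # Unnormalized target p(x): standard normal truncated to x > 0.
    return np.exp(-0.5 * x ** 2) * (x > 0)

def proposal_pdf(x):
    # Proposal q(x): full standard normal (easy to sample from).
    return np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)

def rejection_sample(n, M=np.sqrt(2 * np.pi)):
    # Accept x ~ q with probability p(x) / (M q(x)); requires p <= M q.
    out = []
    while len(out) < n:
        x = rng.standard_normal()
        if rng.uniform() < target_pdf(x) / (M * proposal_pdf(x)):
            out.append(x)
    return np.array(out)

samples = rejection_sample(2000)
```

With these densities the acceptance ratio is exactly 1 for x > 0 and 0 otherwise, so the accepted samples follow the half-normal target; DiffRS applies the same accept/reject logic to transition kernels at each diffusion timestep rather than to a static density.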
arXiv Detail & Related papers (2024-05-28T07:00:28Z) - Entropy-based Training Methods for Scalable Neural Implicit Sampler [20.35664492719671]
In this paper, we introduce an efficient and scalable implicit neural sampler that overcomes these limitations. The implicit sampler can generate large batches of samples with low computational costs. By employing the two training methods, we effectively optimize the neural implicit samplers to learn and generate from the desired target distribution.
arXiv Detail & Related papers (2023-06-08T05:56:05Z) - Interpreting and Improving Diffusion Models from an Optimization Perspective [4.5993996573872185]
We use this observation to interpret denoising diffusion models as approximate gradient descent applied to the Euclidean distance function.
We propose a new gradient-estimation sampler, generalizing DDIM using insights from our theoretical results.
arXiv Detail & Related papers (2023-06-08T00:56:33Z) - SIMPLE: A Gradient Estimator for $k$-Subset Sampling [42.38652558807518]
In this work, we fall back to discrete $k$-subset sampling on the forward pass.
We show that our gradient estimator, SIMPLE, exhibits lower bias and variance compared to state-of-the-art estimators.
Empirical results show improved performance on learning to explain and sparse linear regression.
arXiv Detail & Related papers (2022-10-04T22:33:16Z) - Adaptive Sketches for Robust Regression with Importance Sampling [64.75899469557272]
We introduce data structures for solving robust regression through stochastic gradient descent (SGD).
Our algorithm effectively runs $T$ steps of SGD with importance sampling while using sublinear space and just making a single pass over the data.
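The combination of SGD and importance sampling can be sketched as follows (the toy data, Huber gradient, and norm-proportional sampling probabilities are illustrative assumptions, not the paper's sketch-based data structures): rows are sampled with non-uniform probability and the gradient is reweighted by 1/(n p_i) to keep the estimate unbiased.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy robust-regression data.
X = rng.standard_normal((500, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(500)

def huber_grad(r, delta=1.0):
    # Gradient of the Huber loss w.r.t. the residual r (robust to outliers).
    return np.clip(r, -delta, delta)

# Importance sampling: pick rows proportionally to their norm, then
# reweight each gradient by 1 / (n * p_i) so the estimate stays unbiased.
p = np.linalg.norm(X, axis=1)
p /= p.sum()

w = np.zeros(3)
lr = 0.05
for _ in range(2000):
    i = rng.choice(len(X), p=p)
    r = X[i] @ w - y[i]
    g = huber_grad(r) * X[i] / (len(X) * p[i])
    w -= lr * g
```

The paper's contribution is doing this in sublinear space with a single pass over the data; the sketch above only shows the unbiased importance-weighted update itself.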
arXiv Detail & Related papers (2022-07-16T03:09:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.