A General Framework for Inference-time Scaling and Steering of Diffusion Models
- URL: http://arxiv.org/abs/2501.06848v5
- Date: Fri, 18 Jul 2025 17:52:45 GMT
- Title: A General Framework for Inference-time Scaling and Steering of Diffusion Models
- Authors: Raghav Singhal, Zachary Horvitz, Ryan Teehan, Mengye Ren, Zhou Yu, Kathleen McKeown, Rajesh Ranganath,
- Abstract summary: Feynman-Kac (FK) steering is an inference-time framework for steering diffusion models with reward functions.<n>FK steering works by sampling a system of multiple interacting diffusion processes, called particles, and resampling particles at intermediate steps.<n>We evaluate FK steering on text-to-image and text diffusion models.
- Score: 45.27503904127995
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models produce impressive results in modalities ranging from images and video to protein design and text. However, generating samples with user-specified properties remains a challenge. Recent research proposes fine-tuning models to maximize rewards that capture desired properties, but these methods require expensive training and are prone to mode collapse. In this work, we present Feynman-Kac (FK) steering, an inference-time framework for steering diffusion models with reward functions. FK steering works by sampling a system of multiple interacting diffusion processes, called particles, and resampling particles at intermediate steps based on scores computed using functions called potentials. Potentials are defined using rewards for intermediate states and are selected such that a high value indicates that the particle will yield a high-reward sample. We explore various choices of potentials, intermediate rewards, and samplers. We evaluate FK steering on text-to-image and text diffusion models. For steering text-to-image models with a human preference reward, we find that FK steering a 0.8B parameter model outperforms a 2.6B parameter fine-tuned model on prompt fidelity, with faster sampling and no training. For steering text diffusion models with rewards for text quality and specific text attributes, we find that FK steering generates lower perplexity, more linguistically acceptable outputs and enables gradient-free control of attributes like toxicity. Our results demonstrate that inference-time scaling and steering of diffusion models - even with off-the-shelf rewards - can provide significant sample quality gains and controllability benefits. Code is available at https://github.com/zacharyhorvitz/Fk-Diffusion-Steering .
Related papers
- TADA: Improved Diffusion Sampling with Training-free Augmented Dynamics [42.99251753481681]
We introduce a new sampling method that is up to $186%$ faster than the current state of the art solver for comparative FID on ImageNet512.<n>The key to our method resides in using higher-dimensional initial noise, allowing to produce more detailed samples.
arXiv Detail & Related papers (2025-06-26T20:30:27Z) - Progressive Inference-Time Annealing of Diffusion Models for Sampling from Boltzmann Densities [85.83359661628575]
We propose Progressive Inference-Time Annealing (PITA) to learn diffusion-based samplers.<n>PITA combines two complementary techniques: Annealing of the Boltzmann distribution and Diffusion smoothing.<n>It enables equilibrium sampling of N-body particle systems, Alanine Dipeptide, and tripeptides in Cartesian coordinates.
arXiv Detail & Related papers (2025-06-19T17:14:22Z) - Fake it till You Make it: Reward Modeling as Discriminative Prediction [49.31309674007382]
GAN-RM is an efficient reward modeling framework that eliminates manual preference annotation and explicit quality dimension engineering.<n>Our method trains the reward model through discrimination between a small set of representative, unpaired target samples.<n>Experiments demonstrate our GAN-RM's effectiveness across multiple key applications.
arXiv Detail & Related papers (2025-06-16T17:59:40Z) - Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference.
Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable.
We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
arXiv Detail & Related papers (2024-10-10T17:18:30Z) - Training-free Diffusion Model Alignment with Sampling Demons [15.400553977713914]
We propose an optimization approach, dubbed Demon, to guide the denoising process at inference time without backpropagation through reward functions or model retraining.
Our approach works by controlling noise distribution in denoising steps to concentrate density on regions corresponding to high rewards through optimization.
Our experiments show that the proposed approach significantly improves the average aesthetics scores text-to-image generation.
arXiv Detail & Related papers (2024-10-08T07:33:49Z) - Feedback Efficient Online Fine-Tuning of Diffusion Models [52.170384048274364]
We propose a novel reinforcement learning procedure that efficiently explores on the manifold of feasible samples.
We present a theoretical analysis providing a regret guarantee, as well as empirical validation across three domains.
arXiv Detail & Related papers (2024-02-26T07:24:32Z) - Fine-Tuning of Continuous-Time Diffusion Models as Entropy-Regularized
Control [54.132297393662654]
Diffusion models excel at capturing complex data distributions, such as those of natural images and proteins.
While diffusion models are trained to represent the distribution in the training dataset, we often are more concerned with other properties, such as the aesthetic quality of the generated images.
We present theoretical and empirical evidence that demonstrates our framework is capable of efficiently generating diverse samples with high genuine rewards.
arXiv Detail & Related papers (2024-02-23T08:54:42Z) - Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF)
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
arXiv Detail & Related papers (2023-06-09T18:40:55Z) - Extracting Reward Functions from Diffusion Models [7.834479563217133]
Decision-making diffusion models can be trained on lower-quality data, and then be steered with a reward function to generate near-optimal trajectories.
We consider the problem of extracting a reward function by comparing a decision-making diffusion model that models low-reward behavior and one that models high-reward behavior.
We show that our approach generalizes beyond sequential decision-making by learning a reward-like function from two large-scale image generation diffusion models.
arXiv Detail & Related papers (2023-06-01T17:59:12Z) - Diff-Instruct: A Universal Approach for Transferring Knowledge From
Pre-trained Diffusion Models [77.83923746319498]
We propose a framework called Diff-Instruct to instruct the training of arbitrary generative models.
We show that Diff-Instruct results in state-of-the-art single-step diffusion-based models.
Experiments on refining GAN models show that the Diff-Instruct can consistently improve the pre-trained generators of GAN models.
arXiv Detail & Related papers (2023-05-29T04:22:57Z) - Towards Controllable Diffusion Models via Reward-Guided Exploration [15.857464051475294]
We propose a novel framework that guides the training-phase of diffusion models via reinforcement learning (RL)
RL enables calculating policy gradients via samples from a pay-off distribution proportional to exponential scaled rewards, rather than from policies themselves.
Experiments on 3D shape and molecule generation tasks show significant improvements over existing conditional diffusion models.
arXiv Detail & Related papers (2023-04-14T13:51:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.