Learn to Guide Your Diffusion Model
- URL: http://arxiv.org/abs/2510.00815v1
- Date: Wed, 01 Oct 2025 12:21:48 GMT
- Title: Learn to Guide Your Diffusion Model
- Authors: Alexandre Galashov, Ashwini Pokle, Arnaud Doucet, Arthur Gretton, Mauricio Delbracio, Valentin De Bortoli
- Abstract summary: We study a technique for improving the quality of samples from conditional diffusion models. We learn guidance weights $\omega_{c,(s,t)}$, which are functions of the conditioning $c$, the time $t$ from which we denoise, and the time $s$ towards which we denoise. We extend our framework to reward-guided sampling, enabling the model to target distributions tilted by a reward function.
- Score: 84.82855046749657
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Classifier-free guidance (CFG) is a widely used technique for improving the perceptual quality of samples from conditional diffusion models. It operates by linearly combining conditional and unconditional score estimates using a guidance weight $\omega$. While a large, static weight can markedly improve visual results, this often comes at the cost of poorer distributional alignment. In order to better approximate the target conditional distribution, we instead learn guidance weights $\omega_{c,(s,t)}$, which are continuous functions of the conditioning $c$, the time $t$ from which we denoise, and the time $s$ towards which we denoise. We achieve this by minimizing the distributional mismatch between noised samples from the true conditional distribution and samples from the guided diffusion process. We extend our framework to reward-guided sampling, enabling the model to target distributions tilted by a reward function $R(x_0,c)$, defined on clean data and a conditioning $c$. We demonstrate the effectiveness of our methodology on low-dimensional toy examples and high-dimensional image settings, where we observe improvements in Fréchet inception distance (FID) for image generation. In text-to-image applications, we observe that employing a reward function given by the CLIP score leads to guidance weights that improve image-prompt alignment.
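The CFG combination described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the weight parameterization `omega` below is an invented toy (a per-condition bias plus a linear term in the denoising interval), standing in for the learned continuous function $\omega_{c,(s,t)}$.

```python
import numpy as np

def cfg_combine(eps_cond, eps_uncond, w):
    """Classifier-free guidance: linearly combine the conditional and
    unconditional noise/score estimates with guidance weight w.
    w = 0 recovers the unconditional estimate, w = 1 the conditional one,
    and w > 1 extrapolates beyond it."""
    return eps_uncond + w * (eps_cond - eps_uncond)

def omega(c, s, t, params):
    """Toy stand-in for a learned guidance weight omega_{c,(s,t)}:
    a per-condition base value plus a term in the interval length (t - s).
    Purely illustrative; the paper learns this function from data."""
    return params["base"][c] + params["slope"][c] * (t - s)

# Hypothetical noise estimates from conditional/unconditional denoisers.
eps_c = np.array([0.2, -0.1])
eps_u = np.array([0.1, 0.0])

params = {"base": {0: 1.5}, "slope": {0: 0.5}}
w = omega(c=0, s=0.4, t=0.5, params=params)  # 1.5 + 0.5 * 0.1 = 1.55
guided = cfg_combine(eps_c, eps_u, w)
```

The point of making $\omega$ depend on $(c, s, t)$ rather than being a single static constant is that the appropriate amount of guidance can differ per class and per denoising step, which is what the paper's distribution-matching objective exploits.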
Related papers
- FAIL: Flow Matching Adversarial Imitation Learning for Image Generation [52.643484089126844]
Post-training of flow matching models, aligning the output distribution with a high-quality target, is mathematically equivalent to imitation learning. We propose Flow Matching Adversarial Imitation Learning (FAIL), which minimizes policy-expert divergence through adversarial training without explicit rewards or pairwise comparisons.
arXiv Detail & Related papers (2026-02-12T16:36:33Z) - Test-time scaling of diffusions with flow maps [68.79792714591564]
A common recipe to improve diffusion models at test time is to introduce the gradient of the reward into the dynamics of the diffusion itself. We propose a simple solution by working directly with a flow map. By exploiting a relationship between the flow map and the velocity field governing the instantaneous transport, we construct an algorithm, Flow Map Trajectory Tilting (FMTT), which provably performs better ascent on the reward than standard test-time methods.
arXiv Detail & Related papers (2025-11-27T18:44:12Z) - Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling [70.8832906871441]
We study how to steer generation toward desired rewards without retraining the models. Prior methods typically resample or filter within a single denoising trajectory, optimizing rewards step by step without trajectory-level refinement. We introduce particle Gibbs sampling for diffusion language models (PG-DLM), a novel inference-time algorithm enabling trajectory-level refinement while preserving generation perplexity.
arXiv Detail & Related papers (2025-07-11T08:00:47Z) - Diffusion Tree Sampling: Scalable inference-time alignment of diffusion models [13.312007032203857]
Adapting a pretrained diffusion model to new objectives at inference time remains an open problem in generative modeling. We introduce a tree-based approach that samples from the reward-aligned target density by propagating terminal rewards back through the diffusion chain. By reusing information from previous generations, we get an anytime algorithm that turns additional compute into steadily better samples.
arXiv Detail & Related papers (2025-06-25T17:59:10Z) - Conditional Diffusion Models with Classifier-Free Gibbs-like Guidance [19.83064246586143]
CFG is a technique for improving conditional diffusion models by linearly combining the outputs of conditional and unconditional denoisers. While CFG enhances visual quality and improves alignment with prompts, it often reduces sample diversity. We propose a Gibbs-like sampling procedure to draw samples from the desired tilted distribution.
arXiv Detail & Related papers (2025-05-27T12:27:33Z) - Robust Representation Consistency Model via Contrastive Denoising [83.47584074390842]
Randomized smoothing provides theoretical guarantees for certifying robustness against adversarial perturbations. Diffusion models have been successfully employed for randomized smoothing to purify noise-perturbed samples. We reformulate the generative modeling task along the diffusion trajectories in pixel space as a discriminative task in the latent space.
arXiv Detail & Related papers (2025-01-22T18:52:06Z) - Gradient-Free Classifier Guidance for Diffusion Model Sampling [4.450496470631169]
The Gradient-Free Classifier Guidance (GFCG) method consistently improves class prediction accuracy.
For ImageNet 512$\times$512, we achieve a record $FD_{\text{DINOv2}}$ of 23.09, while simultaneously attaining a higher classification precision (94.3%) compared to ATG (90.2%).
arXiv Detail & Related papers (2024-11-23T00:22:21Z) - Consistency Model is an Effective Posterior Sample Approximation for Diffusion Inverse Solvers [28.678613691787096]
Previous approximations rely on the posterior means, which may not lie in the support of the image distribution.
We introduce a novel approach for posterior approximation that guarantees to generate valid samples within the support of the image distribution.
arXiv Detail & Related papers (2024-02-09T02:23:47Z) - Bridging the Gap: Addressing Discrepancies in Diffusion Model Training for Classifier-Free Guidance [1.6804613362826175]
Diffusion models have emerged as a pivotal advancement in generative models.
In this paper, we aim to underscore a discrepancy between conventional training methods and the desired conditional sampling behavior.
We introduce an updated loss function that better aligns training objectives with sampling behaviors.
arXiv Detail & Related papers (2023-11-02T02:03:12Z) - End-to-End Diffusion Latent Optimization Improves Classifier Guidance [81.27364542975235]
Direct Optimization of Diffusion Latents (DOODL) is a novel guidance method.
It enables plug-and-play guidance by optimizing diffusion latents.
It outperforms one-step classifier guidance on computational and human evaluation metrics.
arXiv Detail & Related papers (2023-03-23T22:43:52Z) - Generalized Differentiable RANSAC [95.95627475224231]
$\nabla$-RANSAC is a differentiable RANSAC that allows learning the entire randomized robust estimation pipeline.
$\nabla$-RANSAC is superior to the state-of-the-art in terms of accuracy while running at a similar speed to its less accurate alternatives.
arXiv Detail & Related papers (2022-12-26T15:13:13Z) - Permuted AdaIN: Reducing the Bias Towards Global Statistics in Image Classification [97.81205777897043]
Recent work has shown that convolutional neural network classifiers overly rely on texture at the expense of shape cues.
We make a similar but different distinction between shape and local image cues, on the one hand, and global image statistics, on the other.
Our method, called Permuted Adaptive Instance Normalization (pAdaIN), reduces the representation of global statistics in the hidden layers of image classifiers.
arXiv Detail & Related papers (2020-10-09T16:38:38Z) - Deep Residual Flow for Out of Distribution Detection [27.218308616245164]
We present a novel approach that improves upon the state-of-the-art by leveraging an expressive density model based on normalizing flows.
We demonstrate the effectiveness of our method in ResNet and DenseNet architectures trained on various image datasets.
arXiv Detail & Related papers (2020-01-15T16:38:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.