Training-Free Adaptation of Diffusion Models via Doob's $h$-Transform
- URL: http://arxiv.org/abs/2602.16198v1
- Date: Wed, 18 Feb 2026 05:44:19 GMT
- Title: Training-Free Adaptation of Diffusion Models via Doob's $h$-Transform
- Authors: Qijie Zhu, Zeqi Ye, Han Liu, Zhaoran Wang, Minshuo Chen
- Abstract summary: DOIT (Doob-Oriented Inference-time Transformation) is a training-free and computationally efficient adaptation method. We leverage Doob's $h$-transform to realize this transport, which induces a dynamic correction to the diffusion sampling process. Our method consistently outperforms state-of-the-art baselines while preserving sampling efficiency.
- Score: 37.05492050174751
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adaptation methods have been a workhorse for unlocking the transformative power of pre-trained diffusion models in diverse applications. Existing approaches often abstract adaptation objectives as a reward function and steer diffusion models to generate high-reward samples. However, these approaches can incur high computational overhead due to additional training, or rely on stringent assumptions on the reward such as differentiability. Moreover, despite their empirical success, theoretical justification and guarantees are seldom established. In this paper, we propose DOIT (Doob-Oriented Inference-time Transformation), a training-free and computationally efficient adaptation method that applies to generic, non-differentiable rewards. The key framework underlying our method is a measure transport formulation that seeks to transport the pre-trained generative distribution to a high-reward target distribution. We leverage Doob's $h$-transform to realize this transport, which induces a dynamic correction to the diffusion sampling process and enables efficient simulation-based computation without modifying the pre-trained model. Theoretically, we establish a high probability convergence guarantee to the target high-reward distribution via characterizing the approximation error in the dynamic Doob's correction. Empirically, on D4RL offline RL benchmarks, our method consistently outperforms state-of-the-art baselines while preserving sampling efficiency.
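The dynamic correction induced by Doob's $h$-transform can be pictured, at a very high level, as importance-resampling candidate denoising steps under the reward. The sketch below is an illustrative simplification, not the paper's DOIT algorithm: `base_step`, `reward`, and the resampling scheme are hypothetical stand-ins, and note that the reward may be non-differentiable since only its values are used.

```python
import numpy as np

def doob_corrected_step(x_t, base_step, reward, n_candidates=16, temp=1.0, rng=None):
    """One reverse-diffusion step with a Monte Carlo, h-transform-style correction.

    base_step(x_t, rng) -> candidate x_{t-1} drawn from the pre-trained sampler.
    reward(x)           -> scalar reward; may be non-differentiable.
    Candidates are resampled with weights proportional to exp(temp * reward),
    tilting the base transition toward high-reward states without retraining.
    """
    if rng is None:
        rng = np.random.default_rng()
    candidates = np.stack([base_step(x_t, rng) for _ in range(n_candidates)])
    logw = temp * np.array([reward(c) for c in candidates])
    logw -= logw.max()                  # subtract max for numerical stability
    w = np.exp(logw)
    w /= w.sum()
    idx = rng.choice(n_candidates, p=w)  # resample one candidate
    return candidates[idx]
```

Because the pre-trained sampler is only called, never modified or differentiated through, this kind of correction is training-free; the quality of the tilt depends on how many candidates are drawn per step.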
Related papers
- Training-Free Distribution Adaptation for Diffusion Models via Maximum Mean Discrepancy Guidance [17.353524034156205]
MMD Guidance augments the reverse diffusion process with gradients of the Maximum Mean Discrepancy (MMD) between generated samples and a reference dataset. Our framework naturally extends to prompt-aware adaptation in conditional generation models via product kernels. Experiments on synthetic and real-world benchmarks demonstrate that MMD Guidance can achieve distributional alignment while preserving sample fidelity.
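As a point of reference for the quantity this guidance relies on, a minimal (biased, V-statistic) squared-MMD estimate with an RBF kernel can be written as follows; the function name and fixed bandwidth are assumptions for illustration, not details from the paper.

```python
import numpy as np

def rbf_mmd2(X, Y, sigma=1.0):
    """Biased (V-statistic) squared MMD between samples X and Y
    under an RBF kernel k(a, b) = exp(-|a - b|^2 / (2 sigma^2))."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()
```

The estimate is zero when the two samples coincide and grows as their distributions separate; guidance methods of this kind typically differentiate such an estimate with respect to the generated samples.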
arXiv Detail & Related papers (2026-01-13T09:42:57Z) - Nonparametric Data Attribution for Diffusion Models [57.820618036556084]
Data attribution for generative models seeks to quantify the influence of individual training examples on model outputs. We propose a nonparametric attribution method that operates entirely on data, measuring influence via patch-level similarity between generated and training images.
arXiv Detail & Related papers (2025-10-16T03:37:16Z) - Score-based Idempotent Distillation of Diffusion Models [0.9367224590861915]
Idempotent generative networks (IGNs) are a new line of generative models based on idempotent mapping to a target manifold. In this work, we unite diffusion and IGNs by distilling idempotent models from diffusion model scores, called SIGN. Our proposed method is highly stable and does not require adversarial losses. We provide a theoretical analysis of our proposed score-based training methods and empirically show that IGNs can be effectively distilled from a pre-trained diffusion model.
arXiv Detail & Related papers (2025-09-25T19:36:10Z) - Align-Then-stEer: Adapting the Vision-Language Action Models through Unified Latent Guidance [63.33213516925946]
We introduce Align-Then-stEer (ATE), a novel, data-efficient, and plug-and-play adaptation framework. Our work presents a general and lightweight solution that greatly enhances the practicality of deploying VLA models to new robotic platforms and tasks.
arXiv Detail & Related papers (2025-09-02T07:51:59Z) - VARD: Efficient and Dense Fine-Tuning for Diffusion Models with Value-based RL [28.95582264086289]
VAlue-based Reinforced Diffusion (VARD) is a novel approach that first learns a value function predicting the expected reward from intermediate states. Our method maintains proximity to the pretrained model while enabling effective and stable training via backpropagation.
arXiv Detail & Related papers (2025-05-21T17:44:37Z) - Diffusion Classifier-Driven Reward for Offline Preference-based Reinforcement Learning [45.95668702930697]
We propose a novel preference-based reward acquisition method: Diffusion Preference-based Reward (DPR). DPR directly treats step-wise preference-based reward acquisition as a binary classification problem and utilizes the robustness of diffusion classifiers to infer step-wise rewards discriminatively. We also propose Conditional Diffusion Preference-based Reward (C-DPR), which conditions on trajectory-wise preference labels to enhance reward inference.
arXiv Detail & Related papers (2025-03-03T03:49:38Z) - Adaptive teachers for amortized samplers [76.88721198565861]
We propose an adaptive training distribution (the teacher) to guide the training of the primary amortized sampler (the student). We validate the effectiveness of this approach in a synthetic environment designed to present an exploration challenge.
arXiv Detail & Related papers (2024-10-02T11:33:13Z) - Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review [63.31328039424469]
This tutorial provides a comprehensive survey of methods for fine-tuning diffusion models to optimize downstream reward functions.
We explain the application of various RL algorithms, including PPO, differentiable optimization, reward-weighted MLE, value-weighted sampling, and path consistency learning.
arXiv Detail & Related papers (2024-07-18T17:35:32Z) - Variational Schrödinger Diffusion Models [14.480273869571468]
Schrödinger bridge (SB) has emerged as the go-to method for optimizing transportation plans in diffusion models. We leverage variational inference to linearize the forward score functions (variational scores) of SB. We propose the variational Schrödinger diffusion model (VSDM), where the forward process is a multivariate diffusion and the variational scores are adaptively optimized for efficient transport.
arXiv Detail & Related papers (2024-05-08T04:01:40Z) - Unsupervised Discovery of Interpretable Directions in h-space of Pre-trained Diffusion Models [63.1637853118899]
We propose the first unsupervised and learning-based method to identify interpretable directions in h-space of pre-trained diffusion models.
We employ a shift control module that works on h-space of pre-trained diffusion models to manipulate a sample into a shifted version of itself.
By jointly optimizing them, the model will spontaneously discover disentangled and interpretable directions.
arXiv Detail & Related papers (2023-10-15T18:44:30Z) - Towards Controllable Diffusion Models via Reward-Guided Exploration [15.857464051475294]
We propose a novel framework that guides the training phase of diffusion models via reinforcement learning (RL).
RL enables calculating policy gradients via samples from a payoff distribution proportional to exponentially scaled rewards, rather than from the policies themselves.
Experiments on 3D shape and molecule generation tasks show significant improvements over existing conditional diffusion models.
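The exponentially scaled reward weighting described in this entry can be illustrated by a self-normalized weighted average of score-function terms. This is a generic sketch of reward-weighted gradient estimation under stated assumptions, not the paper's exact estimator; all names here are hypothetical.

```python
import numpy as np

def exp_weighted_policy_gradient(rewards, grad_logp, beta=1.0):
    """Self-normalized policy-gradient estimate: each sample's score term
    grad_logp[i] is weighted in proportion to exp(beta * rewards[i]),
    so high-payoff samples dominate the update without differentiating
    the reward itself."""
    logw = beta * (rewards - rewards.max())  # subtract max for stability
    w = np.exp(logw)
    w /= w.sum()
    return w @ grad_logp                     # weighted average of score terms
```

With `beta = 0` every sample contributes equally (a plain Monte Carlo average); as `beta` grows, the estimate concentrates on the highest-reward samples.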
arXiv Detail & Related papers (2023-04-14T13:51:26Z) - Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.