Learning To Sample From Diffusion Models Via Inverse Reinforcement Learning
- URL: http://arxiv.org/abs/2602.08689v1
- Date: Mon, 09 Feb 2026 14:10:44 GMT
- Title: Learning To Sample From Diffusion Models Via Inverse Reinforcement Learning
- Authors: Constant Bourdrez, Alexandre Vérine, Olivier Cappé
- Abstract summary: Diffusion models generate samples through an iterative denoising process, guided by a neural network. We introduce an inverse reinforcement learning framework for learning sampling strategies without retraining the denoiser. We provide experimental evidence that this approach can improve the quality of samples generated by pretrained diffusion models.
- Score: 43.678382510171986
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models generate samples through an iterative denoising process, guided by a neural network. While training the denoiser on real-world data is computationally demanding, the sampling procedure itself is more flexible. This adaptability serves as a key lever in practice, enabling improvements in both the quality of generated samples and the efficiency of the sampling process. In this work, we introduce an inverse reinforcement learning framework for learning sampling strategies without retraining the denoiser. We formulate the diffusion sampling procedure as a discrete-time finite-horizon Markov Decision Process, where actions correspond to optional modifications of the sampling dynamics. To optimize action scheduling, we avoid defining an explicit reward function. Instead, we directly match the target behavior expected from the sampler using policy gradient techniques. We provide experimental evidence that this approach can improve the quality of samples generated by pretrained diffusion models and automatically tune sampling hyperparameters.
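The MDP formulation in the abstract can be illustrated with a toy example. This is a minimal sketch under loudly stated assumptions, not the authors' method. The data are a hypothetical 1D Gaussian N(MU0, S0^2), so the ideal denoiser is available in closed form. The base sampler is an Euler discretization of the probability-flow ODE in the sigma parameterization. The per-step action a_t in {0, 1} decides whether to apply a noise-injection ("churn") step, similar in spirit to the EDM stochastic sampler (Karras et al., 2022), before the Euler update. The policy is a per-step Bernoulli with logits theta_t, trained with REINFORCE. The paper explicitly avoids defining an explicit reward function; purely as a stand-in, this sketch scores a batch by the negative squared error between its mean and standard deviation and the target values. All names (MU0, S0, SIGMAS, GAMMA, rollout, batch_score, train) are hypothetical.

```python
import math
import random

# Hypothetical toy setup (an assumption, not taken from the paper):
# data ~ N(MU0, S0^2) in 1D, so the ideal denoiser is known in closed form.
MU0, S0 = 2.0, 0.5
SIGMAS = [10.0, 5.0, 2.5, 1.2, 0.6, 0.3, 0.1, 0.0]  # decreasing noise levels
GAMMA = 0.5  # hypothetical churn factor applied when the action is 1


def denoiser(x, sigma):
    """Posterior mean E[x0 | x0 + sigma * eps = x] for Gaussian data."""
    return (S0**2 * x + sigma**2 * MU0) / (S0**2 + sigma**2)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def rollout(theta, rng):
    """One episode of the toy finite-horizon MDP.

    State: (step t, sample x). Action a_t in {0, 1}:
      0 -> plain Euler step of the probability-flow ODE,
      1 -> first inject noise raising sigma to sigma * (1 + GAMMA), then step.
    Returns the final sample and d/dtheta of log pi(a_0, ..., a_{T-1}).
    """
    x = rng.gauss(0.0, SIGMAS[0])
    grad_logp = [0.0] * len(theta)
    for t in range(len(SIGMAS) - 1):
        sigma, sigma_next = SIGMAS[t], SIGMAS[t + 1]
        p = sigmoid(theta[t])
        a = 1 if rng.random() < p else 0
        grad_logp[t] = a - p  # gradient of log Bernoulli(a; sigmoid(theta_t))
        if a == 1:
            sigma_hat = sigma * (1.0 + GAMMA)
            x += math.sqrt(sigma_hat**2 - sigma**2) * rng.gauss(0.0, 1.0)
            sigma = sigma_hat
        slope = (x - denoiser(x, sigma)) / sigma  # dx/dsigma of the ODE
        x += (sigma_next - sigma) * slope
    return x, grad_logp


def batch_score(samples):
    """Stand-in objective (assumption; the paper avoids an explicit reward):
    negative squared error between batch mean/std and the target MU0/S0."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in samples) / n)
    return -((mean - MU0) ** 2 + (std - S0) ** 2)


def train(iters=200, batch=64, lr=0.05, seed=0):
    """REINFORCE on the per-step churn logits, treating a batch as one episode."""
    rng = random.Random(seed)
    theta = [0.0] * (len(SIGMAS) - 1)
    baseline, score = None, float("-inf")
    for _ in range(iters):
        samples, grads = [], [0.0] * len(theta)
        for _ in range(batch):
            x, g = rollout(theta, rng)
            samples.append(x)
            grads = [acc + gt for acc, gt in zip(grads, g)]
        score = batch_score(samples)
        # Baseline from previous iterations only, so it does not bias the gradient.
        adv = 0.0 if baseline is None else score - baseline
        baseline = score if baseline is None else 0.9 * baseline + 0.1 * score
        theta = [th + lr * adv * gt for th, gt in zip(theta, grads)]
    return theta, score


if __name__ == "__main__":
    theta, last = train()
    print("per-step churn probabilities:", [round(sigmoid(th), 3) for th in theta])
    print("last batch score:", round(last, 4))
```

Because the whole batch is treated as one joint episode, the score-function gradient multiplies the batch score by the summed log-probability gradients of every action in the batch. Whether churn actually helps in this toy depends on the noise schedule and GAMMA, so the learned probabilities say nothing about the paper's reported results.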
Related papers
- TFTF: Training-Free Targeted Flow for Conditional Sampling [1.4151684142137693]
We propose a training-free conditional sampling method for flow matching models based on importance sampling. Because a naive application of importance sampling suffers from weight degeneracy in high-dimensional settings, we modify and incorporate a resampling technique from sequential Monte Carlo. Our framework requires no additional training while providing theoretical guarantees of accuracy.
(2026-02-13T13:41:35Z)
- Guided Star-Shaped Masked Diffusion [11.965970427956684]
We introduce a novel sampling algorithm that works with pre-trained models. Our method reformulates the generation process using a star-shaped paradigm. We augment it with a learnable re-masking scheduler that intelligently identifies and revises likely errors.
(2025-10-09T15:53:51Z)
- Noise Conditional Variational Score Distillation [60.38982038894823]
Noise Conditional Variational Score Distillation (NCVSD) is a novel method for distilling pretrained diffusion models into generative denoisers. By integrating this insight into the Variational Score Distillation framework, we enable scalable learning of generative denoisers.
(2025-06-11T06:01:39Z)
- Arbitrary-steps Image Super-resolution via Diffusion Inversion [68.78628844966019]
This study presents a new image super-resolution (SR) technique based on diffusion inversion, aiming to harness the rich image priors encapsulated in large pre-trained diffusion models to improve SR performance. We design a Partial noise Prediction strategy to construct an intermediate state of the diffusion model, which serves as the starting sampling point. Once trained, this noise predictor can be used to initialize the sampling process partway along the diffusion trajectory, generating the desired high-resolution result.
(2024-12-12T07:24:13Z)
- Adaptive teachers for amortized samplers [76.88721198565861]
We propose an adaptive training distribution (the teacher) to guide the training of the primary amortized sampler (the student). We validate the effectiveness of this approach in a synthetic environment designed to present an exploration challenge.
(2024-10-02T11:33:13Z)
- Your Diffusion Model is Secretly a Noise Classifier and Benefits from Contrastive Training [20.492630610281658]
Diffusion models learn to denoise data and the trained denoiser is then used to generate new samples from the data distribution.
We introduce a new self-supervised training objective that differentiates the levels of noise added to a sample.
Diverse experiments show that the proposed contrastive diffusion training is effective in both sequential and parallel settings.
(2024-07-12T03:03:50Z)
- Adaptive Client Sampling in Federated Learning via Online Learning with Bandit Feedback [31.826205004616227]
Client sampling plays an important role in federated learning (FL) systems because it affects the convergence rate of optimization algorithms. We propose an online stochastic mirror descent (OSMD) algorithm designed to minimize the sampling variance. We show how our sampling method can improve the convergence speed of federated optimization algorithms over the widely used uniform sampling.
(2021-12-28T23:50:52Z)
- Optimal Importance Sampling for Federated Learning [57.14673504239551]
Federated learning involves a mixture of centralized and decentralized processing tasks.
The sampling of both agents and data is generally uniform; however, in this work we consider non-uniform sampling.
We derive optimal importance sampling strategies for both agent and data selection and show that non-uniform sampling without replacement improves the performance of the original FedAvg algorithm.
(2020-10-26T14:15:33Z)
- Automatic Recall Machines: Internal Replay, Continual Learning and the Brain [104.38824285741248]
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity.
We present a method where these auxiliary samples are generated on the fly, given only the model that is being trained for the assessed objective.
Instead, the implicit memory of learned samples within the assessed model itself is exploited.
(2020-06-22T15:07:06Z)