VARD: Efficient and Dense Fine-Tuning for Diffusion Models with Value-based RL
- URL: http://arxiv.org/abs/2505.15791v2
- Date: Mon, 02 Jun 2025 17:53:20 GMT
- Title: VARD: Efficient and Dense Fine-Tuning for Diffusion Models with Value-based RL
- Authors: Fengyuan Dai, Zifeng Zhuang, Yufei Huang, Siteng Huang, Bangyan Liao, Donglin Wang, Fajie Yuan,
- Abstract summary: VAlue-based Reinforced Diffusion (VARD) is a novel approach that first learns a value function predicting the expectation of rewards from intermediate states. Our method maintains proximity to the pretrained model while enabling effective and stable training via backpropagation.
- Score: 28.95582264086289
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have emerged as powerful generative tools across various domains, yet tailoring pre-trained models to exhibit specific desirable properties remains challenging. While reinforcement learning (RL) offers a promising solution, current methods struggle to simultaneously achieve stable, efficient fine-tuning and support non-differentiable rewards. Furthermore, their reliance on sparse rewards provides inadequate supervision during intermediate steps, often resulting in suboptimal generation quality. To address these limitations, dense and differentiable signals are required throughout the diffusion process. Hence, we propose VAlue-based Reinforced Diffusion (VARD): a novel approach that first learns a value function predicting the expectation of rewards from intermediate states, and subsequently uses this value function with KL regularization to provide dense supervision throughout the generation process. Our method maintains proximity to the pretrained model while enabling effective and stable training via backpropagation. Experimental results demonstrate that our approach facilitates better trajectory guidance, improves training efficiency, and extends the applicability of RL to diffusion models optimized for complex, non-differentiable reward functions.
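The mechanism described in the abstract can be read as a two-stage loop: first regress a value network onto final rewards so it predicts the expected reward from intermediate denoising states, then fine-tune the diffusion model against that dense, differentiable signal while penalizing KL divergence from the frozen pretrained model. Below is a minimal PyTorch-style sketch of that idea, not the paper's actual implementation: `value_net`, `policy_unet`, `pretrained_unet`, and `denoise_step_fn` are illustrative placeholders, and the per-step KL term uses a common Gaussian-policy simplification.

```python
import torch
import torch.nn.functional as F

# Stage 1 (sketch): regress a value network onto the final reward so it learns
# to predict the expected reward from an intermediate denoising state x_t.
# The reward itself may be non-differentiable (e.g. a black-box scorer).
def value_update(value_net, x_t, t, final_reward, optimizer):
    pred = value_net(x_t, t)                          # predicted value V(x_t, t)
    target = torch.full_like(pred, float(final_reward))
    loss = F.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Stage 2 (sketch): fine-tune the diffusion model with the learned value as a
# dense, differentiable reward at every denoising step, plus a KL penalty that
# keeps the policy close to the frozen pretrained model. The optimizer is
# assumed to hold only the policy parameters; the value net stays fixed here.
def vard_step(policy_unet, pretrained_unet, value_net, denoise_step_fn,
              x_t, t, kl_weight, optimizer):
    eps_policy = policy_unet(x_t, t)                  # trainable noise prediction
    with torch.no_grad():
        eps_prior = pretrained_unet(x_t, t)           # frozen pretrained prediction
    x_prev = denoise_step_fn(x_t, eps_policy, t)      # one reverse-diffusion step

    value = value_net(x_prev, t - 1)                  # dense signal, differentiable via backprop
    # For Gaussian reverse kernels with shared variance, the per-step KL is
    # proportional to the squared difference of the two noise predictions.
    kl = ((eps_policy - eps_prior) ** 2).mean()

    loss = -value.mean() + kl_weight * kl
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this framing, the value network supplies supervision at every timestep (the dense reward), while the KL term keeps the fine-tuned policy close to the pretrained prior; the sketch folds both into a single loss per denoising step.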
Related papers
- Divergence Minimization Preference Optimization for Diffusion Model Alignment [58.651951388346525]
Divergence Minimization Preference Optimization (DMPO) is a principled method for aligning diffusion models by minimizing reverse KL divergence. Our results show that diffusion models fine-tuned with DMPO can consistently outperform or match existing techniques. DMPO unlocks a robust and elegant pathway for preference alignment, bridging principled theory with practical performance in diffusion models.
arXiv Detail & Related papers (2025-07-10T07:57:30Z)
- Iterative Distillation for Reward-Guided Fine-Tuning of Diffusion Models in Biomolecular Design [53.93023688824764]
We address the problem of fine-tuning diffusion models for reward-guided generation in biomolecular design. We propose an iterative distillation-based fine-tuning framework that enables diffusion models to optimize for arbitrary reward functions. Our off-policy formulation, combined with KL divergence minimization, enhances training stability and sample efficiency compared to existing RL-based methods.
arXiv Detail & Related papers (2025-07-01T05:55:28Z)
- PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity [9.092404060771306]
Diffusion models have shown impressive results in generating high-quality conditional samples. However, existing methods often require additional training or neural function evaluations (NFEs). We propose a novel and efficient method, termed PLADIS, which boosts pre-trained models by leveraging sparse attention.
arXiv Detail & Related papers (2025-03-10T07:23:19Z)
- ROCM: RLHF on consistency models [8.905375742101707]
We propose a reward optimization framework for applying RLHF to consistency models. We investigate various $f$-divergences as regularization strategies, striking a balance between reward and model consistency.
arXiv Detail & Related papers (2025-03-08T11:19:48Z)
- Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning [9.025671446527694]
Reinforcement learning from human feedback (RLHF) has become a crucial step in building reliable generative AI models. This study develops a disciplined approach to fine-tuning diffusion models using continuous-time RL.
arXiv Detail & Related papers (2025-02-03T20:50:05Z)
- Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets [65.42834731617226]
We propose a reinforcement learning method for diffusion model finetuning, dubbed Nabla-GFlowNet. We show that our proposed method achieves fast yet diversity- and prior-preserving finetuning of Stable Diffusion, a large-scale text-conditioned image diffusion model.
arXiv Detail & Related papers (2024-12-10T18:59:58Z)
- Prompt Tuning with Diffusion for Few-Shot Pre-trained Policy Generalization [55.14484317645865]
We develop a conditional diffusion model to produce exceptional quality prompts for offline reinforcement learning tasks.
We show that the Prompt diffuser is a robust and effective tool for the prompt-tuning process, demonstrating strong performance in meta-RL tasks.
arXiv Detail & Related papers (2024-11-02T07:38:02Z)
- Avoiding mode collapse in diffusion models fine-tuned with reinforcement learning [0.0]
Fine-tuning foundation models via reinforcement learning (RL) has proven promising for aligning them to downstream objectives.
We exploit the hierarchical nature of diffusion models (DMs) and train them dynamically at each epoch with a tailored RL method.
We show that models trained with HRF achieve better preservation of diversity in downstream tasks, thus enhancing fine-tuning robustness without compromising mean rewards.
arXiv Detail & Related papers (2024-10-10T19:06:23Z)
- Training-free Diffusion Model Alignment with Sampling Demons [15.400553977713914]
We propose an optimization approach, dubbed Demon, to guide the denoising process at inference time without backpropagation through reward functions or model retraining. Our approach works by controlling the noise distribution in denoising steps to concentrate density on regions corresponding to high rewards through optimization. Our experiments show that the proposed approach significantly improves average aesthetics scores in text-to-image generation.
arXiv Detail & Related papers (2024-10-08T07:33:49Z)
- Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review [63.31328039424469]
This tutorial provides a comprehensive survey of methods for fine-tuning diffusion models to optimize downstream reward functions.
We explain the application of various RL algorithms, including PPO, differentiable optimization, reward-weighted MLE, value-weighted sampling, and path consistency learning.
arXiv Detail & Related papers (2024-07-18T17:35:32Z)
- Learning Off-policy with Model-based Intrinsic Motivation For Active Online Exploration [15.463313629574111]
This paper investigates how to achieve sample-efficient exploration in continuous control tasks.
We introduce an RL algorithm that incorporates a predictive model and off-policy learning elements.
We derive an intrinsic reward without incurring parameter overhead.
arXiv Detail & Related papers (2024-03-31T11:39:11Z)
- Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning diffusion models remains an underexplored frontier in generative artificial intelligence (GenAI).
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion).
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z)
- Structural Pruning for Diffusion Models [65.02607075556742]
We present Diff-Pruning, an efficient compression method tailored for learning lightweight diffusion models from pre-existing ones.
Our empirical assessment, undertaken across several datasets, highlights two primary benefits of our proposed method.
arXiv Detail & Related papers (2023-05-18T12:38:21Z)
- Training Generative Adversarial Networks by Solving Ordinary Differential Equations [54.23691425062034]
We study the continuous-time dynamics induced by GAN training.
From this perspective, we hypothesise that instabilities in training GANs arise from the integration error.
We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training.
arXiv Detail & Related papers (2020-10-28T15:23:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.