DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2305.16381v3
- Date: Wed, 1 Nov 2023 04:48:26 GMT
- Title: DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models
- Authors: Ying Fan, Olivia Watkins, Yuqing Du, Hao Liu, Moonkyung Ryu, Craig
Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Kangwook Lee, Kimin Lee
- Abstract summary: We propose using online reinforcement learning to fine-tune text-to-image models.
We focus on diffusion models, defining the fine-tuning task as an RL problem.
Our approach, coined DPOK, integrates policy optimization with KL regularization.
- Score: 97.31200133440308
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning from human feedback has been shown to improve text-to-image models.
These techniques first learn a reward function that captures what humans care
about in the task and then improve the models based on the learned reward
function. Even though relatively simple approaches (e.g., rejection sampling
based on reward scores) have been investigated, fine-tuning text-to-image
models with the reward function remains challenging. In this work, we propose
using online reinforcement learning (RL) to fine-tune text-to-image models. We
focus on diffusion models, defining the fine-tuning task as an RL problem, and
updating the pre-trained text-to-image diffusion models using policy gradient
to maximize the feedback-trained reward. Our approach, coined DPOK, integrates
policy optimization with KL regularization. We conduct an analysis of KL
regularization for both RL fine-tuning and supervised fine-tuning. In our
experiments, we show that DPOK is generally superior to supervised fine-tuning
with respect to both image-text alignment and image quality. Our code is
available at
https://github.com/google-research/google-research/tree/master/dpok.
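As a rough, hedged sketch of the objective the abstract describes (not the authors' implementation; see the repository above for that), the snippet below combines a REINFORCE-style policy-gradient term, which upweights denoising transitions that lead to high reward, with a KL penalty toward a frozen copy of the pretrained model. The `Denoiser` module, the Gaussian step width `sigma`, the `kl_weight`, and the tensor shapes are illustrative assumptions chosen to keep the toy runnable.

```python
# Minimal sketch (not the DPOK codebase) of a policy-gradient update with KL
# regularization toward the pretrained model. All names, shapes, and
# hyperparameters here are illustrative assumptions.
import copy
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    """Toy stand-in for a text-conditioned diffusion denoiser."""
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.SiLU(), nn.Linear(64, dim))

    def forward(self, x_t):
        return self.net(x_t)

policy = Denoiser()                         # model being fine-tuned
pretrained = copy.deepcopy(policy).eval()   # frozen reference for the KL term
for p in pretrained.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-4)
kl_weight, sigma = 0.1, 1.0                 # illustrative hyperparameters

def dpok_style_step(x_t, x_prev, reward):
    """One gradient step on a batch of sampled denoising transitions.

    x_t, x_prev: states at consecutive diffusion timesteps (sampled from the policy)
    reward: per-sample scalar score from a learned (e.g. human-feedback) reward model
    """
    mu_policy = policy(x_t)
    mu_ref = pretrained(x_t)

    # Log-probability of the sampled transition under a Gaussian denoising step.
    log_prob = -((x_prev - mu_policy) ** 2).sum(dim=-1) / (2 * sigma ** 2)

    # REINFORCE-style term: increase log-prob of transitions with high reward.
    pg_loss = -(reward.detach() * log_prob).mean()

    # KL between two Gaussians with shared variance reduces to a squared mean
    # difference; it keeps the fine-tuned policy close to the pretrained model.
    kl_loss = ((mu_policy - mu_ref) ** 2).sum(dim=-1).mean() / (2 * sigma ** 2)

    loss = pg_loss + kl_weight * kl_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: random tensors stand in for sampled diffusion states and rewards.
x_t, x_prev = torch.randn(8, 16), torch.randn(8, 16)
reward = torch.randn(8)
print(dpok_style_step(x_t, x_prev, reward))
```

Under the Gaussian-step assumption used in this sketch, the per-step KL between the fine-tuned and pretrained denoising distributions reduces to a squared difference of their predicted means, which is why the regularizer takes that simple form here.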
Related papers
- RL for Consistency Models: Faster Reward Guided Text-to-Image Generation [15.238373471473645]
We propose a framework for fine-tuning consistency models via reinforcement learning (RL).
Our framework, called Reinforcement Learning for Consistency Model (RLCM), frames the iterative inference process of a consistency model as an RL procedure.
Compared to RL-fine-tuned diffusion models, RLCM trains significantly faster, improves generation quality as measured by the reward objectives, and speeds up inference by generating high-quality images in as few as two inference steps.
arXiv Detail & Related papers (2024-03-25T15:40:22Z)
- Direct Consistency Optimization for Compositional Text-to-Image Personalization [73.94505688626651]
Text-to-image (T2I) diffusion models, when fine-tuned on a few personal images, are able to generate visuals with a high degree of consistency.
We propose to fine-tune the T2I model by maximizing consistency to reference images, while penalizing the deviation from the pretrained model.
arXiv Detail & Related papers (2024-02-19T09:52:41Z)
- Aligning Modalities in Vision Large Language Models via Preference Fine-tuning [67.62925151837675]
In this work, we frame the hallucination problem as an alignment issue and tackle it with preference tuning.
Specifically, we propose POVID to generate feedback data with AI models.
We use ground-truth instructions as the preferred response and a two-stage approach to generate dispreferred data.
In experiments across a broad set of benchmarks, we show that our approach not only reduces hallucinations but also improves model performance on standard benchmarks, outperforming prior approaches.
arXiv Detail & Related papers (2024-02-18T00:56:16Z)
- Reinforcement Learning from Diffusion Feedback: Q* for Image Search [2.5835347022640254]
We present two models for image generation using model-agnostic learning.
RLDF is a singular approach for visual imitation through prior-preserving reward function guidance.
It generates high-quality images over varied domains showcasing class-consistency and strong visual diversity.
arXiv Detail & Related papers (2023-11-27T09:20:12Z)
- Aligning Text-to-Image Diffusion Models with Reward Backpropagation [62.45086888512723]
We propose AlignProp, a method that aligns diffusion models to downstream reward functions using end-to-end backpropagation of the reward gradient.
We show AlignProp achieves higher rewards in fewer training steps than alternatives, while being conceptually simpler.
arXiv Detail & Related papers (2023-10-05T17:59:18Z)
- ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation [30.977582244445742]
We build ImageReward, the first general-purpose text-to-image human preference reward model.
Its training is based on our systematic annotation pipeline including rating and ranking.
In human evaluation, ImageReward outperforms existing scoring models and metrics.
arXiv Detail & Related papers (2023-04-12T16:58:13Z)
- Aligning Text-to-Image Models using Human Feedback [104.76638092169604]
Current text-to-image models often generate images that are inadequately aligned with text prompts.
We propose a fine-tuning method for aligning such models using human feedback.
Our results demonstrate the potential for learning from human feedback to significantly improve text-to-image models.
arXiv Detail & Related papers (2023-02-23T17:34:53Z)
- NP-DRAW: A Non-Parametric Structured Latent Variable Model for Image Generation [139.8037697822064]
We present a non-parametric structured latent variable model for image generation, called NP-DRAW.
It sequentially draws on a latent canvas in a part-by-part fashion and then decodes the image from the canvas.
arXiv Detail & Related papers (2021-06-25T05:17:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.