DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2305.16381v3
- Date: Wed, 1 Nov 2023 04:48:26 GMT
- Title: DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models
- Authors: Ying Fan, Olivia Watkins, Yuqing Du, Hao Liu, Moonkyung Ryu, Craig
Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Kangwook Lee, Kimin Lee
- Abstract summary: We propose using online reinforcement learning to fine-tune text-to-image models.
We focus on diffusion models, defining the fine-tuning task as an RL problem.
Our approach, coined DPOK, integrates policy optimization with KL regularization.
- Score: 97.31200133440308
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning from human feedback has been shown to improve text-to-image models.
These techniques first learn a reward function that captures what humans care
about in the task and then improve the models based on the learned reward
function. Even though relatively simple approaches (e.g., rejection sampling
based on reward scores) have been investigated, fine-tuning text-to-image
models with the reward function remains challenging. In this work, we propose
using online reinforcement learning (RL) to fine-tune text-to-image models. We
focus on diffusion models, defining the fine-tuning task as an RL problem, and
updating the pre-trained text-to-image diffusion models using policy gradient
to maximize the feedback-trained reward. Our approach, coined DPOK, integrates
policy optimization with KL regularization. We conduct an analysis of KL
regularization for both RL fine-tuning and supervised fine-tuning. In our
experiments, we show that DPOK is generally superior to supervised fine-tuning
with respect to both image-text alignment and image quality. Our code is
available at
https://github.com/google-research/google-research/tree/master/dpok.
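As a rough, hedged sketch of the objective the abstract describes (not the authors' implementation; see the repository above for that), the snippet below combines a REINFORCE-style policy-gradient term, which upweights denoising transitions that lead to high reward, with a KL penalty toward a frozen copy of the pretrained model. The `Denoiser` module, the Gaussian step width `sigma`, the `kl_weight`, and the tensor shapes are illustrative assumptions chosen to keep the toy runnable.

```python
# Minimal sketch (not the DPOK codebase) of a policy-gradient update with KL
# regularization toward the pretrained model. All names, shapes, and
# hyperparameters here are illustrative assumptions.
import copy
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    """Toy stand-in for a text-conditioned diffusion denoiser."""
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.SiLU(), nn.Linear(64, dim))

    def forward(self, x_t):
        return self.net(x_t)

policy = Denoiser()                         # model being fine-tuned
pretrained = copy.deepcopy(policy).eval()   # frozen reference for the KL term
for p in pretrained.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-4)
kl_weight, sigma = 0.1, 1.0                 # illustrative hyperparameters

def dpok_style_step(x_t, x_prev, reward):
    """One gradient step on a batch of sampled denoising transitions.

    x_t, x_prev: states at consecutive diffusion timesteps (sampled from the policy)
    reward: per-sample scalar score from a learned (e.g. human-feedback) reward model
    """
    mu_policy = policy(x_t)
    mu_ref = pretrained(x_t)

    # Log-probability of the sampled transition under a Gaussian denoising step.
    log_prob = -((x_prev - mu_policy) ** 2).sum(dim=-1) / (2 * sigma ** 2)

    # REINFORCE-style term: increase log-prob of transitions with high reward.
    pg_loss = -(reward.detach() * log_prob).mean()

    # KL between two Gaussians with shared variance reduces to a squared mean
    # difference; it keeps the fine-tuned policy close to the pretrained model.
    kl_loss = ((mu_policy - mu_ref) ** 2).sum(dim=-1).mean() / (2 * sigma ** 2)

    loss = pg_loss + kl_weight * kl_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: random tensors stand in for sampled diffusion states and rewards.
x_t, x_prev = torch.randn(8, 16), torch.randn(8, 16)
reward = torch.randn(8)
print(dpok_style_step(x_t, x_prev, reward))
```

Under the Gaussian-step assumption used in this sketch, the per-step KL between the fine-tuned and pretrained denoising distributions reduces to a squared difference of their predicted means, which is why the regularizer takes that simple form here.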
Related papers
- RL for Consistency Models: Faster Reward Guided Text-to-Image Generation [15.238373471473645]
We propose a framework for fine-tuning consistency models via reinforcement learning (RL).
Our framework, called Reinforcement Learning for Consistency Model (RLCM), frames the iterative inference process of a consistency model as an RL procedure.
Compared to RL-fine-tuned diffusion models, RLCM trains significantly faster, improves generation quality as measured by the reward objectives, and speeds up inference by generating high-quality images in as few as two inference steps.
arXiv Detail & Related papers (2024-03-25T15:40:22Z)
- Direct Consistency Optimization for Compositional Text-to-Image Personalization [73.94505688626651]
Text-to-image (T2I) diffusion models, when fine-tuned on a few personal images, are able to generate visuals with a high degree of consistency.
We propose to fine-tune the T2I model by maximizing consistency to reference images, while penalizing the deviation from the pretrained model.
arXiv Detail & Related papers (2024-02-19T09:52:41Z)
- Aligning Modalities in Vision Large Language Models via Preference Fine-tuning [67.62925151837675]
In this work, we frame the hallucination problem as an alignment issue and tackle it with preference tuning.
Specifically, we propose POVID to generate feedback data with AI models.
We use ground-truth instructions as the preferred response and a two-stage approach to generate dispreferred data.
In experiments across a broad set of benchmarks, we show that our approach not only reduces hallucinations but also improves model performance on standard benchmarks, outperforming prior approaches.
arXiv Detail & Related papers (2024-02-18T00:56:16Z)
- Reinforcement Learning from Diffusion Feedback: Q* for Image Search [2.5835347022640254]
We present two models for image generation using model-agnostic learning.
RLDF is a singular approach for visual imitation through prior-preserving reward function guidance.
It generates high-quality images over varied domains showcasing class-consistency and strong visual diversity.
arXiv Detail & Related papers (2023-11-27T09:20:12Z)
- Aligning Text-to-Image Diffusion Models with Reward Backpropagation [62.45086888512723]
We propose AlignProp, a method that aligns diffusion models to downstream reward functions using end-to-end backpropagation of the reward gradient.
We show AlignProp achieves higher rewards in fewer training steps than alternatives, while being conceptually simpler.
arXiv Detail & Related papers (2023-10-05T17:59:18Z)
- ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation [30.977582244445742]
We build ImageReward, the first general-purpose text-to-image human preference reward model.
Its training is based on our systematic annotation pipeline including rating and ranking.
In human evaluation, ImageReward outperforms existing scoring models and metrics.
arXiv Detail & Related papers (2023-04-12T16:58:13Z)
- Aligning Text-to-Image Models using Human Feedback [104.76638092169604]
Current text-to-image models often generate images that are inadequately aligned with text prompts.
We propose a fine-tuning method for aligning such models using human feedback.
Our results demonstrate the potential for learning from human feedback to significantly improve text-to-image models.
arXiv Detail & Related papers (2023-02-23T17:34:53Z)
- NP-DRAW: A Non-Parametric Structured Latent Variable Model for Image Generation [139.8037697822064]
We present a non-parametric structured latent variable model for image generation, called NP-DRAW.
It sequentially draws on a latent canvas in a part-by-part fashion and then decodes the image from the canvas.
arXiv Detail & Related papers (2021-06-25T05:17:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.