ImageReward: Learning and Evaluating Human Preferences for Text-to-Image
Generation
- URL: http://arxiv.org/abs/2304.05977v4
- Date: Thu, 28 Dec 2023 14:13:35 GMT
- Title: ImageReward: Learning and Evaluating Human Preferences for Text-to-Image
Generation
- Authors: Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding,
Jie Tang, Yuxiao Dong
- Abstract summary: We build ImageReward, the first general-purpose text-to-image human preference reward model.
Its training is based on our systematic annotation pipeline including rating and ranking.
In human evaluation, ImageReward outperforms existing scoring models and metrics.
- Score: 30.977582244445742
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a comprehensive solution to learn and improve text-to-image models
from human preference feedback. To begin with, we build ImageReward -- the
first general-purpose text-to-image human preference reward model -- to
effectively encode human preferences. Its training is based on our systematic
annotation pipeline including rating and ranking, which collects 137k expert
comparisons to date. In human evaluation, ImageReward outperforms existing
scoring models and metrics, making it a promising automatic metric for
evaluating text-to-image synthesis. On top of it, we propose Reward Feedback
Learning (ReFL), a direct tuning algorithm to optimize diffusion models against
a scorer. Both automatic and human evaluation support ReFL's advantages over
compared methods. All code and datasets are provided at
https://github.com/THUDM/ImageReward.
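As a rough illustration of how a preference reward model of this kind can be fit to ranked comparisons, the sketch below trains a scalar scoring head with a pairwise (Bradley-Terry style) ranking loss. The frozen encoder, feature dimension, and data layout are illustrative placeholders rather than the paper's actual backbone; only the ranking objective itself is shown.

```python
# Minimal sketch of pairwise preference training for a text-to-image
# reward model. The frozen multimodal encoder, feature dimension, and
# data layout are illustrative placeholders, not the paper's exact setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardHead(nn.Module):
    """Maps a joint text-image feature vector to a scalar preference score."""
    def __init__(self, feat_dim: int = 768):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, 1)
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.mlp(feats).squeeze(-1)  # shape: (batch,)

def ranking_loss(score_preferred: torch.Tensor,
                 score_rejected: torch.Tensor) -> torch.Tensor:
    # Pairwise logistic (Bradley-Terry) loss: the human-preferred image in
    # each annotated comparison should receive the higher score.
    return -F.logsigmoid(score_preferred - score_rejected).mean()

if __name__ == "__main__":
    head = RewardHead()
    opt = torch.optim.AdamW(head.parameters(), lr=1e-4)
    # Stand-ins for joint text-image features of a (preferred, rejected)
    # image pair generated from the same prompt, taken from a frozen encoder.
    feats_pref, feats_rej = torch.randn(8, 768), torch.randn(8, 768)
    loss = ranking_loss(head(feats_pref), head(feats_rej))
    loss.backward()
    opt.step()
    print(f"pairwise ranking loss: {loss.item():.4f}")
```

ReFL then uses such a trained scorer the other way around: images predicted at late denoising steps are scored, and the score is backpropagated into the diffusion model as a direct training signal.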
Related papers
- Confidence-aware Reward Optimization for Fine-tuning Text-to-Image Models [85.96013373385057]
Fine-tuning text-to-image models with reward functions trained on human feedback data has proven effective for aligning model behavior with human intent.
However, excessive optimization with such reward models, which serve as mere proxy objectives, can compromise the performance of fine-tuned models.
We propose TextNorm, a method that enhances alignment based on a measure of reward model confidence estimated across a set of semantically contrastive text prompts.
arXiv Detail & Related papers (2024-04-02T11:40:38Z)
- UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset.
Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z)
- Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis [38.70605308204128]
Recent text-to-image generative models can generate high-fidelity images from text inputs.
HPD v2 captures human preferences on images from a wide range of sources.
HPD v2 comprises 798,090 human preference choices on 433,760 pairs of images.
arXiv Detail & Related papers (2023-06-15T17:59:31Z)
- DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models [97.31200133440308]
We propose using online reinforcement learning to fine-tune text-to-image models.
We focus on diffusion models, defining the fine-tuning task as an RL problem.
Our approach, coined DPOK, integrates policy optimization with KL regularization (a schematic of this reward-plus-KL objective is sketched after this list).
arXiv Detail & Related papers (2023-05-25T17:35:38Z)
- VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining [53.470662123170555]
We propose learning image aesthetics from user comments, and exploring vision-language pretraining methods to learn multimodal aesthetic representations.
Specifically, we pretrain an image-text encoder-decoder model with image-comment pairs, using contrastive and generative objectives to learn rich and generic aesthetic semantics without human labels.
Our results show that our pretrained aesthetic vision-language model outperforms prior works on image aesthetic captioning over the AVA-Captions dataset.
arXiv Detail & Related papers (2023-03-24T23:57:28Z)
- HIVE: Harnessing Human Feedback for Instructional Visual Editing [127.29436858998064]
We present a novel framework to harness human feedback for instructional visual editing (HIVE).
Specifically, we collect human feedback on the edited images and learn a reward function to capture the underlying user preferences.
We then introduce scalable diffusion model fine-tuning methods that can incorporate human preferences based on the estimated reward.
arXiv Detail & Related papers (2023-03-16T19:47:41Z)
- Aligning Text-to-Image Models using Human Feedback [104.76638092169604]
Current text-to-image models often generate images that are inadequately aligned with text prompts.
We propose a fine-tuning method for aligning such models using human feedback.
Our results demonstrate the potential for learning from human feedback to significantly improve text-to-image models.
arXiv Detail & Related papers (2023-02-23T17:34:53Z)
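Several of the fine-tuning methods above combine a learned reward signal with a regularizer that keeps the model close to its pretrained behavior; DPOK, for example, pairs policy optimization with KL regularization. The sketch below illustrates that reward-plus-KL objective on a toy categorical policy; all names are placeholders, and the real methods apply the idea to diffusion denoising trajectories rather than a flat softmax.

```python
# Schematic of a reward-plus-KL fine-tuning objective of the kind DPOK
# describes (policy optimization with KL regularization). The flat
# categorical "policy" and all names here are toy placeholders; actual
# methods operate on diffusion denoising trajectories.
import torch
import torch.nn.functional as F

def rl_finetune_loss(logits_new: torch.Tensor,
                     logits_ref: torch.Tensor,
                     actions: torch.Tensor,
                     rewards: torch.Tensor,
                     kl_coef: float = 0.1) -> torch.Tensor:
    """Surrogate loss: push up the log-probability of high-reward samples
    while penalizing KL divergence from the frozen reference model."""
    logp_new = F.log_softmax(logits_new, dim=-1)
    logp_ref = F.log_softmax(logits_ref, dim=-1)
    # Log-probability of the sampled actions under the fine-tuned model.
    chosen_logp = logp_new.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    pg_term = -(rewards * chosen_logp).mean()  # REINFORCE-style reward term
    # KL(new || ref), written out explicitly so gradients flow into logits_new.
    kl_term = (logp_new.exp() * (logp_new - logp_ref)).sum(dim=-1).mean()
    return pg_term + kl_coef * kl_term

if __name__ == "__main__":
    batch, n_actions = 4, 16
    logits_new = torch.randn(batch, n_actions, requires_grad=True)
    logits_ref = torch.randn(batch, n_actions)       # frozen reference model
    actions = torch.randint(0, n_actions, (batch,))  # sampled generations
    rewards = torch.randn(batch)                     # e.g. reward-model scores
    loss = rl_finetune_loss(logits_new, logits_ref, actions, rewards)
    loss.backward()
    print(f"surrogate loss: {loss.item():.4f}")
```

The KL coefficient trades off reward maximization against staying close to the pretrained model, which is what guards against the reward over-optimization that TextNorm also targets.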