Human Preference Score: Better Aligning Text-to-Image Models with Human Preference
- URL: http://arxiv.org/abs/2303.14420v2
- Date: Tue, 22 Aug 2023 12:26:07 GMT
- Title: Human Preference Score: Better Aligning Text-to-Image Models with Human Preference
- Authors: Xiaoshi Wu, Keqiang Sun, Feng Zhu, Rui Zhao, Hongsheng Li
- Abstract summary: We collect a dataset of human choices on generated images from the Stable Foundation Discord channel.
Our experiments demonstrate that current evaluation metrics for generative models do not correlate well with human choices.
We propose a simple yet effective method to adapt Stable Diffusion to better align with human preferences.
- Score: 41.270068272447055
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent years have witnessed a rapid growth of deep generative models, with
text-to-image models gaining significant attention from the public. However,
existing models often generate images that do not align well with human
preferences, such as awkward combinations of limbs and facial expressions. To
address this issue, we collect a dataset of human choices on generated images
from the Stable Foundation Discord channel. Our experiments demonstrate that
current evaluation metrics for generative models do not correlate well with
human choices. Thus, we train a human preference classifier with the collected
dataset and derive a Human Preference Score (HPS) based on the classifier.
Using HPS, we propose a simple yet effective method to adapt Stable Diffusion
to better align with human preferences. Our experiments show that HPS
outperforms CLIP in predicting human choices and has good generalization
capability toward images generated from other models. By tuning Stable
Diffusion with the guidance of HPS, the adapted model is able to generate
images that are more preferred by human users. The project page is available
here: https://tgxs002.github.io/align_sd_web/ .
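The abstract describes training a preference classifier on the collected human choices and deriving a score (HPS) from it. As a rough illustration only, here is a minimal sketch of how such a CLIP-based preference score could be trained and applied, assuming the open_clip library; the backbone name, loss form, and function names below are illustrative assumptions, not the authors' released implementation (the actual code and trained score are available via the project page above).

```python
# Minimal sketch, NOT the official HPS implementation: prompt-image similarity
# under a CLIP-style model fine-tuned on the collected human choices.
# Backbone name, loss form, and helper names are illustrative assumptions.
import torch
import torch.nn.functional as F
import open_clip  # assumed backbone library; HPS builds on a CLIP-style model

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-L-14", pretrained="openai")  # placeholder weights, not the HPS checkpoint
tokenizer = open_clip.get_tokenizer("ViT-L-14")

def preference_score(images, prompt):
    """Score candidate images for one prompt; higher means more preferred."""
    with torch.no_grad():
        img = torch.stack([preprocess(im) for im in images])   # (n, 3, H, W)
        img_f = F.normalize(model.encode_image(img), dim=-1)
        txt_f = F.normalize(model.encode_text(tokenizer([prompt])), dim=-1)
    return (img_f @ txt_f.T).squeeze(-1)                        # (n,) cosine scores

def preference_loss(images, prompt, chosen_idx):
    """One plausible training objective on the collected choices: cross-entropy
    over per-image scores so the human-chosen image wins for its prompt."""
    img = torch.stack([preprocess(im) for im in images])
    img_f = F.normalize(model.encode_image(img), dim=-1)
    txt_f = F.normalize(model.encode_text(tokenizer([prompt])), dim=-1)
    logits = (img_f @ txt_f.T).squeeze(-1) * model.logit_scale.exp()
    return F.cross_entropy(logits.unsqueeze(0), torch.tensor([chosen_idx]))
```

In the paper, the resulting score is then used to guide the adaptation of Stable Diffusion toward images that humans prefer.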
Related papers
- Personalized Preference Fine-tuning of Diffusion Models [75.22218338096316]
We introduce PPD, a multi-reward optimization objective that aligns diffusion models with personalized preferences.
With PPD, a diffusion model learns the individual preferences of a population of users in a few-shot way.
Our approach achieves an average win rate of 76% over Stable Cascade, generating images that more accurately reflect specific user preferences.
arXiv Detail & Related papers (2025-01-11T22:38:41Z)
- Diff-Instruct*: Towards Human-Preferred One-step Text-to-image Generative Models [8.352666876052616]
We introduce Diff-Instruct* (DI*), an image data-free approach for building one-step text-to-image generative models.
We frame human preference alignment as online reinforcement learning using human feedback.
Unlike traditional RLHF approaches, which rely on the KL divergence for regularization, we introduce a novel score-based divergence regularization.
arXiv Detail & Related papers (2024-10-28T10:26:19Z)
- Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback [87.37721254914476]
We introduce a routing framework that combines inputs from humans and LMs to achieve better annotation quality.
We train a performance prediction model to predict a reward model's performance on an arbitrary combination of human and LM annotations.
We show that the selected hybrid mixture achieves better reward model performance compared to using either one exclusively.
arXiv Detail & Related papers (2024-10-24T20:04:15Z)
- Learning Multi-dimensional Human Preference for Text-to-Image Generation [18.10755131392223]
We propose the Multi-dimensional Preference Score (MPS), the first multi-dimensional preference scoring model for the evaluation of text-to-image models.
The MPS introduces a preference condition module on top of the CLIP model to learn these diverse preferences.
It is trained based on our Multi-dimensional Human Preference (MHP) dataset, which comprises 918,315 human preference choices across four dimensions.
arXiv Detail & Related papers (2024-05-23T15:39:43Z)
- Enhancing Image Caption Generation Using Reinforcement Learning with Human Feedback [0.0]
We explore a potential method to improve the performance of a deep neural network model in generating captions that are preferred by humans.
This was achieved by integrating Supervised Learning and Reinforcement Learning with Human Feedback.
We provide a sketch of our approach and results, hoping to contribute to the ongoing advances in the field of human-aligned generative AI models.
arXiv Detail & Related papers (2024-03-11T13:57:05Z)
- Diffusion Model Alignment Using Direct Preference Optimization [103.2238655827797]
Diffusion-DPO is a method to align diffusion models to human preferences by directly optimizing on human comparison data (a generic form of this pairwise objective is sketched after this list).
We fine-tune the base model of the state-of-the-art Stable Diffusion XL (SDXL)-1.0 model with Diffusion-DPO.
We also develop a variant that uses AI feedback and has comparable performance to training on human preferences.
arXiv Detail & Related papers (2023-11-21T15:24:05Z)
- Exploring the Robustness of Human Parsers Towards Common Corruptions [99.89886010550836]
We construct three corruption robustness benchmarks, termed LIP-C, ATR-C, and Pascal-Person-Part-C, to assist us in evaluating the risk tolerance of human parsing models.
Inspired by the data augmentation strategy, we propose a novel heterogeneous augmentation-enhanced mechanism to bolster robustness under commonly corrupted conditions.
arXiv Detail & Related papers (2023-09-02T13:32:14Z)
- Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis [38.70605308204128]
Recent text-to-image generative models can generate high-fidelity images from text inputs.
HPD v2 captures human preferences on images from a wide range of sources.
HPD v2 comprises 798,090 human preference choices on 433,760 pairs of images.
arXiv Detail & Related papers (2023-06-15T17:59:31Z)
- Traditional Classification Neural Networks are Good Generators: They are Competitive with DDPMs and GANs [104.72108627191041]
We show that conventional neural network classifiers can generate high-quality images comparable to state-of-the-art generative models.
We propose a mask-based reconstruction module that makes the gradients semantics-aware so that plausible images can be synthesized.
We show that our method is also applicable to text-to-image generation by regarding image-text foundation models as classifiers.
arXiv Detail & Related papers (2022-11-27T11:25:35Z)
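The Diffusion-DPO entry above describes aligning a diffusion model by directly optimizing on human comparison data. The following is a hedged sketch of a generic pairwise objective of that kind, assuming an epsilon-prediction model called as model(latents, t, cond) and a frozen reference copy; the call signature, the beta value, and the per-timestep weighting are simplifying assumptions rather than the paper's exact formulation.

```python
# Hedged sketch of a Diffusion-DPO-style pairwise loss (not the authors' code):
# the trainable model should out-denoise the frozen reference more on the
# human-preferred image than on the rejected one.
import torch
import torch.nn.functional as F

def diffusion_dpo_style_loss(model, ref_model, noisy_w, noisy_l, noise, t, cond, beta=2000.0):
    """noisy_w / noisy_l: the preferred / rejected images, noised with the same
    `noise` at timestep `t`; `cond` is the shared text conditioning."""
    # Per-sample denoising error of the trainable model on both images.
    err_w = F.mse_loss(model(noisy_w, t, cond), noise, reduction="none").mean(dim=(1, 2, 3))
    err_l = F.mse_loss(model(noisy_l, t, cond), noise, reduction="none").mean(dim=(1, 2, 3))
    # Same errors under the frozen reference model (no gradients).
    with torch.no_grad():
        ref_w = F.mse_loss(ref_model(noisy_w, t, cond), noise, reduction="none").mean(dim=(1, 2, 3))
        ref_l = F.mse_loss(ref_model(noisy_l, t, cond), noise, reduction="none").mean(dim=(1, 2, 3))
    # Improvement over the reference should be larger on the preferred image.
    margin = (err_w - ref_w) - (err_l - ref_l)
    return -F.logsigmoid(-beta * margin).mean()
```

Minimizing this loss pushes the model's error on the preferred image below the reference while tolerating a larger error on the rejected one, which is the diffusion analogue of the standard DPO log-sigmoid objective over likelihood ratios.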
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.