Leveraging Human Revisions for Improving Text-to-Layout Models
- URL: http://arxiv.org/abs/2405.13026v1
- Date: Thu, 16 May 2024 01:33:09 GMT
- Title: Leveraging Human Revisions for Improving Text-to-Layout Models
- Authors: Amber Xie, Chin-Yi Cheng, Forrest Huang, Yang Li
- Abstract summary: We propose using nuanced feedback in the form of human revisions for stronger alignment.
Our method, Revision-Aware Reward Models, allows a generative text-to-layout model to produce more modern, designer-aligned layouts.
- Score: 16.617352120973806
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning from human feedback has shown success in aligning large, pretrained models with human values. Prior works have mostly focused on learning from high-level labels, such as preferences between pairs of model outputs. On the other hand, many domains could benefit from more involved, detailed feedback, such as revisions, explanations, and reasoning of human users. Our work proposes using nuanced feedback in the form of human revisions for stronger alignment. In this paper, we ask expert designers to fix layouts generated from a generative layout model that is pretrained on a large-scale dataset of mobile screens. Then, we train a reward model based on how human designers revise these generated layouts. With the learned reward model, we optimize our model with reinforcement learning from human feedback (RLHF). Our method, Revision-Aware Reward Models, allows a generative text-to-layout model to produce more modern, designer-aligned layouts, showing the potential for utilizing human revisions and stronger forms of feedback in improving generative models.
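The abstract describes training a reward model from designer revisions and then fine-tuning the text-to-layout generator against that reward with RLHF. The sketch below is a minimal, hypothetical illustration of the reward-model stage only: it assumes the reward is trained to score a designer-revised layout above the model's original output (a Bradley-Terry-style pairwise objective), and all module and function names (RevisionRewardModel, revision_reward_loss, the stand-in encoders) are placeholders, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): learn a reward model from human
# revisions, assuming each example pairs a generated layout with its
# designer-revised version. All names here are hypothetical placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RevisionRewardModel(nn.Module):
    """Scores a (prompt, layout) pair; higher means more designer-aligned."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.text_enc = nn.Linear(dim, dim)    # stand-in for a text encoder
        self.layout_enc = nn.Linear(dim, dim)  # stand-in for a layout encoder
        self.head = nn.Linear(2 * dim, 1)

    def forward(self, text_feat, layout_feat):
        h = torch.cat([self.text_enc(text_feat), self.layout_enc(layout_feat)], dim=-1)
        return self.head(F.relu(h)).squeeze(-1)

def revision_reward_loss(model, text_feat, generated, revised):
    """Pairwise loss: the designer-revised layout should receive a higher
    reward than the model's original, unrevised output."""
    r_gen = model(text_feat, generated)
    r_rev = model(text_feat, revised)
    return -F.logsigmoid(r_rev - r_gen).mean()

# Usage with random features standing in for real prompt/layout encodings.
model = RevisionRewardModel()
text = torch.randn(8, 256)
gen_layout, rev_layout = torch.randn(8, 256), torch.randn(8, 256)
loss = revision_reward_loss(model, text, gen_layout, rev_layout)
loss.backward()
# The learned reward would then serve as the RLHF objective when fine-tuning
# the text-to-layout generator (e.g., with a policy-gradient method).
```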
Related papers
- Revision Matters: Generative Design Guided by Revision Edits [18.976709992275286]
We investigate how revision edits from human designers can benefit a multimodal generative model.
Our results show that human revision plays a critical role in iterative layout refinement.
Our work paves the way for iterative design revision based on pre-trained large multimodal models.
arXiv Detail & Related papers (2024-05-27T17:54:51Z) - Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation [87.50120181861362]
VisionPrefer is a high-quality and fine-grained preference dataset that captures multiple preference aspects.
We train a reward model, VP-Score, over VisionPrefer to guide the training of text-to-image generative models, and the preference prediction accuracy of VP-Score is comparable to that of human annotators.
arXiv Detail & Related papers (2024-04-23T14:53:15Z) - Enhancing Image Caption Generation Using Reinforcement Learning with Human Feedback [0.0]
We explore a potential method to amplify the performance of the Deep Neural Network Model to generate captions that are preferred by humans.
This was achieved by integrating Supervised Learning and Reinforcement Learning with Human Feedback.
We provide a sketch of our approach and results, hoping to contribute to the ongoing advances in the field of human-aligned generative AI models.
arXiv Detail & Related papers (2024-03-11T13:57:05Z) - Personalized Language Modeling from Personalized Human Feedback [49.344833339240566]
Reinforcement Learning from Human Feedback (RLHF) is commonly used to fine-tune large language models to better align with human preferences.
In this work, we aim to address this problem by developing methods for building personalized language models.
arXiv Detail & Related papers (2024-02-06T04:18:58Z) - UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset.
Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z) - HIVE: Harnessing Human Feedback for Instructional Visual Editing [127.29436858998064]
We present a novel framework to harness human feedback for instructional visual editing (HIVE).
Specifically, we collect human feedback on the edited images and learn a reward function to capture the underlying user preferences.
We then introduce scalable diffusion model fine-tuning methods that can incorporate human preferences based on the estimated reward.
arXiv Detail & Related papers (2023-03-16T19:47:41Z) - Aligning Text-to-Image Models using Human Feedback [104.76638092169604]
Current text-to-image models often generate images that are inadequately aligned with text prompts.
We propose a fine-tuning method for aligning such models using human feedback.
Our results demonstrate the potential for learning from human feedback to significantly improve text-to-image models.
arXiv Detail & Related papers (2023-02-23T17:34:53Z) - Chain of Hindsight Aligns Language Models with Feedback [62.68665658130472]
We propose a novel technique, Chain of Hindsight, that is easy to optimize and can learn from any form of feedback, regardless of its polarity.
We convert all types of feedback into sequences of sentences, which are then used to fine-tune the model.
By doing so, the model is trained to generate outputs based on feedback, while learning to identify and correct negative attributes or errors.
arXiv Detail & Related papers (2023-02-06T10:28:16Z) - Learning to summarize from human feedback [18.964548137315333]
We show that it is possible to significantly improve summary quality by training a model to optimize for human preferences.
We apply our method to a version of the TL;DR dataset of Reddit posts and find that our models significantly outperform both human reference summaries and much larger models fine-tuned with supervised learning alone.
Our models also transfer to CNN/DM news articles, producing summaries nearly as good as the human reference without any news-specific fine-tuning.
arXiv Detail & Related papers (2020-09-02T19:54:41Z)