BeautifulPrompt: Towards Automatic Prompt Engineering for Text-to-Image
Synthesis
- URL: http://arxiv.org/abs/2311.06752v1
- Date: Sun, 12 Nov 2023 06:39:00 GMT
- Title: BeautifulPrompt: Towards Automatic Prompt Engineering for Text-to-Image
Synthesis
- Authors: Tingfeng Cao, Chengyu Wang, Bingyan Liu, Ziheng Wu, Jinhui Zhu, Jun
Huang
- Abstract summary: We propose BeautifulPrompt, a deep generative model to produce high-quality prompts from very simple raw descriptions.
In our work, we first fine-tuned the BeautifulPrompt model over collected pairs of low-quality and high-quality prompts.
We further showcase the integration of BeautifulPrompt into a cloud-native AI platform to provide a better text-to-image generation service.
- Score: 14.852061933308276
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, diffusion-based deep generative models (e.g., Stable Diffusion)
have shown impressive results in text-to-image synthesis. However, current
text-to-image models often require multiple passes of prompt engineering by
humans in order to produce satisfactory results for real-world applications. We
propose BeautifulPrompt, a deep generative model to produce high-quality
prompts from very simple raw descriptions, which enables diffusion-based models
to generate more beautiful images. In our work, we first fine-tuned the
BeautifulPrompt model over collected pairs of low-quality and high-quality
prompts. Then, to ensure that the generated prompts lead to more beautiful
images, we further propose a Reinforcement Learning with Visual AI Feedback
technique to fine-tune our model to maximize the reward values of the generated
prompts, where the reward values are calculated based on the PickScore and the
Aesthetic Scores. Our results demonstrate that learning from visual AI
feedback can significantly improve the quality of the generated prompts and
images. We further showcase the integration of BeautifulPrompt into a
cloud-native AI platform to provide a better text-to-image generation service
in the cloud.
Related papers
- TIPO: Text to Image with Text Presampling for Prompt Optimization [16.001151202788304]
TIPO is a framework that enhances text-to-image (T2I) generation with a language model (LM).
Unlike previous approaches that rely on Large Language Models (LLMs) or reinforcement learning (RL), TIPO adjusts user input prompts to match the distribution of a trained prompt dataset.
arXiv Detail & Related papers (2024-11-12T19:09:45Z)
- Batch-Instructed Gradient for Prompt Evolution: Systematic Prompt Optimization for Enhanced Text-to-Image Synthesis [3.783530340696776]
This study proposes a Multi-Agent framework to optimize input prompts for text-to-image generation models.
A database of professional prompts serves as a benchmark to guide the instruction modifier towards generating high-caliber prompts.
Preliminary ablation studies highlight the effectiveness of various system components and suggest areas for future improvements.
arXiv Detail & Related papers (2024-06-13T00:33:29Z)
- Dynamic Prompt Optimizing for Text-to-Image Generation [63.775458908172176]
We introduce the Prompt Auto-Editing (PAE) method to improve text-to-image generative models.
We employ an online reinforcement learning strategy to explore the weight and injection time step of each word, yielding dynamic fine-control prompts.
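As an illustration of what such dynamically weighted prompts might look like, here is a minimal sketch; the data structure and the `(token:weight)` rendering are assumptions modeled on common Stable Diffusion prompt syntax, not PAE's exact format.

```python
# Hypothetical dynamic fine-control prompt: each token carries a weight and a
# diffusion-step interval during which it is injected (assumed representation).
dynamic_prompt = [
    {"token": "castle",    "weight": 1.3, "steps": (0, 50)},   # active throughout
    {"token": "fog",       "weight": 0.8, "steps": (0, 25)},   # early layout only
    {"token": "intricate", "weight": 1.1, "steps": (25, 50)},  # late detail only
]

def render(tokens, step: int) -> str:
    """Render the tokens active at a given denoising step as a weighted prompt."""
    return ", ".join(f"({t['token']}:{t['weight']})"
                     for t in tokens if t["steps"][0] <= step < t["steps"][1])

print(render(dynamic_prompt, step=10))  # (castle:1.3), (fog:0.8)
print(render(dynamic_prompt, step=30))  # (castle:1.3), (intricate:1.1)
```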
arXiv Detail & Related papers (2024-04-05T13:44:39Z)
- A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image Synthesis [33.71897211776133]
Well-designed prompts have demonstrated the potential to guide text-to-image models in generating amazing images.
It is challenging for novice users to achieve the desired results by manually entering prompts.
We propose a novel framework that automatically translates user-input prompts into model-preferred prompts.
arXiv Detail & Related papers (2024-02-20T06:58:49Z)
- Prompt Expansion for Adaptive Text-to-Image Generation [51.67811570987088]
This paper proposes a Prompt Expansion framework that helps users generate high-quality, diverse images with less effort.
The Prompt Expansion model takes a text query as input and outputs a set of expanded text prompts.
We conduct a human evaluation study that shows that images generated through Prompt Expansion are more aesthetically pleasing and diverse than those generated by baseline methods.
arXiv Detail & Related papers (2023-12-27T21:12:21Z)
- NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation [4.21512101973222]
NeuroPrompts is an adaptive framework that enhances a user's prompt to improve the quality of generations produced by text-to-image models.
Our framework utilizes constrained text decoding with a pre-trained language model that has been adapted to generate prompts similar to those produced by human prompt engineers.
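As an illustration of constrained text decoding, the following sketch uses Hugging Face Transformers' constrained beam search to force a user's key term to survive in the enhanced prompt; the model choice and constraint set are assumptions, not NeuroPrompts' actual configuration.

```python
# Sketch: constrained decoding that keeps the user's key term in the enhanced
# prompt (assumed setup; NeuroPrompts' actual model and constraints may differ).
from transformers import AutoTokenizer, AutoModelForCausalLM, PhrasalConstraint

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder prompt-adapted LM
model = AutoModelForCausalLM.from_pretrained("gpt2")

user_prompt = "a cat sitting on a windowsill"
# Force the user's subject to appear verbatim in the generated continuation.
constraints = [PhrasalConstraint(tokenizer("cat", add_special_tokens=False).input_ids)]

inputs = tokenizer(user_prompt + ",", return_tensors="pt")
outputs = model.generate(**inputs, constraints=constraints,
                         num_beams=4, max_new_tokens=40, no_repeat_ngram_size=2)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```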
arXiv Detail & Related papers (2023-11-20T22:57:47Z)
- Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack [75.00066365801993]
Training text-to-image models on web-scale image-text pairs enables the generation of a wide range of visual concepts from text.
However, these pre-trained models often struggle to generate highly aesthetic images.
We propose quality-tuning to guide a pre-trained model to exclusively generate highly visually appealing images.
arXiv Detail & Related papers (2023-09-27T17:30:19Z)
- ITI-GEN: Inclusive Text-to-Image Generation [56.72212367905351]
This study investigates inclusive text-to-image generative models that generate images based on human-written prompts.
We show that, for some attributes, images can represent concepts more expressively than text.
We propose a novel approach, ITI-GEN, that leverages readily available reference images for Inclusive Text-to-Image GENeration.
arXiv Detail & Related papers (2023-09-11T15:54:30Z)
- Image Captions are Natural Prompts for Text-to-Image Models [70.30915140413383]
We analyze the relationship between the training effect of synthetic data and the synthetic data distribution induced by prompts.
We propose a simple yet effective method that prompts text-to-image generative models to synthesize more informative and diverse training data.
Our method significantly improves the performance of models trained on synthetic training data.
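To ground this, here is a minimal sketch of synthesizing training images from captions with the diffusers library; the checkpoint and the caption list are illustrative placeholders, not the paper's exact pipeline.

```python
# Sketch: using image captions as prompts to synthesize training data
# (assumed setup; checkpoint and captions are illustrative placeholders).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

captions = [
    "a golden retriever catching a frisbee in a park",
    "a red vintage car parked by the seaside at sunset",
]

for i, caption in enumerate(captions):
    # Each caption acts as an informative, diverse prompt for one synthetic sample.
    image = pipe(caption, num_inference_steps=30).images[0]
    image.save(f"synthetic_{i:05d}.png")
```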
arXiv Detail & Related papers (2023-07-17T14:38:11Z)
- SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models [56.88192537044364]
We propose a simple yet effective parameter-efficient fine-tuning approach called the Semantic Understanding and Reasoning adapter (SUR-adapter) for pre-trained diffusion models.
Our approach makes text-to-image diffusion models easier to use and improves the user experience.
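As a generic illustration of the parameter-efficient adapter pattern (a bottleneck module with a residual connection), here is a minimal PyTorch sketch; the dimensions and placement are assumptions, and SUR-adapter's actual architecture differs in detail.

```python
# Sketch of a generic bottleneck adapter for parameter-efficient fine-tuning
# (illustrative pattern only; SUR-adapter's actual design differs in detail).
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # project down
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)    # project back up
        nn.init.zeros_(self.up.weight)          # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))  # residual connection

# Only the adapter's few parameters are trained; the frozen backbone's text
# features pass through it before conditioning the diffusion model.
features = torch.randn(2, 77, 768)  # e.g., text-encoder output
adapted = Adapter()(features)
```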
arXiv Detail & Related papers (2023-05-09T05:48:38Z)