A User-Friendly Framework for Generating Model-Preferred Prompts in
Text-to-Image Synthesis
- URL: http://arxiv.org/abs/2402.12760v1
- Date: Tue, 20 Feb 2024 06:58:49 GMT
- Title: A User-Friendly Framework for Generating Model-Preferred Prompts in
Text-to-Image Synthesis
- Authors: Nailei Hei, Qianyu Guo, Zihao Wang, Yan Wang, Haofen Wang, Wenqiang
Zhang
- Abstract summary: Well-designed prompts have demonstrated the potential to guide text-to-image models in generating amazing images.
It is challenging for novice users to achieve the desired results by manually entering prompts.
We propose a novel framework that automatically translates user-input prompts into model-preferred prompts.
- Score: 33.71897211776133
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Well-designed prompts have demonstrated the potential to guide text-to-image
models in generating amazing images. Although existing prompt engineering
methods can provide high-level guidance, it is challenging for novice users to
achieve the desired results by manually entering prompts due to a discrepancy
between novice-user-input prompts and the model-preferred prompts. To bridge
the distribution gap between user input behavior and model training datasets,
we first construct a novel Coarse-Fine Granularity Prompts dataset (CFP) and
propose a novel User-Friendly Fine-Grained Text Generation framework (UF-FGTG)
for automated prompt optimization. For CFP, we construct a novel dataset for
text-to-image tasks that combines coarse and fine-grained prompts to facilitate
the development of automated prompt generation methods. For UF-FGTG, we propose
a novel framework that automatically translates user-input prompts into
model-preferred prompts. Specifically, we propose a prompt refiner that
continually rewrites prompts to empower users to select results that align with
their unique needs. Meanwhile, we integrate image-related loss functions from
the text-to-image model into the training process of text generation to
generate model-preferred prompts. Additionally, we propose an adaptive feature
extraction module to ensure diversity in the generated results. Experiments
demonstrate that our approach is capable of generating more visually appealing
and diverse images than previous state-of-the-art methods, achieving an average
improvement of 5% across six quality and aesthetic metrics.
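The abstract names three technical pieces: a prompt refiner that iteratively rewrites prompts, image-related losses from the text-to-image model folded into the text-generation training objective, and an adaptive feature extraction module for diversity. Below is a minimal sketch of how such a combined objective might be wired up, assuming a PyTorch-style seq2seq refiner and a frozen text-to-image text encoder; every module name, interface, and loss weight here is an illustrative assumption, not the authors' released code.

```python
import torch

def uf_fgtg_style_step(refiner, t2i_text_encoder, t2i_token_embeddings,
                       aesthetic_head, batch, lm_weight=1.0, img_weight=0.5):
    """One hypothetical training step combining a text-generation loss with
    an image-related loss, in the spirit of the abstract above.

    refiner              -- seq2seq LM: coarse prompt ids -> fine prompt ids
    t2i_text_encoder     -- frozen text encoder of the text-to-image model,
                            called here with precomputed input embeddings
    t2i_token_embeddings -- (vocab_size, dim) embedding table of that encoder
                            (assumes the refiner shares its tokenizer)
    aesthetic_head       -- scores how "model-preferred" an encoded prompt is
    batch                -- {"coarse_ids": ..., "fine_ids": ...}
    """
    # 1) Standard language-modeling loss: learn to rewrite coarse user
    #    prompts into fine-grained, model-preferred prompts (CFP-style pairs).
    out = refiner(input_ids=batch["coarse_ids"], labels=batch["fine_ids"])
    lm_loss = out.loss

    # 2) Image-related loss: argmax decoding is not differentiable, so mix
    #    the text-to-image encoder's token embeddings with the refiner's
    #    output distribution to keep gradients flowing to the refiner.
    probs = out.logits.softmax(dim=-1)                  # (B, T, V)
    soft_embeds = probs @ t2i_token_embeddings          # (B, T, D)
    prompt_features = t2i_text_encoder(inputs_embeds=soft_embeds)
    img_loss = -aesthetic_head(prompt_features).mean()  # maximize the score

    return lm_weight * lm_loss + img_weight * img_loss
```

The soft-embedding mixing step stands in for whatever differentiable bridge the paper actually uses between generated prompt tokens and the text-to-image model's losses; a straight-through or Gumbel-softmax estimator would serve the same purpose.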
Related papers
- TIPO: Text to Image with Text Presampling for Prompt Optimization [16.001151202788304]
TIPO is a framework designed to enhance text-to-image (T2I) generation by using a language model (LM) for prompt presampling.
Unlike previous approaches that rely on Large Language Models (LLMs) or reinforcement learning (RL), TIPO adjusts user input prompts with the distribution of a trained prompt dataset.
arXiv Detail & Related papers (2024-11-12T19:09:45Z)
- Batch-Instructed Gradient for Prompt Evolution: Systematic Prompt Optimization for Enhanced Text-to-Image Synthesis [3.783530340696776]
This study proposes a Multi-Agent framework to optimize input prompts for text-to-image generation models.
A database of professional prompts serves as a benchmark to guide the instruction modifier toward generating high-caliber prompts.
Preliminary ablation studies highlight the effectiveness of various system components and suggest areas for future improvements.
arXiv Detail & Related papers (2024-06-13T00:33:29Z)
- Dynamic Prompt Optimizing for Text-to-Image Generation [63.775458908172176]
We introduce the Prompt Auto-Editing (PAE) method to improve text-to-image generative models.
We employ an online reinforcement learning strategy to explore the weights and injection time steps of each word, yielding dynamic fine-control prompts.
arXiv Detail & Related papers (2024-04-05T13:44:39Z)
- DiffChat: Learning to Chat with Text-to-Image Synthesis Models for Interactive Image Creation [40.478839423995296]
We present DiffChat, a novel method to align Large Language Models (LLMs) to "chat" with prompt-as-input Text-to-Image Synthesis (TIS) models for interactive image creation.
Given a raw prompt/image and a user-specified instruction, DiffChat can effectively make appropriate modifications and generate the target prompt.
arXiv Detail & Related papers (2024-03-08T02:24:27Z)
- Prompt Expansion for Adaptive Text-to-Image Generation [51.67811570987088]
This paper proposes a Prompt Expansion framework that helps users generate high-quality, diverse images with less effort.
The Prompt Expansion model takes a text query as input and outputs a set of expanded text prompts.
We conduct a human evaluation study that shows that images generated through Prompt Expansion are more aesthetically pleasing and diverse than those generated by baseline methods.
arXiv Detail & Related papers (2023-12-27T21:12:21Z)
- NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation [4.21512101973222]
NeuroPrompts is an adaptive framework that enhances a user's prompt to improve the quality of generations produced by text-to-image models.
Our framework utilizes constrained text decoding with a pre-trained language model that has been adapted to generate prompts similar to those produced by human prompt engineers.
arXiv Detail & Related papers (2023-11-20T22:57:47Z)
- BeautifulPrompt: Towards Automatic Prompt Engineering for Text-to-Image Synthesis [14.852061933308276]
We propose BeautifulPrompt, a deep generative model to produce high-quality prompts from very simple raw descriptions.
In our work, we first fine-tune the BeautifulPrompt model on collected pairs of low-quality and high-quality prompts.
We further showcase the integration of BeautifulPrompt into a cloud-native AI platform to provide a better text-to-image generation service.
arXiv Detail & Related papers (2023-11-12T06:39:00Z)
- SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models [56.88192537044364]
We propose a simple-yet-effective parameter-efficient fine-tuning approach called the Semantic Understanding and Reasoning adapter (SUR-adapter) for pre-trained diffusion models.
Our approach makes text-to-image diffusion models easier to use, with a better user experience.
arXiv Detail & Related papers (2023-05-09T05:48:38Z)
- Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery [55.905769757007185]
We describe an approach to robustly optimize hard text prompts through efficient gradient-based optimization.
Our approach automatically generates hard text-based prompts for both text-to-image and text-to-text applications.
In the text-to-text setting, we show that hard prompts can be automatically discovered that are effective in tuning LMs for classification. (A minimal sketch of this gradient-plus-projection idea appears after this list.)
arXiv Detail & Related papers (2023-02-07T18:40:18Z)
- Optimizing Prompts for Text-to-Image Generation [97.61295501273288]
Well-designed prompts can guide text-to-image models to generate amazing images.
But the performant prompts are often model-specific and misaligned with user input.
We propose prompt adaptation, a framework that automatically adapts original user input to model-preferred prompts.
arXiv Detail & Related papers (2022-12-19T16:50:41Z)
- Cycle-Consistent Inverse GAN for Text-to-Image Synthesis [101.97397967958722]
We propose a novel unified framework of Cycle-consistent Inverse GAN for both text-to-image generation and text-guided image manipulation tasks.
We learn a GAN inversion model to convert the images back to the GAN latent space and obtain the inverted latent codes for each image.
In the text-guided optimization module, we generate images with the desired semantic attributes by optimizing the inverted latent codes.
arXiv Detail & Related papers (2021-08-03T08:38:16Z)
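To make one of the techniques above concrete: "Hard Prompts Made Easy" optimizes a continuous embedding sequence while evaluating the loss on its nearest-neighbor projection into the vocabulary, so the final prompt is always a sequence of real tokens. The sketch below is a simplified reading of that idea, not the paper's reference implementation; `loss_fn` (for example, negative CLIP similarity to a target image) and all hyperparameters are placeholders.

```python
import torch

def optimize_hard_prompt(vocab_emb, loss_fn, prompt_len=8, steps=500, lr=0.1):
    """vocab_emb: (V, D) token-embedding table of a frozen text encoder.
    loss_fn:   maps a (prompt_len, D) embedding sequence to a scalar loss,
               e.g. negative CLIP similarity to a target image."""
    # Start from a random sequence of real vocabulary embeddings.
    idx = torch.randint(0, vocab_emb.size(0), (prompt_len,))
    soft = vocab_emb[idx].detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([soft], lr=lr)

    for _ in range(steps):
        # Project each continuous embedding onto its nearest vocabulary token.
        nearest = torch.cdist(soft, vocab_emb).argmin(dim=1)
        hard = vocab_emb[nearest]
        # Straight-through trick: score the *hard* (discrete) prompt, but
        # route the gradient back to the continuous embeddings.
        hard = soft + (hard - soft).detach()
        opt.zero_grad()
        loss_fn(hard).backward()
        opt.step()

    # Return the final discrete token ids of the optimized prompt.
    return torch.cdist(soft, vocab_emb).argmin(dim=1)
```

Calling `optimize_hard_prompt` with a CLIP text encoder's embedding table and a suitable loss would return token ids that decode back into a discrete prompt; the choice of starting tokens, learning rate, and step count would all need tuning in practice.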