PromptCharm: Text-to-Image Generation through Multi-modal Prompting and
Refinement
- URL: http://arxiv.org/abs/2403.04014v1
- Date: Wed, 6 Mar 2024 19:55:01 GMT
- Title: PromptCharm: Text-to-Image Generation through Multi-modal Prompting and
Refinement
- Authors: Zhijie Wang, Yuheng Huang, Da Song, Lei Ma, Tianyi Zhang
- Abstract summary: We propose PromptCharm, a system that facilitates text-to-image creation through multi-modal prompt engineering and refinement.
PromptCharm first automatically refines and optimizes the user's initial prompt.
It supports the user in exploring and selecting different image styles within a large database.
It renders model explanations by visualizing the model's attention values.
- Score: 12.55886762028225
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent advancements in Generative AI have significantly advanced the
field of text-to-image generation. The state-of-the-art text-to-image model,
Stable Diffusion, is now capable of synthesizing high-quality images with a
strong sense of aesthetics. Crafting text prompts that align with the model's
interpretation and the user's intent thus becomes crucial. However, prompting
remains challenging for novice users due to the complexity of the stable
diffusion model and the non-trivial efforts required for iteratively editing
and refining the text prompts. To address these challenges, we propose
PromptCharm, a mixed-initiative system that facilitates text-to-image creation
through multi-modal prompt engineering and refinement. To assist novice users
in prompting, PromptCharm first automatically refines and optimizes the user's
initial prompt. Furthermore, PromptCharm supports the user in exploring and
selecting different image styles within a large database. To assist users in
effectively refining their prompts and images, PromptCharm renders model
explanations by visualizing the model's attention values. If the user notices
any unsatisfactory areas in the generated images, they can further refine the
images through model attention adjustment or image inpainting within the rich
feedback loop of PromptCharm. To evaluate the effectiveness and usability of
PromptCharm, we conducted a controlled user study with 12 participants and an
exploratory user study with another 12 participants. These two studies show
that participants using PromptCharm were able to create images with higher
quality and better aligned with the user's expectations compared with using two
variants of PromptCharm that lacked interaction or visualization support.
Related papers
- What Do You Want? User-centric Prompt Generation for Text-to-image Synthesis via Multi-turn Guidance [23.411806572667707]
Text-to-image synthesis (TIS) models heavily rely on the quality and specificity of textual prompts.
Existing solutions relieve this via automatic model-preferred prompt generation from user queries.
We propose DialPrompt, a multi-turn dialogue-based TIS prompt generation model that emphasises user-centricity.
(2024-08-23T08:35:35Z)
- Prompt Refinement with Image Pivot for Text-to-Image Generation [103.63292948223592]
We introduce Prompt Refinement with Image Pivot (PRIP) for text-to-image generation.
PRIP decomposes the refinement process into two data-rich tasks: inferring representations of user-preferred images from user languages and translating image representations into system languages.
It substantially outperforms a wide range of baselines and effectively transfers to unseen systems in a zero-shot manner.
(2024-06-28T22:19:24Z)
- Empowering Visual Creativity: A Vision-Language Assistant to Image Editing Recommendations [109.65267337037842]
We introduce the task of Image Editing Recommendation (IER).
IER aims to automatically generate diverse creative editing instructions from an input image and a simple prompt representing the users' under-specified editing purpose.
We introduce Creativity-Vision Language Assistant (Creativity-VLA), a multimodal framework designed specifically for edit-instruction generation.
(2024-05-31T18:22:29Z)
- Dynamic Prompt Optimizing for Text-to-Image Generation [63.775458908172176]
We introduce the Prompt Auto-Editing (PAE) method to improve text-to-image generative models.
We employ an online reinforcement learning strategy to explore the weights and injection time steps of each word, leading to the dynamic fine-control prompts.
(2024-04-05T13:44:39Z)
- Prompt Expansion for Adaptive Text-to-Image Generation [51.67811570987088]
This paper proposes a Prompt Expansion framework that helps users generate high-quality, diverse images with less effort.
The Prompt Expansion model takes a text query as input and outputs a set of expanded text prompts.
We conduct a human evaluation study that shows that images generated through Prompt Expansion are more aesthetically pleasing and diverse than those generated by baseline methods.
(2023-12-27T21:12:21Z)
- NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation [4.21512101973222]
NeuroPrompts is an adaptive framework that enhances a user's prompt to improve the quality of generations produced by text-to-image models.
Our framework utilizes constrained text decoding with a pre-trained language model that has been adapted to generate prompts similar to those produced by human prompt engineers.
(2023-11-20T22:57:47Z)
- PromptMagician: Interactive Prompt Engineering for Text-to-Image Creation [16.41459454076984]
This research proposes PromptMagician, a visual analysis system that helps users explore the image results and refine the input prompts.
The backbone of our system is a prompt recommendation model that takes user prompts as input, retrieves similar prompt-image pairs from DiffusionDB, and identifies special (important and relevant) prompt keywords.
(2023-07-18T07:46:25Z)
- Promptify: Text-to-Image Generation through Interactive Prompt Exploration with Large Language Models [29.057923932305123]
We present Promptify, an interactive system that supports prompt exploration and refinement for text-to-image generative models.
Our user study shows that Promptify effectively facilitates the text-to-image workflow and outperforms an existing baseline tool widely used for text-to-image generation.
(2023-04-18T22:59:11Z)
- Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery [55.905769757007185]
We describe an approach to robustly optimize hard text prompts through efficient gradient-based optimization.
Our approach automatically generates hard text-based prompts for both text-to-image and text-to-text applications.
In the text-to-text setting, we show that hard prompts can be automatically discovered that are effective in tuning LMs for classification.
(2023-02-07T18:40:18Z)
- Optimizing Prompts for Text-to-Image Generation [97.61295501273288]
Well-designed prompts can guide text-to-image models to generate amazing images.
But the performant prompts are often model-specific and misaligned with user input.
We propose prompt adaptation, a framework that automatically adapts original user input to model-preferred prompts.
(2022-12-19T16:50:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.