PromptMagician: Interactive Prompt Engineering for Text-to-Image
Creation
- URL: http://arxiv.org/abs/2307.09036v2
- Date: Tue, 15 Aug 2023 09:44:57 GMT
- Title: PromptMagician: Interactive Prompt Engineering for Text-to-Image
Creation
- Authors: Yingchaojie Feng, Xingbo Wang, Kam Kwai Wong, Sijia Wang, Yuhong Lu,
Minfeng Zhu, Baicheng Wang, Wei Chen
- Abstract summary: This research proposes PromptMagician, a visual analysis system that helps users explore the image results and refine the input prompts.
The backbone of our system is a prompt recommendation model that takes user prompts as input, retrieves similar prompt-image pairs from DiffusionDB, and identifies special (important and relevant) prompt keywords.
- Score: 16.41459454076984
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative text-to-image models have gained great popularity among the public
for their powerful capability to generate high-quality images based on natural
language prompts. However, developing effective prompts for desired images can
be challenging due to the complexity and ambiguity of natural language. This
research proposes PromptMagician, a visual analysis system that helps users
explore the image results and refine the input prompts. The backbone of our
system is a prompt recommendation model that takes user prompts as input,
retrieves similar prompt-image pairs from DiffusionDB, and identifies special
(important and relevant) prompt keywords. To facilitate interactive prompt
refinement, PromptMagician introduces a multi-level visualization for the
cross-modal embedding of the retrieved images and recommended keywords, and
supports users in specifying multiple criteria for personalized exploration.
Two usage scenarios, a user study, and expert interviews demonstrate the
effectiveness and usability of our system, suggesting it facilitates prompt
engineering and improves the creativity support of the generative text-to-image
model.
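The retrieve-then-recommend pipeline described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the tiny `CORPUS` stands in for DiffusionDB, the bag-of-words `embed` function stands in for a learned cross-modal encoder, and the keyword score (frequency among retrieved prompts minus frequency in the whole corpus) is an assumed heuristic chosen only for illustration. All function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical stand-in for DiffusionDB: a handful of made-up prompts.
# The real dataset pairs each prompt with generated images.
CORPUS = [
    "a castle on a hill, oil painting, dramatic lighting",
    "a castle in the clouds, digital art, highly detailed",
    "portrait of a cat, studio photo, soft lighting",
    "a medieval castle at sunset, oil painting, highly detailed",
    "a bowl of fruit, watercolor",
]


def tokenize(text):
    """Lowercase, drop commas, and keep words longer than two characters."""
    return [w for w in text.lower().replace(",", " ").split() if len(w) > 2]


def embed(text):
    """Bag-of-words counts; an assumption standing in for a learned encoder."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=3):
    """Return the k corpus prompts most similar to the user's query."""
    q = embed(query)
    return sorted(corpus, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]


def recommend_keywords(query, retrieved, corpus, top_n=3):
    """Rank words that are over-represented in the retrieved prompts."""
    query_words = set(tokenize(query))
    local = Counter(w for p in retrieved for w in set(tokenize(p)))
    overall = Counter(w for p in corpus for w in set(tokenize(p)))
    scores = {
        w: local[w] / len(retrieved) - overall[w] / len(corpus)
        for w in local
        if w not in query_words  # do not recommend what the user already typed
    }
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


if __name__ == "__main__":
    hits = retrieve("a castle", CORPUS)
    print(hits)
    print(recommend_keywords("a castle", hits, CORPUS))
```

With the query "a castle", the sketch retrieves the three castle prompts and suggests style words that recur among them, which mirrors, in a much simplified form, the idea of surfacing important and relevant keywords for prompt refinement.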
Related papers
- Prompt Refinement with Image Pivot for Text-to-Image Generation [103.63292948223592]
We introduce Prompt Refinement with Image Pivot (PRIP) for text-to-image generation.
PRIP decomposes the refinement process into two data-rich tasks: inferring representations of user-preferred images from user languages and translating image representations into system languages.
It substantially outperforms a wide range of baselines and effectively transfers to unseen systems in a zero-shot manner.
(2024-06-28T22:19:24Z)
- Unified Text-to-Image Generation and Retrieval [96.72318842152148]
We propose a unified framework in the context of Multimodal Large Language Models (MLLMs).
We first explore the intrinsic discriminative abilities of MLLMs and introduce a generative retrieval method to perform retrieval in a training-free manner.
We then unify generation and retrieval in an autoregressive manner and propose an autonomous decision module to choose the better match between the generated and retrieved images.
(2024-06-09T15:00:28Z)
- PromptCharm: Text-to-Image Generation through Multi-modal Prompting and Refinement [12.55886762028225]
We propose PromptCharm, a system that facilitates text-to-image creation through multi-modal prompt engineering and refinement.
PromptCharm first automatically refines and optimizes the user's initial prompt.
It supports the user in exploring and selecting different image styles within a large database.
It renders model explanations by visualizing the model's attention values.
(2024-03-06T19:55:01Z)
- Prompt Expansion for Adaptive Text-to-Image Generation [51.67811570987088]
This paper proposes a Prompt Expansion framework that helps users generate high-quality, diverse images with less effort.
The Prompt Expansion model takes a text query as input and outputs a set of expanded text prompts.
We conduct a human evaluation study that shows that images generated through Prompt Expansion are more aesthetically pleasing and diverse than those generated by baseline methods.
(2023-12-27T21:12:21Z)
- The Contemporary Art of Image Search: Iterative User Intent Expansion via Vision-Language Model [4.531548217880843]
We introduce an innovative user intent expansion framework for image search.
Our framework leverages visual-language models to parse and compose multi-modal user inputs.
The proposed framework significantly improves users' image search experience.
(2023-12-04T06:14:25Z)
- NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation [4.21512101973222]
NeuroPrompts is an adaptive framework that enhances a user's prompt to improve the quality of generations produced by text-to-image models.
Our framework utilizes constrained text decoding with a pre-trained language model that has been adapted to generate prompts similar to those produced by human prompt engineers.
(2023-11-20T22:57:47Z)
- MultiPrompter: Cooperative Prompt Optimization with Multi-Agent Reinforcement Learning [68.40755873520808]
MultiPrompter is a new framework that views prompt optimization as a cooperative game between prompters.
We show that MultiPrompter effectively reduces the problem size and helps prompters learn optimal prompts.
(2023-10-25T15:58:51Z)
- Sentence-level Prompts Benefit Composed Image Retrieval [69.78119883060006]
Composed image retrieval (CIR) is the task of retrieving specific images by using a query that involves both a reference image and a relative caption.
We propose to leverage pretrained V-L models, e.g., BLIP-2, to generate sentence-level prompts.
Our proposed method performs favorably against the state-of-the-art CIR methods on the Fashion-IQ and CIRR datasets.
(2023-10-09T07:31:44Z)
- Promptify: Text-to-Image Generation through Interactive Prompt Exploration with Large Language Models [29.057923932305123]
We present Promptify, an interactive system that supports prompt exploration and refinement for text-to-image generative models.
Our user study shows that Promptify effectively facilitates the text-to-image workflow and outperforms an existing baseline tool widely used for text-to-image generation.
(2023-04-18T22:59:11Z)
- Optimizing Prompts for Text-to-Image Generation [97.61295501273288]
Well-designed prompts can guide text-to-image models to generate amazing images.
However, high-performing prompts are often model-specific and misaligned with user input.
We propose prompt adaptation, a framework that automatically adapts original user input to model-preferred prompts.
(2022-12-19T16:50:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.