Related papers: PromptCharm: Text-to-Image Generation through Multi-modal Prompting and Refinement

PromptCharm: Text-to-Image Generation through Multi-modal Prompting and Refinement

URL: http://arxiv.org/abs/2403.04014v1
Date: Wed, 6 Mar 2024 19:55:01 GMT
Title: PromptCharm: Text-to-Image Generation through Multi-modal Prompting and Refinement
Authors: Zhijie Wang, Yuheng Huang, Da Song, Lei Ma, Tianyi Zhang
Abstract summary: We propose PromptCharm, a system that facilitates text-to-image creation through multi-modal prompt engineering and refinement. PromptCharm first automatically refines and optimize the user's initial prompt. It supports the user in exploring and selecting different image styles within a large database. It renders model explanations by visualizing the model's attention values.
Score: 12.55886762028225
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The recent advancements in Generative AI have significantly advanced the field of text-to-image generation. The state-of-the-art text-to-image model, Stable Diffusion, is now capable of synthesizing high-quality images with a strong sense of aesthetics. Crafting text prompts that align with the model's interpretation and the user's intent thus becomes crucial. However, prompting remains challenging for novice users due to the complexity of the stable diffusion model and the non-trivial efforts required for iteratively editing and refining the text prompts. To address these challenges, we propose PromptCharm, a mixed-initiative system that facilitates text-to-image creation through multi-modal prompt engineering and refinement. To assist novice users in prompting, PromptCharm first automatically refines and optimizes the user's initial prompt. Furthermore, PromptCharm supports the user in exploring and selecting different image styles within a large database. To assist users in effectively refining their prompts and images, PromptCharm renders model explanations by visualizing the model's attention values. If the user notices any unsatisfactory areas in the generated images, they can further refine the images through model attention adjustment or image inpainting within the rich feedback loop of PromptCharm. To evaluate the effectiveness and usability of PromptCharm, we conducted a controlled user study with 12 participants and an exploratory user study with another 12 participants. These two studies show that participants using PromptCharm were able to create images with higher quality and better aligned with the user's expectations compared with using two variants of PromptCharm that lacked interaction or visualization support.

Related papers

VisualPrompter: Prompt Optimization with Visual Feedback for Text-to-Image Synthesis [15.392482488365955]
VisualPrompter is a training-free prompt engineering framework that refines user inputs to model-preferred sentences.<n>Our framework achieves new state-of-the-art performance on multiple benchmarks for text-image alignment evaluation.
arXiv Detail & Related papers (2025-06-29T08:24:39Z)
RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning [88.14234949860105]
RePrompt is a novel reprompting framework that introduces explicit reasoning into the prompt enhancement process via reinforcement learning.<n>Our approach enables end-to-end training without human-annotated data.
arXiv Detail & Related papers (2025-05-23T06:44:26Z)
Enhancing Intent Understanding for Ambiguous prompt: A Human-Machine Co-Adaption Strategy [50.714983524814606]
Current image generation systems produce high-quality images but struggle with ambiguous user prompts.<n>We propose a human-machine co-adaption strategy using mutual information between the user's prompts and the pictures under modification.
arXiv Detail & Related papers (2025-01-25T10:32:00Z)
What Do You Want? User-centric Prompt Generation for Text-to-image Synthesis via Multi-turn Guidance [23.411806572667707]
Text-to-image synthesis (TIS) models heavily rely on the quality and specificity of textual prompts. Existing solutions relieve this via automatic model-preferred prompt generation from user queries. We propose DialPrompt, a multi-turn dialogue-based TIS prompt generation model that emphasises user-centricity.
arXiv Detail & Related papers (2024-08-23T08:35:35Z)
Prompt Refinement with Image Pivot for Text-to-Image Generation [103.63292948223592]
We introduce Prompt Refinement with Image Pivot (PRIP) for text-to-image generation. PRIP decomposes refinement process into two data-rich tasks: inferring representations of user-preferred images from user languages and translating image representations into system languages. It substantially outperforms a wide range of baselines and effectively transfers to unseen systems in a zero-shot manner.
arXiv Detail & Related papers (2024-06-28T22:19:24Z)
Empowering Visual Creativity: A Vision-Language Assistant to Image Editing Recommendations [109.65267337037842]
We introduce the task of Image Editing Recommendation (IER) IER aims to automatically generate diverse creative editing instructions from an input image and a simple prompt representing the users' under-specified editing purpose. We introduce Creativity-Vision Language Assistant(Creativity-VLA), a multimodal framework designed specifically for edit-instruction generation.
arXiv Detail & Related papers (2024-05-31T18:22:29Z)
Dynamic Prompt Optimizing for Text-to-Image Generation [63.775458908172176]
We introduce the textbfPrompt textbfAuto-textbfEditing (PAE) method to improve text-to-image generative models. We employ an online reinforcement learning strategy to explore the weights and injection time steps of each word, leading to the dynamic fine-control prompts.
arXiv Detail & Related papers (2024-04-05T13:44:39Z)
Prompt Expansion for Adaptive Text-to-Image Generation [51.67811570987088]
This paper proposes a Prompt Expansion framework that helps users generate high-quality, diverse images with less effort. The Prompt Expansion model takes a text query as input and outputs a set of expanded text prompts. We conduct a human evaluation study that shows that images generated through Prompt Expansion are more aesthetically pleasing and diverse than those generated by baseline methods.
arXiv Detail & Related papers (2023-12-27T21:12:21Z)
NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation [4.21512101973222]
NeuroPrompts is an adaptive framework that enhances a user's prompt to improve the quality of generations produced by text-to-image models. Our framework utilizes constrained text decoding with a pre-trained language model that has been adapted to generate prompts similar to those produced by human prompt engineers.
arXiv Detail & Related papers (2023-11-20T22:57:47Z)
PromptMagician: Interactive Prompt Engineering for Text-to-Image Creation [16.41459454076984]
This research proposes PromptMagician, a visual analysis system that helps users explore the image results and refine the input prompts. The backbone of our system is a prompt recommendation model that takes user prompts as input, retrieves similar prompt-image pairs from DiffusionDB, and identifies special (important and relevant) prompt keywords.
arXiv Detail & Related papers (2023-07-18T07:46:25Z)
Promptify: Text-to-Image Generation through Interactive Prompt Exploration with Large Language Models [29.057923932305123]
We present Promptify, an interactive system that supports prompt exploration and refinement for text-to-image generative models. Our user study shows that Promptify effectively facilitates the text-to-image workflow and outperforms an existing baseline tool widely used for text-to-image generation.
arXiv Detail & Related papers (2023-04-18T22:59:11Z)
Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery [55.905769757007185]
We describe an approach to robustly optimize hard text prompts through efficient gradient-based optimization. Our approach automatically generates hard text-based prompts for both text-to-image and text-to-text applications. In the text-to-text setting, we show that hard prompts can be automatically discovered that are effective in tuning LMs for classification.
arXiv Detail & Related papers (2023-02-07T18:40:18Z)
Optimizing Prompts for Text-to-Image Generation [97.61295501273288]
Well-designed prompts can guide text-to-image models to generate amazing images. But the performant prompts are often model-specific and misaligned with user input. We propose prompt adaptation, a framework that automatically adapts original user input to model-preferred prompts.
arXiv Detail & Related papers (2022-12-19T16:50:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.