POET: Supporting Prompting Creativity and Personalization with Automated Expansion of Text-to-Image Generation
- URL: http://arxiv.org/abs/2504.13392v1
- Date: Fri, 18 Apr 2025 00:54:36 GMT
- Title: POET: Supporting Prompting Creativity and Personalization with Automated Expansion of Text-to-Image Generation
- Authors: Evans Xu Han, Alice Qian Zhang, Hong Shen, Haiyi Zhu, Paul Pu Liang, Jane Hsieh
- Abstract summary: State-of-the-art visual generative AI tools hold immense potential to assist users in the early ideation stages of creative tasks. Many large-scale text-to-image systems are designed for broad applicability, yielding conventional output that may limit creative exploration. We introduce POET, a real-time interactive tool that automatically discovers dimensions of homogeneity in text-to-image generative models.
- Score: 31.886910258606875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art visual generative AI tools hold immense potential to assist users in the early ideation stages of creative tasks -- offering the ability to generate (rather than search for) novel and unprecedented (instead of existing) images of considerable quality that also adhere to boundless combinations of user specifications. However, many large-scale text-to-image systems are designed for broad applicability, yielding conventional output that may limit creative exploration. They also employ interaction methods that may be difficult for beginners. Given that creative end users often operate in diverse, context-specific ways that are often unpredictable, more variation and personalization are necessary. We introduce POET, a real-time interactive tool that (1) automatically discovers dimensions of homogeneity in text-to-image generative models, (2) expands these dimensions to diversify the output space of generated images, and (3) learns from user feedback to personalize expansions. An evaluation with 28 users spanning four creative task domains demonstrated POET's ability to generate results with higher perceived diversity and help users reach satisfaction in fewer prompts during creative tasks, thereby prompting them to deliberate and reflect more on a wider range of possible produced results during the co-creative process. Focusing on visual creativity, POET offers a first glimpse of how interaction techniques of future text-to-image generation tools may support and align with more pluralistic values and the needs of end users during the ideation stages of their work.
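The three-step loop the abstract describes (discover dimensions of homogeneity, expand them to diversify outputs, learn from user feedback) can be sketched in miniature. This is an illustrative sketch only; every function, dimension name, and data structure below is a hypothetical stand-in, not taken from the paper.

```python
# Illustrative sketch of the three-step POET loop from the abstract.
# All names and interfaces here are hypothetical stand-ins.
import random


def discover_homogeneity_dimensions(prompt):
    """Step 1 (stand-in): attributes the model tends to render uniformly."""
    return ["style", "palette", "composition"]


VARIANTS = {
    "style": ["watercolor", "line art", "photorealistic"],
    "palette": ["muted tones", "high contrast", "pastel"],
    "composition": ["close-up", "wide shot", "overhead view"],
}


def expand_dimension(prompt, dimension, weights):
    """Step 2 (stand-in): rewrite the prompt along one dimension,
    sampling variants in proportion to learned user preferences."""
    variants = VARIANTS[dimension]
    choice = random.choices(
        variants,
        weights=[weights.get((dimension, v), 1.0) for v in variants],
    )[0]
    return f"{prompt}, {choice}", (dimension, choice)


def personalize(weights, variant_key, liked):
    """Step 3 (stand-in): nudge future expansions toward what the user kept."""
    weights[variant_key] = weights.get(variant_key, 1.0) * (1.5 if liked else 0.5)


if __name__ == "__main__":
    weights = {}
    prompt = "a lighthouse at dusk"
    for dim in discover_homogeneity_dimensions(prompt):
        expanded, key = expand_dimension(prompt, dim, weights)
        personalize(weights, key, liked=True)  # pretend the user kept this one
        print(expanded)
```

The point of the sketch is the feedback loop: each expansion is sampled under the current preference weights, and each user reaction updates those weights for later expansions.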
Related papers
- Bringing Characters to New Stories: Training-Free Theme-Specific Image Generation via Dynamic Visual Prompting [71.29100512700064]
We present T-Prompter, a training-free method for theme-specific image generation.
T-Prompter integrates reference images into generative models, allowing users to seamlessly specify the target theme.
Our approach enables consistent story generation, character design, realistic character generation, and style-guided image generation.
arXiv Detail & Related papers (2025-01-26T19:01:19Z)
- X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models [77.98981338798383]
In-context generation is a key component of large language models' (LLMs) open-task generalization capability. X-Prompt is a purely auto-regressive large vision-language model designed to deliver competitive performance across a wide range of both seen and unseen image generation tasks. A unified training task for both text and image prediction enables X-Prompt to handle general image generation with enhanced task awareness from in-context examples.
arXiv Detail & Related papers (2024-12-02T18:59:26Z)
- Reflective Human-Machine Co-adaptation for Enhanced Text-to-Image Generation Dialogue System [7.009995656535664]
We propose a reflective human-machine co-adaptation strategy, named RHM-CAS.
Externally, the agent engages in meaningful language interactions with users to reflect on and refine the generated images.
Internally, the agent optimizes its policy based on user feedback, ensuring that the final outcomes closely align with user preferences.
arXiv Detail & Related papers (2024-08-27T18:08:00Z)
- Empowering Visual Creativity: A Vision-Language Assistant to Image Editing Recommendations [109.65267337037842]
We introduce the task of Image Editing Recommendation (IER).
IER aims to automatically generate diverse creative editing instructions from an input image and a simple prompt representing the users' under-specified editing purpose.
We introduce the Creativity-Vision Language Assistant (Creativity-VLA), a multimodal framework designed specifically for edit-instruction generation.
arXiv Detail & Related papers (2024-05-31T18:22:29Z)
- Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond [87.1712108247199]
Our goal is to establish a Unified paradigm for Multi-modal Personalization systems (UniMP).
We develop a generic, personalized generative framework that can handle a wide range of personalized needs.
Our methodology enhances the capabilities of foundational language models for personalized tasks.
arXiv Detail & Related papers (2024-03-15T20:21:31Z)
- A New Creative Generation Pipeline for Click-Through Rate with Stable Diffusion Model [8.945197427679924]
Traditional AI-based approaches share the problem of ignoring user information while drawing on limited aesthetic knowledge from designers.
To optimize the results, traditional methods then rank the generated creatives with a separate module, the creative ranking model.
This paper proposes a new automated Creative Generation pipeline for Click-Through Rate (CG4CTR) with the goal of improving CTR during the creative generation stage.
arXiv Detail & Related papers (2024-01-17T03:27:39Z)
- Prompt Expansion for Adaptive Text-to-Image Generation [51.67811570987088]
This paper proposes a Prompt Expansion framework that helps users generate high-quality, diverse images with less effort.
The Prompt Expansion model takes a text query as input and outputs a set of expanded text prompts.
We conduct a human evaluation study that shows that images generated through Prompt Expansion are more aesthetically pleasing and diverse than those generated by baseline methods.
arXiv Detail & Related papers (2023-12-27T21:12:21Z)
- The role of interface design on prompt-mediated creativity in Generative AI [0.0]
We analyze more than 145,000 prompts from two Generative AI platforms.
We find that users exhibit a tendency towards exploration of new topics over exploitation of concepts visited previously.
arXiv Detail & Related papers (2023-11-30T22:33:34Z)
- The Chosen One: Consistent Characters in Text-to-Image Diffusion Models [71.15152184631951]
We propose a fully automated solution for consistent character generation with the sole input being a text prompt.
Our method strikes a better balance between prompt alignment and identity consistency compared to the baseline methods.
arXiv Detail & Related papers (2023-11-16T18:59:51Z)
- PromptMagician: Interactive Prompt Engineering for Text-to-Image Creation [16.41459454076984]
This research proposes PromptMagician, a visual analysis system that helps users explore the image results and refine the input prompts.
The backbone of our system is a prompt recommendation model that takes user prompts as input, retrieves similar prompt-image pairs from DiffusionDB, and identifies special (important and relevant) prompt keywords.
arXiv Detail & Related papers (2023-07-18T07:46:25Z)
- Promptify: Text-to-Image Generation through Interactive Prompt Exploration with Large Language Models [29.057923932305123]
We present Promptify, an interactive system that supports prompt exploration and refinement for text-to-image generative models.
Our user study shows that Promptify effectively facilitates the text-to-image workflow and outperforms an existing baseline tool widely used for text-to-image generation.
arXiv Detail & Related papers (2023-04-18T22:59:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.