Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting
- URL: http://arxiv.org/abs/2310.08129v3
- Date: Sun, 7 Apr 2024 03:53:29 GMT
- Title: Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting
- Authors: Zijie Chen, Lichao Zhang, Fangsheng Weng, Lili Pan, Zhenzhong Lan
- Abstract summary: We propose a novel approach that involves rewriting user prompts based on a newly collected large-scale text-to-image dataset with over 300k prompts from 3115 users.
Our rewriting model enhances the expressiveness and alignment of user prompts with their intended visual outputs.
- Score: 13.252755478909899
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite significant progress in the field, it is still challenging to create personalized visual representations that align closely with the desires and preferences of individual users. This process requires users to articulate their ideas in words that are both comprehensible to the models and accurately capture their vision, posing difficulties for many users. In this paper, we tackle this challenge by leveraging historical user interactions with the system to enhance user prompts. We propose a novel approach that involves rewriting user prompts based on a newly collected large-scale text-to-image dataset with over 300k prompts from 3115 users. Our rewriting model enhances the expressiveness and alignment of user prompts with their intended visual outputs. Experimental results demonstrate the superiority of our methods over baseline approaches, as evidenced in our new offline evaluation method and online tests. Our code and dataset are available at https://github.com/zzjchen/Tailored-Visions.
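The abstract describes using a user's historical interactions to enhance a new prompt. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes a simple token-overlap (Jaccard) retriever and a plain-text instruction for some downstream rewriting model. The function names `retrieve_history` and `build_rewrite_request`, the similarity measure, and the instruction wording are all hypothetical and chosen only for illustration.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase a prompt and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_history(query: str, history: List[str], k: int = 2) -> List[str]:
    """Return the k past prompts most similar to the query.

    Hypothetical retriever: token overlap stands in for whatever
    relevance measure a real system would use.
    """
    q = tokenize(query)
    ranked = sorted(history, key=lambda h: jaccard(q, tokenize(h)), reverse=True)
    return ranked[:k]


def build_rewrite_request(query: str, history: List[str], k: int = 2) -> str:
    """Assemble a text instruction that a rewriting model could consume."""
    examples = "\n".join(f"- {h}" for h in retrieve_history(query, history, k))
    return (
        "Past prompts from this user:\n"
        f"{examples}\n"
        f"Rewrite the new prompt to match these preferences: {query}"
    )
```

For a short query such as "a cat", the retrieved history supplies style cues (for example, a preferred medium) that the user did not restate, which is the kind of signal a personalized rewriter could draw on.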
Related papers
- Reflective Human-Machine Co-adaptation for Enhanced Text-to-Image Generation Dialogue System [7.009995656535664]
We propose a reflective human-machine co-adaptation strategy, named RHM-CAS.
Externally, the Agent engages in meaningful language interactions with users to reflect on and refine the generated images.
Internally, the Agent tries to optimize the policy based on user preferences, ensuring that the final outcomes closely align with user preferences.
arXiv Detail & Related papers (2024-08-27T18:08:00Z)
- What Do You Want? User-centric Prompt Generation for Text-to-image Synthesis via Multi-turn Guidance [23.411806572667707]
Text-to-image synthesis (TIS) models heavily rely on the quality and specificity of textual prompts.
Existing solutions relieve this via automatic model-preferred prompt generation from user queries.
We propose DialPrompt, a multi-turn dialogue-based TIS prompt generation model that emphasises user-centricity.
arXiv Detail & Related papers (2024-08-23T08:35:35Z)
- Prompt Refinement with Image Pivot for Text-to-Image Generation [103.63292948223592]
We introduce Prompt Refinement with Image Pivot (PRIP) for text-to-image generation.
PRIP decomposes the refinement process into two data-rich tasks: inferring representations of user-preferred images from user languages and translating image representations into system languages.
It substantially outperforms a wide range of baselines and effectively transfers to unseen systems in a zero-shot manner.
arXiv Detail & Related papers (2024-06-28T22:19:24Z)
- User Embedding Model for Personalized Language Prompting [9.472634942498859]
We introduce a new User Embedding Module (UEM) that efficiently processes user history in free-form text by compressing and representing them as embeddings.
Our experiments demonstrate the superior capability of this approach in handling significantly longer histories.
The main contribution of this research is to demonstrate the ability to bias language models with user signals represented as embeddings.
arXiv Detail & Related papers (2024-01-10T00:35:52Z)
- Prompt Expansion for Adaptive Text-to-Image Generation [51.67811570987088]
This paper proposes a Prompt Expansion framework that helps users generate high-quality, diverse images with less effort.
The Prompt Expansion model takes a text query as input and outputs a set of expanded text prompts.
We conduct a human evaluation study that shows that images generated through Prompt Expansion are more aesthetically pleasing and diverse than those generated by baseline methods.
arXiv Detail & Related papers (2023-12-27T21:12:21Z)
- Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods [52.806258774051216]
We focus on text-to-image systems that input a single image of an individual and ground the generation process along with text describing the desired visual context.
We introduce a standardized dataset (Stellar) that contains personalized prompts coupled with images of individuals that is an order of magnitude larger than existing relevant datasets and where rich semantic ground-truth annotations are readily available.
We derive a simple yet efficient, personalized text-to-image baseline that does not require test-time fine-tuning for each subject and which sets quantitatively and in human trials a new SoTA.
arXiv Detail & Related papers (2023-12-11T04:47:39Z)
- RELIC: Investigating Large Language Model Responses using Self-Consistency [58.63436505595177]
Large Language Models (LLMs) are notorious for blending fact with fiction and generating non-factual content, known as hallucinations.
We propose an interactive system that helps users gain insight into the reliability of the generated text.
arXiv Detail & Related papers (2023-11-28T14:55:52Z)
- Human Learning by Model Feedback: The Dynamics of Iterative Prompting with Midjourney [28.39697076030535]
This paper analyzes the dynamics of the user prompts along such iterations.
We show that prompts predictably converge toward specific traits along these iterations.
The possibility that users adapt to the model's preference raises concerns about reusing user data for further training.
arXiv Detail & Related papers (2023-11-20T19:28:52Z)
- The Chosen One: Consistent Characters in Text-to-Image Diffusion Models [71.15152184631951]
We propose a fully automated solution for consistent character generation with the sole input being a text prompt.
Our method strikes a better balance between prompt alignment and identity consistency compared to the baseline methods.
arXiv Detail & Related papers (2023-11-16T18:59:51Z)
- Promptify: Text-to-Image Generation through Interactive Prompt Exploration with Large Language Models [29.057923932305123]
We present Promptify, an interactive system that supports prompt exploration and refinement for text-to-image generative models.
Our user study shows that Promptify effectively facilitates the text-to-image workflow and outperforms an existing baseline tool widely used for text-to-image generation.
arXiv Detail & Related papers (2023-04-18T22:59:11Z)
- A Simple Long-Tailed Recognition Baseline via Vision-Language Model [92.2866546058082]
The visual world naturally exhibits a long-tailed distribution of open classes, which poses great challenges to modern visual systems.
Recent advances in contrastive visual-language pretraining shed light on a new pathway for visual recognition.
We propose BALLAD to leverage contrastive vision-language models for long-tailed recognition.
arXiv Detail & Related papers (2021-11-29T17:49:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.