PromptPaint: Steering Text-to-Image Generation Through Paint Medium-like Interactions
- URL: http://arxiv.org/abs/2308.05184v1
- Date: Wed, 9 Aug 2023 18:41:11 GMT
- Title: PromptPaint: Steering Text-to-Image Generation Through Paint Medium-like Interactions
- Authors: John Joon Young Chung, Eytan Adar
- Abstract summary: PromptPaint allows users to mix prompts that express challenging concepts.
We characterize different approaches for mixing prompts, design trade-offs, and socio-technical challenges for generative models.
- Score: 12.792576041526287
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While diffusion-based text-to-image (T2I) models provide a simple and
powerful way to generate images, guiding this generation remains a challenge.
For concepts that are difficult to describe through language, users may
struggle to create prompts. Moreover, many of these models are built as
end-to-end systems, lacking support for iterative shaping of the image. In
response, we introduce PromptPaint, which combines T2I generation with
interactions that model how we use colored paints. PromptPaint allows users to
go beyond language to mix prompts that express challenging concepts. Just as we
iteratively tune colors through layered placements of paint on a physical
canvas, PromptPaint similarly allows users to apply different prompts to
different canvas areas and times of the generative process. Through a set of
studies, we characterize different approaches for mixing prompts, design
trade-offs, and socio-technical challenges for generative models. With
PromptPaint we provide insight into future steerable generative tools.
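The abstract's two key interactions, mixing prompts that are hard to put into words and applying different prompts at different points in the generative process, can be approximated with an off-the-shelf diffusion pipeline by blending text embeddings before they condition the denoiser. The following minimal Python sketch illustrates that idea with the Hugging Face diffusers library; it is not the PromptPaint implementation, and the checkpoint name, example prompts, and 0.7/0.3 mixing weights are illustrative assumptions.

import torch
from diffusers import StableDiffusionPipeline

# Illustrative public checkpoint; not necessarily the model used in the paper.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def embed(prompt):
    # Encode a prompt into the CLIP text-embedding space that conditions the denoiser.
    tokens = pipe.tokenizer(
        prompt,
        padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        truncation=True,
        return_tensors="pt",
    ).input_ids.to(pipe.device)
    with torch.no_grad():
        return pipe.text_encoder(tokens)[0]

# "Mix paints" by linearly blending two prompt embeddings; the weights would
# be exposed as a user control in an interactive tool.
emb_a = embed("an oil painting of a calm harbor at dawn")
emb_b = embed("a turbulent, stormy mood")
mixed = 0.7 * emb_a + 0.3 * emb_b

image = pipe(prompt_embeds=mixed, num_inference_steps=30).images[0]
image.save("mixed_prompts.png")

Steering only part of the canvas or only some denoising steps, as PromptPaint does, would further require spatially masked guidance and per-step swapping of the conditioning embeddings, which this sketch does not attempt.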
Related papers
- VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model [76.02314305164595]
This work presents a novel image outpainting framework that is capable of customizing the results according to the requirement of users.
We take advantage of a Multimodal Large Language Model (MLLM) that automatically extracts and organizes the corresponding textual descriptions of the masked and unmasked parts of a given image.
In addition, a special Cross-Attention module, namely Center-Total-Surrounding (CTS), is carefully designed to further enhance the interaction between specific spatial regions of the image and the corresponding parts of the text prompts.
arXiv Detail & Related papers (2024-06-03T07:14:19Z)
- Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation [150.57983348059528]
PRISM is an algorithm that automatically identifies human-interpretable and transferable prompts.
It can effectively generate desired concepts given only black-box access to T2I models.
Our experiments demonstrate the versatility and effectiveness of PRISM in generating accurate prompts for objects, styles and images.
arXiv Detail & Related papers (2024-03-28T02:35:53Z)
- Towards Language-Driven Video Inpainting via Multimodal Large Language Models [116.22805434658567]
We introduce a new task -- language-driven video inpainting.
It uses natural language instructions to guide the inpainting process.
We present the Remove Objects from Videos by Instructions dataset.
arXiv Detail & Related papers (2024-01-18T18:59:13Z)
- HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models [59.01600111737628]
HD-Painter is a training-free approach that accurately follows prompts and scales coherently to high-resolution image inpainting.
To this end, we design the Prompt-Aware Introverted Attention (PAIntA) layer, which enhances self-attention scores.
Our experiments demonstrate that HD-Painter surpasses existing state-of-the-art approaches quantitatively and qualitatively.
arXiv Detail & Related papers (2023-12-21T18:09:30Z)
- A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting [38.53807472111521]
We introduce PowerPaint, the first high-quality and versatile inpainting model that excels in multiple inpainting tasks.
We demonstrate the versatility of the task prompt in PowerPaint by showcasing its effectiveness as a negative prompt for object removal.
We leverage prompt techniques to enable controllable shape-guided object inpainting, enhancing the model's applicability in shape-guided applications.
arXiv Detail & Related papers (2023-12-06T16:34:46Z)
- Uni-paint: A Unified Framework for Multimodal Image Inpainting with Pretrained Diffusion Model [19.800236358666123]
We propose Uni-paint, a unified framework for multimodal inpainting.
Uni-paint offers various modes of guidance, including text-driven, stroke-driven, and exemplar-driven inpainting.
Our approach achieves comparable results to existing single-modal methods.
arXiv Detail & Related papers (2023-10-11T06:11:42Z)
- AI Illustrator: Translating Raw Descriptions into Images by Prompt-based Cross-Modal Generation [61.77946020543875]
We propose a framework for translating raw descriptions with complex semantics into semantically corresponding images.
Our framework consists of two components: a prompt-based projection module that maps text embeddings to image embeddings, and an adapted image generation module built on StyleGAN.
Benefiting from the pre-trained models, our method can handle complex descriptions and does not require external paired data for training.
arXiv Detail & Related papers (2022-09-07T13:53:54Z)
- Intelli-Paint: Towards Developing Human-like Painting Agents [19.261822105543175]
We propose a novel painting approach which learns to generate output canvases while exhibiting a more human-like painting style.
Intelli-Paint uses a progressive layering strategy that allows the agent to first paint a natural background scene representation before adding each of the foreground objects in a progressive fashion.
We also introduce a novel sequential brushstroke guidance strategy which helps the painting agent to shift its attention between different image regions in a semantic-aware manner.
arXiv Detail & Related papers (2021-12-16T14:56:32Z)
- In&Out : Diverse Image Outpainting via GAN Inversion [89.84841983778672]
Image outpainting seeks a semantically consistent extension of the input image beyond its available content.
In this work, we formulate the problem from the perspective of inverting generative adversarial networks.
Our generator renders micro-patches conditioned on their joint latent code as well as their individual positions in the image.
arXiv Detail & Related papers (2021-04-01T17:59:10Z)