PromptPaint: Steering Text-to-Image Generation Through Paint Medium-like Interactions
- URL: http://arxiv.org/abs/2308.05184v1
- Date: Wed, 9 Aug 2023 18:41:11 GMT
- Title: PromptPaint: Steering Text-to-Image Generation Through Paint Medium-like Interactions
- Authors: John Joon Young Chung, Eytan Adar
- Abstract summary: PromptPaint allows users to mix prompts that express challenging concepts.
We characterize different approaches for mixing prompts, design trade-offs, and socio-technical challenges for generative models.
- Score: 12.792576041526287
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While diffusion-based text-to-image (T2I) models provide a simple and
powerful way to generate images, guiding this generation remains a challenge.
For concepts that are difficult to describe through language, users may
struggle to create prompts. Moreover, many of these models are built as
end-to-end systems, lacking support for iterative shaping of the image. In
response, we introduce PromptPaint, which combines T2I generation with
interactions that model how we use colored paints. PromptPaint allows users to
go beyond language to mix prompts that express challenging concepts. Just as we
iteratively tune colors through layered placements of paint on a physical
canvas, PromptPaint similarly allows users to apply different prompts to
different canvas areas and times of the generative process. Through a set of
studies, we characterize different approaches for mixing prompts, design
trade-offs, and socio-technical challenges for generative models. With
PromptPaint we provide insight into future steerable generative tools.
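The abstract's two key interactions, mixing prompts that are hard to put into words and applying different prompts at different points in the generative process, can be approximated with an off-the-shelf diffusion pipeline by blending text embeddings before they condition the denoiser. The following minimal Python sketch illustrates that idea with the Hugging Face diffusers library; it is not the PromptPaint implementation, and the checkpoint name, example prompts, and 0.7/0.3 mixing weights are illustrative assumptions.

import torch
from diffusers import StableDiffusionPipeline

# Illustrative public checkpoint; not necessarily the model used in the paper.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def embed(prompt):
    # Encode a prompt into the CLIP text-embedding space that conditions the denoiser.
    tokens = pipe.tokenizer(
        prompt,
        padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        truncation=True,
        return_tensors="pt",
    ).input_ids.to(pipe.device)
    with torch.no_grad():
        return pipe.text_encoder(tokens)[0]

# "Mix paints" by linearly blending two prompt embeddings; the weights would
# be exposed as a user control in an interactive tool.
emb_a = embed("an oil painting of a calm harbor at dawn")
emb_b = embed("a turbulent, stormy mood")
mixed = 0.7 * emb_a + 0.3 * emb_b

image = pipe(prompt_embeds=mixed, num_inference_steps=30).images[0]
image.save("mixed_prompts.png")

Steering only part of the canvas or only some denoising steps, as PromptPaint does, would further require spatially masked guidance and per-step swapping of the conditioning embeddings, which this sketch does not attempt.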
Related papers
- VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model [76.02314305164595]
This work presents a novel image outpainting framework that is capable of customizing the results according to the requirement of users.
We take advantage of a Multimodal Large Language Model (MLLM) that automatically extracts and organizes the corresponding textual descriptions of the masked and unmasked parts of a given image.
In addition, a special Cross-Attention module, namely Center-Total-Surrounding (CTS), is carefully designed to further enhance the interaction between specific spatial regions of the image and the corresponding parts of the text prompts.
arXiv Detail & Related papers (2024-06-03T07:14:19Z)
- Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation [150.57983348059528]
PRISM is an algorithm that automatically identifies human-interpretable and transferable prompts.
It can effectively generate desired concepts given only black-box access to T2I models.
Our experiments demonstrate the versatility and effectiveness of PRISM in generating accurate prompts for objects, styles and images.
arXiv Detail & Related papers (2024-03-28T02:35:53Z)
- Towards Language-Driven Video Inpainting via Multimodal Large Language Models [116.22805434658567]
We introduce a new task -- language-driven video inpainting.
It uses natural language instructions to guide the inpainting process.
We present the Remove Objects from Videos by Instructions dataset.
arXiv Detail & Related papers (2024-01-18T18:59:13Z)
- HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models [59.01600111737628]
HD-Painter is a training-free approach that accurately follows prompts and scales coherently to high-resolution image inpainting.
To this end, we design the Prompt-Aware Introverted Attention (PAIntA) layer, which enhances self-attention scores.
Our experiments demonstrate that HD-Painter surpasses existing state-of-the-art approaches quantitatively and qualitatively.
arXiv Detail & Related papers (2023-12-21T18:09:30Z)
- A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting [38.53807472111521]
We introduce PowerPaint, the first high-quality and versatile inpainting model that excels in multiple inpainting tasks.
We demonstrate the versatility of the task prompt in PowerPaint by showcasing its effectiveness as a negative prompt for object removal.
We leverage prompt techniques to enable controllable shape-guided object inpainting, enhancing the model's applicability in shape-guided applications.
arXiv Detail & Related papers (2023-12-06T16:34:46Z)
- Uni-paint: A Unified Framework for Multimodal Image Inpainting with Pretrained Diffusion Model [19.800236358666123]
We propose Uni-paint, a unified framework for multimodal inpainting.
Uni-paint offers various modes of guidance, including text-driven, stroke-driven, and exemplar-driven inpainting.
Our approach achieves comparable results to existing single-modal methods.
arXiv Detail & Related papers (2023-10-11T06:11:42Z)
- AI Illustrator: Translating Raw Descriptions into Images by Prompt-based Cross-Modal Generation [61.77946020543875]
We propose a framework for translating raw descriptions with complex semantics into semantically corresponding images.
Our framework consists of two components: a prompt-based projection module that maps text embeddings to image embeddings, and an adapted image generation module built on StyleGAN.
Benefiting from the pre-trained models, our method can handle complex descriptions and does not require external paired data for training.
arXiv Detail & Related papers (2022-09-07T13:53:54Z)
- Intelli-Paint: Towards Developing Human-like Painting Agents [19.261822105543175]
We propose a novel painting approach which learns to generate output canvases while exhibiting a more human-like painting style.
Intelli-Paint uses a progressive layering strategy that allows the agent to first paint a natural background scene representation before adding each of the foreground objects in a progressive fashion.
We also introduce a novel sequential brushstroke guidance strategy which helps the painting agent to shift its attention between different image regions in a semantic-aware manner.
arXiv Detail & Related papers (2021-12-16T14:56:32Z)
- In&Out : Diverse Image Outpainting via GAN Inversion [89.84841983778672]
Image outpainting seeks a semantically consistent extension of the input image beyond its available content.
In this work, we formulate the problem from the perspective of inverting generative adversarial networks.
Our generator renders micro-patches conditioned on their joint latent code as well as their individual positions in the image.
arXiv Detail & Related papers (2021-04-01T17:59:10Z)