PromptPaint: Steering Text-to-Image Generation Through Paint Medium-like
  Interactions
- URL: http://arxiv.org/abs/2308.05184v1
- Date: Wed, 9 Aug 2023 18:41:11 GMT
- Authors: John Joon Young Chung, Eytan Adar
- Abstract summary: PromptPaint allows users to mix prompts that express challenging concepts.
We characterize different approaches for mixing prompts, design trade-offs, and socio-technical challenges for generative models.
- Score: 12.792576041526287
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While diffusion-based text-to-image (T2I) models provide a simple and
powerful way to generate images, guiding this generation remains a challenge.
For concepts that are difficult to describe through language, users may
struggle to create prompts. Moreover, many of these models are built as
end-to-end systems, lacking support for iterative shaping of the image. In
response, we introduce PromptPaint, which combines T2I generation with
interactions that model how we use colored paints. PromptPaint allows users to
go beyond language to mix prompts that express challenging concepts. Just as we
iteratively tune colors through layered placements of paint on a physical
canvas, PromptPaint similarly allows users to apply different prompts to
different canvas areas and times of the generative process. Through a set of
studies, we characterize different approaches for mixing prompts, design
trade-offs, and socio-technical challenges for generative models. With
PromptPaint we provide insight into future steerable generative tools.
 
      
Related papers
- RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning [88.14234949860105]
 RePrompt is a novel reprompting framework that introduces explicit reasoning into the prompt enhancement process via reinforcement learning.
Our approach enables end-to-end training without human-annotated data.
 arXiv  Detail & Related papers  (2025-05-23T06:44:26Z)
- I Dream My Painting: Connecting MLLMs and Diffusion Models via Prompt Generation for Text-Guided Multi-Mask Inpainting [8.94249680213101]
 Inpainting focuses on filling missing or corrupted regions of an image to blend seamlessly with its surrounding content and style.
We introduce the novel task of multi-mask inpainting, where multiple regions are simultaneously inpainted using distinct prompts.
Our pipeline delivers creative and accurate inpainting results.
 arXiv  Detail & Related papers  (2024-11-28T10:55:09Z)
- VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model [76.02314305164595]
 This work presents a novel image outpainting framework that can customize its results according to users' requirements.
We take advantage of a Multimodal Large Language Model (MLLM) that automatically extracts and organizes the corresponding textual descriptions of the masked and unmasked parts of a given image.
In addition, a special Cross-Attention module, namely Center-Total-Surrounding (CTS), is designed to further enhance the interaction between specific spatial regions of the image and the corresponding parts of the text prompts.
 arXiv  Detail & Related papers  (2024-06-03T07:14:19Z)
- Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation [150.57983348059528]
 PRISM is an algorithm that automatically identifies human-interpretable and transferable prompts.
It can effectively generate desired concepts given only black-box access to T2I models.
Our experiments demonstrate the versatility and effectiveness of PRISM in generating accurate prompts for objects, styles and images.
 arXiv  Detail & Related papers  (2024-03-28T02:35:53Z)
- Towards Language-Driven Video Inpainting via Multimodal Large Language Models [116.22805434658567]
 We introduce a new task -- language-driven video inpainting.
It uses natural language instructions to guide the inpainting process.
We present the Remove Objects from Videos by Instructions dataset.
 arXiv  Detail & Related papers  (2024-01-18T18:59:13Z)
- HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models [59.01600111737628]
 HD-Painter is a training-free approach that accurately follows prompts and coherently scales to high-resolution image inpainting.
To this end, we design the Prompt-Aware Introverted Attention (PAIntA) layer enhancing self-attention scores.
Our experiments demonstrate that HD-Painter surpasses existing state-of-the-art approaches quantitatively and qualitatively.
 arXiv  Detail & Related papers  (2023-12-21T18:09:30Z)
- A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting [38.53807472111521]
 We introduce PowerPaint, the first high-quality and versatile inpainting model that excels in multiple inpainting tasks.
We demonstrate the versatility of the task prompt in PowerPaint by showcasing its effectiveness as a negative prompt for object removal.
We leverage prompt techniques to enable controllable shape-guided object inpainting, enhancing the model's applicability in shape-guided applications.
 arXiv  Detail & Related papers  (2023-12-06T16:34:46Z)
- Uni-paint: A Unified Framework for Multimodal Image Inpainting with Pretrained Diffusion Model [19.800236358666123]
 We propose Uni-paint, a unified framework for multimodal inpainting.
Uni-paint offers various modes of guidance, including text-driven, stroke-driven, and exemplar-driven inpainting.
Our approach achieves comparable results to existing single-modal methods.
 arXiv  Detail & Related papers  (2023-10-11T06:11:42Z)
- PaintSeg: Training-free Segmentation via Painting [50.17936803209125]
 PaintSeg is a new unsupervised method for segmenting objects without any training.
Inpainting and outpainting are alternated, with the former masking the foreground and filling in the background, and the latter masking the background while recovering the missing part of the foreground object.
Our experimental results demonstrate that PaintSeg outperforms existing approaches in coarse mask-prompt, box-prompt, and point-prompt segmentation tasks.
 arXiv  Detail & Related papers  (2023-05-30T20:43:42Z)
- AI Illustrator: Translating Raw Descriptions into Images by Prompt-based Cross-Modal Generation [61.77946020543875]
 We propose a framework for translating raw descriptions with complex semantics into semantically corresponding images.
Our framework consists of two components: a projection module from Text Embeddings to Image Embeddings based on prompts, and an adapted image generation module built on StyleGAN.
Benefiting from the pre-trained models, our method can handle complex descriptions and does not require external paired data for training.
 arXiv  Detail & Related papers  (2022-09-07T13:53:54Z)
- Intelli-Paint: Towards Developing Human-like Painting Agents [19.261822105543175]
 We propose a novel painting approach which learns to generate output canvases while exhibiting a more human-like painting style.
Intelli-Paint includes a progressive layering strategy that lets the agent first paint a natural background scene representation before progressively adding each of the foreground objects.
We also introduce a novel sequential brushstroke guidance strategy which helps the painting agent to shift its attention between different image regions in a semantic-aware manner.
 arXiv  Detail & Related papers  (2021-12-16T14:56:32Z)
- In&Out : Diverse Image Outpainting via GAN Inversion [89.84841983778672]
 Image outpainting seeks a semantically consistent extension of the input image beyond its available content.
In this work, we formulate the problem from the perspective of inverting generative adversarial networks.
Our generator renders micro-patches conditioned on their joint latent code as well as their individual positions in the image.
 arXiv  Detail & Related papers  (2021-04-01T17:59:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     