I Dream My Painting: Connecting MLLMs and Diffusion Models via Prompt Generation for Text-Guided Multi-Mask Inpainting
- URL: http://arxiv.org/abs/2411.19050v2
- Date: Fri, 06 Dec 2024 10:58:53 GMT
- Title: I Dream My Painting: Connecting MLLMs and Diffusion Models via Prompt Generation for Text-Guided Multi-Mask Inpainting
- Authors: Nicola Fanelli, Gennaro Vessio, Giovanna Castellano
- Abstract summary: Inpainting focuses on filling missing or corrupted regions of an image to blend seamlessly with its surrounding content and style.
We introduce the novel task of multi-mask inpainting, where multiple regions are simultaneously inpainted using distinct prompts.
Our pipeline delivers creative and accurate inpainting results.
- Score: 8.94249680213101
- Abstract: Inpainting focuses on filling missing or corrupted regions of an image to blend seamlessly with its surrounding content and style. While conditional diffusion models have proven effective for text-guided inpainting, we introduce the novel task of multi-mask inpainting, where multiple regions are simultaneously inpainted using distinct prompts. Furthermore, we design a fine-tuning procedure for multimodal LLMs, such as LLaVA, to generate multi-mask prompts automatically using corrupted images as inputs. These models can generate helpful and detailed prompt suggestions for filling the masked regions. The generated prompts are then fed to Stable Diffusion, which is fine-tuned for the multi-mask inpainting problem using rectified cross-attention, enforcing prompts onto their designated regions for filling. Experiments on digitized paintings from WikiArt and the Densely Captioned Images dataset demonstrate that our pipeline delivers creative and accurate inpainting results. Our code, data, and trained models are available at https://cilabuniba.github.io/i-dream-my-painting.
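The conditioning mechanism named in the abstract, rectified cross-attention, restricts each prompt to its designated mask. Below is a minimal PyTorch sketch of that idea, assuming one text embedding per mask; the interface and names are ours, not the fine-tuned Stable Diffusion layers from the paper.

```python
# Minimal sketch of rectified cross-attention (assumed interface, not the
# authors' implementation): each prompt's attention output is kept only
# inside that prompt's mask, so prompts cannot leak across regions.
import torch

def rectified_cross_attention(q, prompt_kvs, masks):
    """q:          (B, N, D) image-token queries (N = latent H*W)
    prompt_kvs: list of (k, v) pairs, each (B, T, D), one per prompt
    masks:      list of (B, N) binary float masks, 1 inside a prompt's region
    """
    d = q.shape[-1]
    out = torch.zeros_like(q)
    for (k, v), m in zip(prompt_kvs, masks):
        attn = torch.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)  # (B, N, T)
        # Keep this prompt's contribution only inside its own mask.
        out = out + m.unsqueeze(-1) * (attn @ v)
    # Regions covered by no mask receive no text conditioning in this sketch.
    return out
```

Per the abstract, this rectification is enforced during fine-tuning so that each masked region follows its own MLLM-generated prompt while blending with the surrounding painting.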
Related papers
- PainterNet: Adaptive Image Inpainting with Actual-Token Attention and Diverse Mask Control [4.984382582612786]
PainterNet is a plugin that can be flexibly embedded into various diffusion models.
We propose local prompt input, Attention Control Points (ACP), and Actual-Token Attention Loss (ATAL) to enhance the model's focus on local areas.
Extensive experimental analysis shows that PainterNet surpasses existing state-of-the-art models on key metrics, including image quality and global/local text consistency.
arXiv Detail & Related papers (2024-12-02T07:40:47Z)
- DiffSTR: Controlled Diffusion Models for Scene Text Removal [5.790630195329777]
Scene Text Removal (STR) aims to prevent unauthorized use of text in images.
STR faces several challenges, including boundary artifacts, inconsistent texture and color, and preserving correct shadows.
We introduce a ControlNet diffusion model, treating STR as an inpainting task.
We develop a mask pretraining pipeline to condition our diffusion model.
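For intuition, the general recipe (a ControlNet-conditioned inpainting diffusion model) can be sketched with Hugging Face diffusers; the checkpoints, paths, and conditioning below are generic public stand-ins, not the DiffSTR release.

```python
# Sketch only: scene text removal posed as ControlNet-guided inpainting.
# Checkpoints and file paths are generic placeholders, not DiffSTR models.
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline

def make_inpaint_condition(image, mask):
    """Encode the masked image for the ControlNet branch (masked pixels = -1)."""
    img = np.array(image.convert("RGB"), dtype=np.float32) / 255.0
    m = np.array(mask.convert("L"), dtype=np.float32) / 255.0
    img[m > 0.5] = -1.0
    return torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_inpaint", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

image = Image.open("scene.png")      # placeholder path: photo containing text
mask = Image.open("text_mask.png")   # placeholder path: white where text sits
result = pipe(prompt="clean background, text removed",
              image=image, mask_image=mask,
              control_image=make_inpaint_condition(image, mask),
              num_inference_steps=30).images[0]
```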
arXiv Detail & Related papers (2024-10-29T04:20:21Z)
- PaintSeg: Training-free Segmentation via Painting [50.17936803209125]
PaintSeg is a new unsupervised method for segmenting objects without any training.
Inpainting and outpainting are alternated, with the former masking the foreground and filling in the background, and the latter masking the background while recovering the missing part of the foreground object.
Our experimental results demonstrate that PaintSeg outperforms existing approaches in coarse mask-prompt, box-prompt, and point-prompt segmentation tasks.
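As a rough illustration of the alternation described above (our sketch, not the authors' code), one step can compare how well a generic inpainter reconstructs each pixel from the background versus the foreground side:

```python
# Sketch of one inpainting/outpainting alternation; `inpaint_fn` stands in
# for any off-the-shelf generative inpainter and is an assumption, as is
# the error-based mask refinement.
import numpy as np

def paintseg_step(image, mask, inpaint_fn):
    """image: (H, W, C) floats; mask: (H, W) bool, True = current foreground."""
    bg_filled = inpaint_fn(image, mask)    # inpainting: mask fg, fill background
    fg_filled = inpaint_fn(image, ~mask)   # outpainting: mask bg, recover foreground
    # Pixels the background fill cannot reproduce are likely foreground.
    err_bg = np.linalg.norm(image - bg_filled, axis=-1)
    err_fg = np.linalg.norm(image - fg_filled, axis=-1)
    return err_bg > err_fg  # refined mask for the next alternation
```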
arXiv Detail & Related papers (2023-05-30T20:43:42Z)
- DiffEdit: Diffusion-based semantic image editing with mask guidance [64.555930158319]
DiffEdit is a method to take advantage of text-conditioned diffusion models for the task of semantic image editing.
Our main contribution is the ability to automatically generate a mask highlighting the regions of the input image that need to be edited.
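A compact sketch of this mask estimation under standard DDPM notation (the names and thresholding are ours): noise the image, predict the noise under the edit prompt and a reference prompt, and threshold where the two predictions disagree.

```python
# Sketch of DiffEdit-style mask estimation; `eps` is an assumed
# noise-prediction network eps(x_t, t, prompt), not a specific API.
import torch

@torch.no_grad()
def estimate_edit_mask(eps, x0, t, alpha_bar_t, query, reference,
                       n_samples=8, threshold=0.5):
    acc = torch.zeros_like(x0[:, :1])
    for _ in range(n_samples):
        noise = torch.randn_like(x0)
        # Forward diffusion to noise level t.
        x_t = alpha_bar_t ** 0.5 * x0 + (1 - alpha_bar_t) ** 0.5 * noise
        # The two conditionings disagree exactly where the edit must act.
        diff = (eps(x_t, t, query) - eps(x_t, t, reference)).abs()
        acc += diff.mean(dim=1, keepdim=True)  # average over channels
    m = acc / n_samples
    m = (m - m.min()) / (m.max() - m.min() + 1e-8)  # normalize to [0, 1]
    return (m > threshold).float()
```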
arXiv Detail & Related papers (2022-10-20T17:16:37Z)
- RePaint: Inpainting using Denoising Diffusion Probabilistic Models [161.74792336127345]
Free-form inpainting is the task of adding new content to an image in the regions specified by an arbitrary binary mask.
We propose RePaint: a Denoising Diffusion Probabilistic Model (DDPM) based inpainting approach that is applicable even to extreme masks.
We validate our method for both faces and general-purpose image inpainting using standard and extreme masks.
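The core of this family of methods fits in a few lines (our sketch under standard DDPM notation; `denoise_step` is an assumed reverse-step function, not the authors' API): at every reverse step, the known pixels are resampled from the forward process and composed with the generated ones.

```python
# Sketch of the known-region conditioning used by RePaint-style DDPM
# inpainting; `denoise_step` is a stand-in for the model's reverse step.
import torch

@torch.no_grad()
def repaint_step(x_t, x0_known, mask, t, alpha_bar_t, denoise_step):
    """mask: 1 where the image is known, 0 where content must be generated."""
    # Sample the known region at the next noise level via the forward process.
    noise = torch.randn_like(x0_known)
    x_known = alpha_bar_t ** 0.5 * x0_known + (1 - alpha_bar_t) ** 0.5 * noise
    # Generate the missing region with one reverse diffusion step.
    x_gen = denoise_step(x_t, t)
    # Compose: keep ground truth where known, generated content elsewhere.
    return mask * x_known + (1 - mask) * x_gen
```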
arXiv Detail & Related papers (2022-01-24T18:40:15Z)
- Learning Sparse Masks for Diffusion-based Image Inpainting [10.633099921979674]
Diffusion-based inpainting is a powerful tool for the reconstruction of images from sparse data.
We provide a model for highly efficient adaptive mask generation.
Experiments indicate that our model achieves competitive quality with speed-ups of up to four orders of magnitude.
arXiv Detail & Related papers (2021-10-06T10:20:59Z)
- In&Out: Diverse Image Outpainting via GAN Inversion [89.84841983778672]
Image outpainting seeks a semantically consistent extension of the input image beyond its available content.
In this work, we formulate the problem from the perspective of inverting generative adversarial networks.
Our generator renders micro-patches conditioned on their joint latent code as well as their individual positions in the image.
arXiv Detail & Related papers (2021-04-01T17:59:10Z)
- Free-Form Image Inpainting via Contrastive Attention Network [64.05544199212831]
In image inpainting tasks, masks of arbitrary shape can appear anywhere in an image, forming complex patterns.
It is difficult for encoders to capture powerful representations in such complex situations.
We propose a self-supervised Siamese inference network to improve the robustness and generalization.
arXiv Detail & Related papers (2020-10-29T14:46:05Z)
- VCNet: A Robust Approach to Blind Image Inpainting [70.68227719731243]
Blind inpainting is the task of automatically completing visual content without masks specifying the missing areas of an image.
In this paper, we define a new blind inpainting setting that makes a trained blind inpainting neural system robust against unknown missing-region patterns.
Our method is effective and robust for blind image inpainting, and our VCN allows for a wide spectrum of applications.
arXiv Detail & Related papers (2020-03-15T12:47:57Z)