Paint by Inpaint: Learning to Add Image Objects by Removing Them First
- URL: http://arxiv.org/abs/2404.18212v1
- Date: Sun, 28 Apr 2024 15:07:53 GMT
- Title: Paint by Inpaint: Learning to Add Image Objects by Removing Them First
- Authors: Navve Wasserman, Noam Rotstein, Roy Ganz, Ron Kimmel
- Abstract summary: We train a diffusion model to invert the inpainting process, effectively adding objects into images.
We use a large Vision-Language Model to provide detailed descriptions of the removed objects and a Large Language Model to convert these descriptions into diverse, natural-language instructions.
- Score: 8.399234415641319
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image editing has advanced significantly with the introduction of text-conditioned diffusion models. Despite this progress, seamlessly adding objects to images from textual instructions, without user-provided input masks, remains a challenge. We address this by leveraging the insight that removing objects (Inpaint) is significantly simpler than the inverse process of adding them (Paint), owing to the availability of segmentation-mask datasets and inpainting models that fill in these masks. Capitalizing on this insight, we implement an automated, extensive pipeline to curate a filtered, large-scale dataset of images paired with their object-removed versions. Using these pairs, we train a diffusion model to invert the inpainting process, effectively adding objects into images. Unlike other editing datasets, ours features natural target images instead of synthetic ones; moreover, it maintains consistency between source and target by construction. Additionally, we utilize a large Vision-Language Model to provide detailed descriptions of the removed objects and a Large Language Model to convert these descriptions into diverse, natural-language instructions. We show that the trained model surpasses existing ones both qualitatively and quantitatively, and we release the large-scale dataset alongside the trained models for the community.
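The curation idea in the abstract (inpaint an object away, keep the original image as the target, and turn the object description into an editing instruction) can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: `inpaint_remove` is a hypothetical stand-in for a real mask-based diffusion inpainter, and the instruction template stands in for the paper's VLM/LLM stage.

```python
# Sketch of the Paint-by-Inpaint data-curation step (assumptions:
# `inpaint_remove` is a placeholder for a real inpainting model; the
# instruction template replaces the paper's VLM/LLM pipeline).

def inpaint_remove(image, mask):
    # Placeholder: a real pipeline would run a diffusion inpainter here.
    # We simply zero out masked pixels to mimic object removal.
    return [0 if m else px for px, m in zip(image, mask)]

def make_training_pair(image, mask, object_desc):
    source = inpaint_remove(image, mask)  # editing input: object removed
    target = image                        # natural image: object present
    instruction = f"add {object_desc} to the image"
    return {"source": source, "target": target, "instruction": instruction}

pair = make_training_pair([5, 7, 9], [0, 1, 0], "a red ball")
```

The key property the paper exploits is visible even in this toy version: because the source is derived from the target by construction, the pair is consistent everywhere except the removed object.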
Related papers
- SmartEraser: Remove Anything from Images using Masked-Region Guidance [114.36809682798784]
SmartEraser is built with a new removing paradigm called Masked-Region Guidance.
Masked-Region Guidance retains the masked region in the input, using it as guidance for the removal process.
We present Syn4Removal, a large-scale object removal dataset.
arXiv Detail & Related papers (2025-01-14T17:55:12Z)
- BrushEdit: All-In-One Image Inpainting and Editing [79.55816192146762]
BrushEdit is a novel inpainting-based instruction-guided image editing paradigm.
We devise a system enabling free-form instruction editing by integrating MLLMs and a dual-branch image inpainting model.
Our framework effectively combines MLLMs and inpainting models, achieving superior performance across seven metrics.
arXiv Detail & Related papers (2024-12-13T17:58:06Z)
- Improving Text-guided Object Inpainting with Semantic Pre-inpainting [95.17396565347936]
We decompose the typical single-stage object inpainting into two cascaded processes: semantic pre-inpainting and high-fidelity object generation.
To achieve this, we cascade a Transformer-based semantic inpainter and an object inpainting diffusion model, leading to a novel CAscaded Transformer-Diffusion framework.
arXiv Detail & Related papers (2024-09-12T17:55:37Z)
- Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model [81.96954332787655]
We introduce Diffree, a Text-to-Image (T2I) model that facilitates text-guided object addition with only text control.
In experiments, Diffree adds new objects with a high success rate while maintaining background consistency, spatial appropriateness, and object relevance and quality.
arXiv Detail & Related papers (2024-07-24T03:58:58Z)
- DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
arXiv Detail & Related papers (2024-06-03T17:59:53Z)
- Outline-Guided Object Inpainting with Diffusion Models [11.391452115311798]
Instance segmentation datasets play a crucial role in training accurate and robust computer vision models.
We show how this issue can be mitigated by starting with small annotated instance segmentation datasets and augmenting them to obtain a sizeable annotated dataset.
We generate new images using a diffusion-based inpainting model to fill out the masked area with a desired object class by guiding the diffusion through the object outline.
arXiv Detail & Related papers (2024-02-26T09:21:17Z)
- Towards Language-Driven Video Inpainting via Multimodal Large Language Models [116.22805434658567]
We introduce a new task -- language-driven video inpainting.
It uses natural language instructions to guide the inpainting process.
We present the Remove Objects from Videos by Instructions dataset.
arXiv Detail & Related papers (2024-01-18T18:59:13Z)
- Unlocking Spatial Comprehension in Text-to-Image Diffusion Models [33.99474729408903]
CompFuser is an image generation pipeline that enhances spatial comprehension and attribute assignment in text-to-image generative models.
Our pipeline enables the interpretation of instructions defining spatial relationships between objects in a scene.
arXiv Detail & Related papers (2023-11-28T19:00:02Z)
- MagicRemover: Tuning-free Text-guided Image Inpainting with Diffusion Models [24.690863845885367]
We propose MagicRemover, a tuning-free method that leverages the powerful diffusion models for text-guided image inpainting.
We introduce an attention guidance strategy to constrain the sampling process of diffusion models, enabling the erasing of instructed areas and the restoration of occluded content.
arXiv Detail & Related papers (2023-10-04T14:34:11Z)
- Inst-Inpaint: Instructing to Remove Objects with Diffusion Models [18.30057229657246]
In this work, we are interested in an image inpainting algorithm that both determines which object should be removed from natural-language input and removes it, in a single step.
We present a novel inpainting framework, Inst-Inpaint, that can remove objects from images based on the instructions given as text prompts.
arXiv Detail & Related papers (2023-04-06T17:29:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.