SmartBrush: Text and Shape Guided Object Inpainting with Diffusion Model
- URL: http://arxiv.org/abs/2212.05034v1
- Date: Fri, 9 Dec 2022 18:36:13 GMT
- Title: SmartBrush: Text and Shape Guided Object Inpainting with Diffusion Model
- Authors: Shaoan Xie, Zhifei Zhang, Zhe Lin, Tobias Hinz and Kun Zhang
- Abstract summary: Generic image inpainting aims to complete a corrupted image by borrowing surrounding information.
By contrast, multi-modal inpainting provides more flexible and useful controls on the inpainted content.
We propose a new diffusion-based model named SmartBrush for completing a missing region with an object using both text and shape guidance.
- Score: 27.91089554671927
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generic image inpainting aims to complete a corrupted image by borrowing
surrounding information, which barely generates novel content. By contrast,
multi-modal inpainting provides more flexible and useful controls on the
inpainted content, e.g., a text prompt can be used to describe an object with
richer attributes, and a mask can be used to constrain the shape of the
inpainted object rather than being only considered as a missing area. We
propose a new diffusion-based model named SmartBrush for completing a missing
region with an object using both text and shape guidance. While previous work
such as DALLE-2 and Stable Diffusion can do text-guided inpainting, they do not
support shape guidance and tend to modify background texture surrounding the
generated object. Our model incorporates both text and shape guidance with
precision control. To preserve the background better, we propose a novel
training and sampling strategy by augmenting the diffusion U-net with
object-mask prediction. Lastly, we introduce a multi-task training strategy by
jointly training inpainting with text-to-image generation to leverage more
training data. We conduct extensive experiments showing that our model
outperforms all baselines in terms of visual quality, mask controllability, and
background preservation.
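The abstract outlines a training recipe: condition the diffusion U-Net on a shape mask and a text prompt, and augment the U-Net with an object-mask prediction head so that the background outside the generated object can be preserved. The sketch below illustrates one way such a joint objective could look. It is a minimal, hypothetical Python sketch; the module names, the channel-wise concatenation used for conditioning, and the loss weighting are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of a training step combining denoising with
# object-mask prediction, as described in the SmartBrush abstract.
# The unet and noise_scheduler interfaces are assumed, not the paper's code.
import torch
import torch.nn.functional as F

def training_step(unet, batch, noise_scheduler):
    """One illustrative step: denoising loss + object-mask prediction loss."""
    x0 = batch["image"]                 # clean image (or latent)
    shape_mask = batch["shape_mask"]    # shape-guidance mask (coarse to fine)
    text_emb = batch["text_emb"]        # text prompt embedding
    obj_mask = batch["object_mask"]     # ground-truth instance mask

    # Standard diffusion forward process: add noise at a random timestep.
    noise = torch.randn_like(x0)
    t = torch.randint(0, noise_scheduler.num_timesteps,
                      (x0.size(0),), device=x0.device)
    x_t = noise_scheduler.add_noise(x0, noise, t)

    # Condition on the masked image and the shape mask; channel-wise
    # concatenation is a common choice, the paper's exact scheme may differ.
    masked_image = x0 * (1 - shape_mask)
    model_in = torch.cat([x_t, masked_image, shape_mask], dim=1)

    # The U-Net is assumed to return both a noise estimate and a predicted
    # object mask (the augmentation described in the abstract).
    noise_pred, mask_pred = unet(model_in, t, text_emb)

    # Standard denoising loss, plus a mask-prediction loss that teaches the
    # model where the generated object actually is.
    loss_denoise = F.mse_loss(noise_pred, noise)
    loss_mask = F.binary_cross_entropy_with_logits(mask_pred, obj_mask)
    return loss_denoise + loss_mask
```

At sampling time, the predicted object mask would let the sampler copy the original background everywhere outside the object, which is the background-preservation behaviour the abstract describes.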
Related papers
- DiffSTR: Controlled Diffusion Models for Scene Text Removal [5.790630195329777]
Scene Text Removal (STR) aims to prevent unauthorized use of text in images.
STR faces several challenges, including boundary artifacts, inconsistent texture and color, and preserving correct shadows.
We introduce a ControlNet diffusion model, treating STR as an inpainting task.
We develop a mask pretraining pipeline to condition our diffusion model.
arXiv Detail & Related papers (2024-10-29T04:20:21Z)
- Improving Text-guided Object Inpainting with Semantic Pre-inpainting [95.17396565347936]
We decompose the typical single-stage object inpainting into two cascaded processes: semantic pre-inpainting and high-fidelity object generation.
To achieve this, we cascade a Transformer-based semantic inpainter and an object inpainting diffusion model, leading to a novel CAscaded Transformer-Diffusion framework.
arXiv Detail & Related papers (2024-09-12T17:55:37Z)
- Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model [81.96954332787655]
We introduce Diffree, a Text-to-Image (T2I) model that facilitates text-guided object addition with only text control.
In experiments, Diffree adds new objects with a high success rate while maintaining background consistency, spatial appropriateness, and object relevance and quality.
arXiv Detail & Related papers (2024-07-24T03:58:58Z)
- Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis [63.757624792753205]
We present Zero-Painter, a framework for layout-conditional text-to-image synthesis.
Our method utilizes object masks and individual descriptions, coupled with a global text prompt, to generate images with high fidelity.
arXiv Detail & Related papers (2024-06-06T13:02:00Z)
- Outline-Guided Object Inpainting with Diffusion Models [11.391452115311798]
Instance segmentation datasets play a crucial role in training accurate and robust computer vision models.
We show how the difficulty of obtaining such datasets can be mitigated by starting with small annotated instance segmentation datasets and augmenting them to obtain a sizeable annotated dataset.
We generate new images using a diffusion-based inpainting model to fill out the masked area with a desired object class by guiding the diffusion through the object outline.
arXiv Detail & Related papers (2024-02-26T09:21:17Z)
- LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation [121.45667242282721]
We propose a coarse-to-fine paradigm to achieve layout planning and image generation.
Our proposed method outperforms the state-of-the-art models in terms of photorealistic layout and image generation.
arXiv Detail & Related papers (2023-08-09T17:45:04Z)
- Zero-shot spatial layout conditioning for text-to-image diffusion models [52.24744018240424]
Large-scale text-to-image diffusion models have significantly improved the state of the art in generative image modelling.
We consider image generation from text associated with segments on the image canvas, which combines an intuitive natural language interface with precise spatial control over the generated content.
We propose ZestGuide, a zero-shot segmentation guidance approach that can be plugged into pre-trained text-to-image diffusion models.
arXiv Detail & Related papers (2023-06-23T19:24:48Z)
- SpaText: Spatio-Textual Representation for Controllable Image Generation [61.89548017729586]
SpaText is a new method for text-to-image generation using open-vocabulary scene control.
In addition to a global text prompt that describes the entire scene, the user provides a segmentation map.
We show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-based.
arXiv Detail & Related papers (2022-11-25T18:59:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided (including all information) and is not responsible for any consequences of its use.