BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed
Dual-Branch Diffusion
- URL: http://arxiv.org/abs/2403.06976v1
- Date: Mon, 11 Mar 2024 17:59:31 GMT
- Title: BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed
Dual-Branch Diffusion
- Authors: Xuan Ju, Xian Liu, Xintao Wang, Yuxuan Bian, Ying Shan, Qiang Xu
- Abstract summary: BrushNet is a novel plug-and-play dual-branch model engineered to embed pixel-level masked image features into any pre-trained DM.
BrushNet demonstrates superior performance over existing models across seven key metrics, including image quality, mask region preservation, and textual coherence.
- Score: 61.90969199199739
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image inpainting, the process of restoring corrupted images, has seen
significant advancements with the advent of diffusion models (DMs). Despite
these advancements, current DM adaptations for inpainting, which involve
modifications to the sampling strategy or the development of
inpainting-specific DMs, frequently suffer from semantic inconsistencies and
reduced image quality. Addressing these challenges, our work introduces a novel
paradigm: the division of masked image features and noisy latent into separate
branches. This division dramatically diminishes the model's learning load,
facilitating a nuanced incorporation of essential masked image information in a
hierarchical fashion. Herein, we present BrushNet, a novel plug-and-play
dual-branch model engineered to embed pixel-level masked image features into
any pre-trained DM, guaranteeing coherent and enhanced image inpainting
outcomes. Additionally, we introduce BrushData and BrushBench to facilitate
segmentation-based inpainting training and performance assessment. Our
extensive experimental analysis demonstrates BrushNet's superior performance
over existing models across seven key metrics, including image quality, mask
region preservation, and textual coherence.
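The core idea in the abstract — a separate branch extracts hierarchical features from the masked image and injects them into a frozen, pre-trained diffusion UNet, rather than modifying the UNet itself — can be illustrated with a minimal sketch. Everything below is hypothetical and heavily simplified (toy 1-D "images", a stand-in for the UNet, fixed injection scales); it only shows the data flow, not BrushNet's actual architecture or training:

```python
import numpy as np

def frozen_unet_step(latent, injected_feats):
    # Stand-in for one denoising step of a pre-trained diffusion UNet.
    # The UNet's weights are untouched; the plug-and-play branch only
    # adds its per-layer features to the hidden states.
    h = latent
    for f in injected_feats:
        h = np.tanh(h) + f  # hypothetical layer with additive injection
    return h

def brushnet_branch(image, mask, num_layers=3):
    # Hypothetical dual-branch side network: it sees only the masked
    # image (pixel-level information) and emits one feature map per
    # UNet layer, giving the hierarchical injection the paper describes.
    feats = []
    x = image * (1.0 - mask)      # keep only the unmasked pixels
    for _ in range(num_layers):
        x = 0.5 * x               # toy downsampling / feature extraction
        feats.append(0.1 * x)     # small injection scale for illustration
    return feats

# Toy data: a 4-pixel "image"; mask marks the region to inpaint (1 = masked).
image = np.array([1.0, 2.0, 3.0, 4.0])
mask = np.array([0.0, 0.0, 1.0, 1.0])
noisy_latent = np.zeros_like(image)

feats = brushnet_branch(image, mask)
denoised = frozen_unet_step(noisy_latent, feats)

# Blend: known pixels come straight from the input, so the unmasked
# region is preserved exactly; only the masked region uses the model.
result = (1.0 - mask) * image + mask * denoised
```

The blending step at the end is what gives exact mask-region preservation: because the pre-trained UNet is frozen and the known pixels are copied through, the side branch can be attached to any compatible diffusion model.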
Related papers
- Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting [0.17975553762582286]
Current image stitching methods produce noticeable seams in challenging scenarios such as uneven hue and large parallax.
We propose the Reference-Driven Inpainting Stitcher (RDIStitcher) to reformulate the image fusion and rectangling as a reference-based inpainting model.
We present Multimodal Large Language Model (MLLM)-based metrics, offering a new perspective on evaluating stitched image quality.
arXiv Detail & Related papers (2024-11-15T16:05:01Z) - MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling [64.09238330331195]
We propose a novel Multi-Modal Auto-Regressive (MMAR) probabilistic modeling framework.
Unlike discretization-based methods, MMAR takes in continuous-valued image tokens to avoid information loss.
We show that MMAR demonstrates superior performance compared to other joint multi-modal models.
arXiv Detail & Related papers (2024-10-14T17:57:18Z) - Realistic and Efficient Face Swapping: A Unified Approach with Diffusion Models [69.50286698375386]
We propose a novel approach that better harnesses diffusion models for face-swapping.
We introduce a mask shuffling technique during inpainting training, which allows us to create a so-called universal model for swapping.
Our unified approach is resilient to errors in other off-the-shelf models.
arXiv Detail & Related papers (2024-09-11T13:43:53Z) - MRIR: Integrating Multimodal Insights for Diffusion-based Realistic Image Restoration [17.47612023350466]
We propose MRIR, a diffusion-based restoration method with multimodal insights.
For the textual level, we harness the power of the pre-trained multimodal large language model to infer meaningful semantic information from low-quality images.
For the visual level, we focus on pixel-level control, utilizing a Pixel-level Processor and ControlNet to control spatial structures.
arXiv Detail & Related papers (2024-07-04T04:55:14Z) - VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model [76.02314305164595]
This work presents a novel image outpainting framework that is capable of customizing the results according to users' requirements.
We take advantage of a Multimodal Large Language Model (MLLM) that automatically extracts and organizes the corresponding textual descriptions of the masked and unmasked part of a given image.
In addition, a special Cross-Attention module, namely Center-Total-Surrounding (CTS), is elaborately designed to further enhance the interaction between specific spatial regions of the image and the corresponding parts of the text prompts.
arXiv Detail & Related papers (2024-06-03T07:14:19Z) - Sketch-guided Image Inpainting with Partial Discrete Diffusion Process [5.005162730122933]
We introduce a novel partial discrete diffusion process (PDDP) for sketch-guided inpainting.
PDDP corrupts the masked regions of the image and reconstructs these masked regions conditioned on hand-drawn sketches.
The proposed novel transformer module accepts two inputs -- the image containing the masked region to be inpainted and the query sketch to model the reverse diffusion process.
arXiv Detail & Related papers (2024-04-18T07:07:38Z) - Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and
Latent Diffusion [50.59261592343479]
We present Kandinsky, a novel exploration of latent diffusion architecture.
The proposed model is trained separately to map text embeddings to image embeddings of CLIP.
We also deployed a user-friendly demo system that supports diverse generative modes such as text-to-image generation, image fusion, text and image fusion, image variations generation, and text-guided inpainting/outpainting.
arXiv Detail & Related papers (2023-10-05T12:29:41Z) - GRIG: Few-Shot Generative Residual Image Inpainting [27.252855062283825]
We present a novel few-shot generative residual image inpainting method that produces high-quality inpainting results.
The core idea is to propose an iterative residual reasoning method that incorporates Convolutional Neural Networks (CNNs) for feature extraction.
We also propose a novel forgery-patch adversarial training strategy to create faithful textures and detailed appearances.
arXiv Detail & Related papers (2023-04-24T12:19:06Z) - MAT: Mask-Aware Transformer for Large Hole Image Inpainting [79.67039090195527]
We present a novel model for large hole inpainting, which unifies the merits of transformers and convolutions.
Experiments demonstrate the state-of-the-art performance of the new model on multiple benchmark datasets.
arXiv Detail & Related papers (2022-03-29T06:36:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.