Towards Enhanced Image Inpainting: Mitigating Unwanted Object Insertion and Preserving Color Consistency
- URL: http://arxiv.org/abs/2312.04831v3
- Date: Sun, 18 May 2025 05:51:22 GMT
- Title: Towards Enhanced Image Inpainting: Mitigating Unwanted Object Insertion and Preserving Color Consistency
- Authors: Yikai Wang, Chenjie Cao, Junqiu Yu, Ke Fan, Xiangyang Xue, Yanwei Fu
- Abstract summary: A post-processing approach dubbed ASUKA (Aligned Stable inpainting with UnKnown Areas prior) improves inpainting models. A Masked Auto-Encoder (MAE) provides reconstruction-based priors that mitigate object hallucination. A specialized VAE decoder treats latent-to-image decoding as a local harmonization task.
- Score: 78.0488707697235
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in image inpainting increasingly use generative models to handle large irregular masks. However, these models can create unrealistic inpainted images due to two main issues: (1) Unwanted object insertion: Even with unmasked areas as context, generative models may still generate arbitrary objects in the masked region that don't align with the rest of the image. (2) Color inconsistency: Inpainted regions often have color shifts that cause a smeared appearance, reducing image quality. Retraining the generative model could help solve these issues, but it's costly since state-of-the-art latent-based diffusion and rectified flow models require a three-stage training process: training a VAE, training a generative U-Net or transformer, and fine-tuning for inpainting. Instead, this paper proposes a post-processing approach, dubbed ASUKA (Aligned Stable inpainting with UnKnown Areas prior), to improve inpainting models. To address unwanted object insertion, we leverage a Masked Auto-Encoder (MAE) for reconstruction-based priors. This mitigates object hallucination while maintaining the model's generation capabilities. To address color inconsistency, we propose a specialized VAE decoder that treats latent-to-image decoding as a local harmonization task, significantly reducing color shifts for color-consistent inpainting. We validate ASUKA on SD 1.5 and FLUX inpainting variants with Places2 and MISATO, our proposed diverse collection of datasets. Results show that ASUKA mitigates object hallucination and improves color consistency over standard diffusion and rectified flow models and other inpainting methods.
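The abstract describes a three-part post-processing pipeline: a frozen MAE supplies an object-free reconstruction prior, the generative inpainter is conditioned on that prior, and a harmonization-oriented decoder maps latents back to pixels. A minimal sketch of how such a pipeline could be wired together is given below; all module names and interfaces are hypothetical placeholders, not ASUKA's actual API.

```python
def asuka_style_inpaint(image, mask, mae_prior, inpainter, local_decoder):
    """Hypothetical post-processing pipeline in the spirit of ASUKA.

    image: (1, 3, H, W) tensor in [0, 1]; mask: (1, 1, H, W), 1 = hole.
    mae_prior, inpainter, local_decoder are assumed callables.
    """
    # 1. A frozen Masked Auto-Encoder reconstructs the hole from unmasked
    #    context, yielding a blurry but hallucination-free prior.
    prior = mae_prior(image * (1 - mask), mask)

    # 2. The latent inpainter (e.g. an SD 1.5 or FLUX inpainting variant)
    #    is conditioned on the prior so it follows the surrounding scene
    #    instead of inserting arbitrary objects.
    latent = inpainter(image, mask, condition=prior)

    # 3. A decoder fine-tuned for local harmonization turns latents into
    #    pixels while matching the colors of the known region.
    out = local_decoder(latent, image, mask)

    # Known pixels are kept verbatim; only the hole is replaced.
    return image * (1 - mask) + out * mask
```

The final compositing step reflects the common inpainting convention of keeping known pixels untouched; the specific conditioning interfaces above are assumptions for illustration.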
Related papers
- OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting [54.525583840585305]
We introduce OmniPaint, a unified framework that re-conceptualizes object removal and insertion as interdependent processes.
Our novel CFD metric offers a robust, reference-free evaluation of context consistency and object hallucination.
arXiv Detail & Related papers (2025-03-11T17:55:27Z)
- Mask Factory: Towards High-quality Synthetic Data Generation for Dichotomous Image Segmentation [70.95380821618711]
Dichotomous Image Segmentation (DIS) tasks require highly precise annotations.
Current generative models and techniques struggle with the issues of scene deviations, noise-induced errors, and limited training sample variability.
We introduce a novel approach, which provides a scalable solution for generating diverse and precise datasets.
arXiv Detail & Related papers (2024-12-26T06:37:25Z)
- SEM-Net: Efficient Pixel Modelling for image inpainting with Spatially Enhanced SSM [11.447968918063335]
Image inpainting aims to repair a partially damaged image based on information from its known regions.
SEM-Net is a novel vision network based on State Space Models (SSMs), modelling corrupted images at the pixel level while capturing long-range dependencies (LRDs) in state space.
arXiv Detail & Related papers (2024-11-10T00:35:14Z)
- Improving Text-guided Object Inpainting with Semantic Pre-inpainting [95.17396565347936]
We decompose the typical single-stage object inpainting into two cascaded processes: semantic pre-inpainting and high-fidelity object generation.
To achieve this, we cascade a Transformer-based semantic inpainter and an object inpainting diffusion model, leading to a novel CAscaded Transformer-Diffusion framework.
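As a rough illustration of the cascade described above, the two stages might be composed as follows; the function names and signatures are invented for this sketch and are not the paper's interface.

```python
def cascaded_object_inpaint(image, mask, prompt,
                            semantic_inpainter, diffusion_inpainter):
    """Hypothetical two-stage cascade: semantics first, pixels second."""
    # Stage 1: a Transformer-based semantic inpainter predicts semantic
    # features for the hole from the text prompt and unmasked context.
    semantics = semantic_inpainter(image * (1 - mask), mask, prompt)

    # Stage 2: a diffusion model generates high-fidelity pixels in the
    # hole, conditioned on the predicted semantics so that layout and
    # appearance are decided in separate stages.
    return diffusion_inpainter(image, mask, prompt, condition=semantics)
```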
arXiv Detail & Related papers (2024-09-12T17:55:37Z)
- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate our strategy's superiority in downstream tasks compared to random masking.
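A hedged sketch of generating mask patterns by filtering random noise, using a simple low-pass filter as an illustrative choice (the paper's actual filters and pattern families may differ):

```python
import torch
import torch.nn.functional as F

def noise_filtered_mask(grid=14, mask_ratio=0.75, kernel_size=5):
    """Create a binary patch mask by low-pass filtering white noise and
    masking the top responses, so masked patches form smooth clusters
    rather than an unstructured random pattern."""
    noise = torch.rand(1, 1, grid, grid)            # white noise on the patch grid
    kernel = torch.ones(1, 1, kernel_size, kernel_size) / kernel_size ** 2
    smooth = F.conv2d(noise, kernel, padding=kernel_size // 2)  # low-pass filter
    k = int(mask_ratio * grid * grid)
    thresh = smooth.flatten().topk(k).values.min()  # mask the k largest responses
    return (smooth >= thresh).float()               # 1 = masked patch

mask = noise_filtered_mask()
print(mask.mean())  # approximately the requested mask ratio
```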
arXiv Detail & Related papers (2024-07-17T22:04:00Z)
- Paint by Inpaint: Learning to Add Image Objects by Removing Them First [8.399234415641319]
We train a diffusion model to invert the inpainting process, effectively adding objects into images.
Our results show that the trained model surpasses existing models in both object addition and general editing tasks.
arXiv Detail & Related papers (2024-04-28T15:07:53Z)
- UGMAE: A Unified Framework for Graph Masked Autoencoders [67.75493040186859]
We propose UGMAE, a unified framework for graph masked autoencoders.
We first develop an adaptive feature mask generator to account for the unique significance of nodes.
We then design a ranking-based structure reconstruction objective joint with feature reconstruction to capture holistic graph information.
arXiv Detail & Related papers (2024-02-12T19:39:26Z)
- GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds [72.60362979456035]
Masked Autoencoders (MAE) remain challenging to apply to large-scale 3D point clouds.
We propose a Generative Decoder for MAE (GD-MAE) that automatically merges the surrounding context.
We demonstrate the efficacy of the proposed method on several large-scale benchmarks, including KITTI and ONCE.
arXiv Detail & Related papers (2022-12-06T14:32:55Z)
- MixMask: Revisiting Masking Strategy for Siamese ConvNets [23.946791390657875]
This work introduces a novel filling-based masking approach, termed MixMask.
The proposed method replaces erased areas with content from a different image, effectively countering the information depletion seen in traditional masking methods.
We empirically validate our framework's enhanced performance in areas such as linear probing, semi-supervised and supervised finetuning, object detection and segmentation.
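The core filling operation is simple to state; a minimal sketch, assuming per-image binary masks and that fill content is drawn from other images in the same batch (the roll pairing is an illustrative choice):

```python
import torch

def mixmask(x, mask):
    """Fill masked regions of each image with content from another image
    in the batch instead of erasing them to zero, so no input capacity
    is wasted on empty regions.

    x: (B, C, H, W) images; mask: (B, 1, H, W), 1 = region to replace.
    """
    fill = torch.roll(x, shifts=1, dims=0)   # pair each image with another
    return x * (1 - mask) + fill * mask      # erased areas get real content
```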
arXiv Detail & Related papers (2022-10-20T17:54:03Z)
- Perceptual Artifacts Localization for Inpainting [60.5659086595901]
We propose a new learning task of automatic segmentation of inpainting perceptual artifacts.
We train advanced segmentation networks on a dataset to reliably localize inpainting artifacts within inpainted images.
We also propose a new evaluation metric called Perceptual Artifact Ratio (PAR), which is the ratio of objectionable inpainted regions to the entire inpainted area.
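Following that definition, PAR can be computed directly from two binary masks; a minimal sketch (the paper's full protocol, e.g. how artifact regions are predicted, is more involved):

```python
import numpy as np

def perceptual_artifact_ratio(artifact_mask: np.ndarray,
                              inpaint_mask: np.ndarray) -> float:
    """PAR = artifact area within the hole / total area of the hole.

    Both inputs are boolean H x W arrays; artifact_mask would come from
    the trained artifact-segmentation network."""
    hole = inpaint_mask.sum()
    if hole == 0:
        return 0.0
    return float((artifact_mask & inpaint_mask).sum() / hole)
```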
arXiv Detail & Related papers (2022-08-05T18:50:51Z)
- Learning Prior Feature and Attention Enhanced Image Inpainting [63.21231753407192]
This paper incorporates the pre-training based Masked AutoEncoder (MAE) into the inpainting model.
We propose to use attention priors from MAE to make the inpainting model learn more long-distance dependencies between masked and unmasked regions.
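One way such attention priors could enter an inpainting model is as a bias blended into its attention weights; the sketch below is an invented illustration of that mechanism, not the paper's exact formulation:

```python
import torch

def attention_with_mae_prior(q, k, v, mae_attn, alpha=0.5):
    """Blend the inpainting model's attention with a frozen MAE's
    attention map so masked tokens also attend to distant context.

    q, k, v: (B, N, D); mae_attn: (B, N, N) with rows summing to 1.
    alpha is a hypothetical mixing weight."""
    attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    attn = (1 - alpha) * attn + alpha * mae_attn   # inject the MAE prior
    return attn @ v
```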
arXiv Detail & Related papers (2022-08-03T04:32:53Z)
- RePaint: Inpainting using Denoising Diffusion Probabilistic Models [161.74792336127345]
Free-form inpainting is the task of adding new content to an image in the regions specified by an arbitrary binary mask.
We propose RePaint: a Denoising Diffusion Probabilistic Model (DDPM) based inpainting approach that is applicable to even extreme masks.
We validate our method for both faces and general-purpose image inpainting using standard and extreme masks.
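RePaint's core mechanism can be sketched compactly: at each reverse step, the known region is drawn from the forward (noising) process of the original image, the hole from the model, and the two are composited. The snippet below assumes a diffusers-style scheduler/model interface, which is an assumption for illustration:

```python
import torch

def repaint_reverse_step(x_t, x0, mask, t, model, scheduler):
    """One reverse DDPM step in the spirit of RePaint.

    mask: 1 = known pixels, 0 = hole. RePaint additionally re-noises and
    repeats ("resamples") each step several times to harmonize regions."""
    # Known region: noise the original image down to the current step.
    x_known = scheduler.add_noise(x0, torch.randn_like(x0), t)

    # Unknown region: one ordinary reverse step of the unconditional DDPM,
    # where the model predicts the noise (eps).
    eps = model(x_t, t)
    x_unknown = scheduler.step(eps, t, x_t).prev_sample

    # Composite the two regions.
    return mask * x_known + (1 - mask) * x_unknown
```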
arXiv Detail & Related papers (2022-01-24T18:40:15Z)
- Image Inpainting by End-to-End Cascaded Refinement with Mask Awareness [66.55719330810547]
Inpainting arbitrary missing regions is challenging because learning valid features for various masked regions is nontrivial.
We propose a novel mask-aware inpainting solution that learns multi-scale features for missing regions in the encoding phase.
Our framework is validated both quantitatively and qualitatively via extensive experiments on three public datasets.
arXiv Detail & Related papers (2021-04-28T13:17:47Z)
- Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, in addition to the frequently-used VGG feature matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global content consistency.
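A hedged sketch of a dense multi-dilation block of the kind described: parallel dilated convolutions enlarge the receptive field and their outputs are fused (the dilation rates and fusion here are illustrative, not the paper's exact block):

```python
import torch
import torch.nn as nn

class DenseDilatedBlock(nn.Module):
    """Parallel 3x3 convolutions with growing dilation rates, fused by a
    1x1 convolution and a residual connection."""
    def __init__(self, channels, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r)
            for r in rates
        )
        self.fuse = nn.Conv2d(channels * len(rates), channels, 1)

    def forward(self, x):
        feats = [torch.relu(b(x)) for b in self.branches]
        return self.fuse(torch.cat(feats, dim=1)) + x
```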
arXiv Detail & Related papers (2020-02-07T03:45:25Z)
- Very Long Natural Scenery Image Prediction by Outpainting [96.8509015981031]
Outpainting receives less attention due to two challenges.
The first challenge is keeping spatial and content consistency between generated images and the original input.
The second challenge is maintaining high quality in the generated results.
arXiv Detail & Related papers (2019-12-29T16:29:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.