Mask Consistency Regularization in Object Removal
- URL: http://arxiv.org/abs/2509.10259v1
- Date: Fri, 12 Sep 2025 14:02:52 GMT
- Title: Mask Consistency Regularization in Object Removal
- Authors: Hua Yuan, Jin Yuan, Yicheng Jiang, Yao Zhang, Xin Geng, Yong Rui
- Abstract summary: Mask Consistency Regularization (MCR) is a novel training strategy designed specifically for object removal tasks. MCR significantly reduces hallucinations and mask-shape bias, leading to improved performance in object removal.
- Score: 43.90240963122134
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Object removal, a challenging task within image inpainting, involves seamlessly filling the removed region with content that matches the surrounding context. Despite advancements in diffusion models, current methods still face two critical challenges. The first is mask hallucination, where the model generates irrelevant or spurious content inside the masked region, and the second is mask-shape bias, where the model fills the masked area with an object that mimics the mask's shape rather than surrounding content. To address these issues, we propose Mask Consistency Regularization (MCR), a novel training strategy designed specifically for object removal tasks. During training, our approach introduces two mask perturbations: dilation and reshape, enforcing consistency between the outputs of these perturbed branches and the original mask. The dilated masks help align the model's output with the surrounding content, while reshaped masks encourage the model to break the mask-shape bias. This combination of strategies enables MCR to produce more robust and contextually coherent inpainting results. Our experiments demonstrate that MCR significantly reduces hallucinations and mask-shape bias, leading to improved performance in object removal.
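The two perturbations described in the abstract can be sketched in a few lines. The following is a minimal NumPy illustration, not the authors' implementation: `dilate_mask` grows the mask so the output must agree with surrounding content, `reshape_mask` swaps the mask for a differently shaped region (here a jittered bounding-box rectangle, a hypothetical choice) to break the mask-shape correlation, and `consistency_loss` penalizes disagreement between the perturbed branches and the original-mask branch inside the removed region.

```python
import numpy as np

def dilate_mask(mask: np.ndarray, iters: int = 1) -> np.ndarray:
    """Binary dilation with a 3x3 structuring element, done via padded shifts."""
    m = mask.astype(bool)
    for _ in range(iters):
        p = np.pad(m, 1)
        m = (
            p[:-2, :-2] | p[:-2, 1:-1] | p[:-2, 2:]
            | p[1:-1, :-2] | p[1:-1, 1:-1] | p[1:-1, 2:]
            | p[2:, :-2] | p[2:, 1:-1] | p[2:, 2:]
        )
    return m.astype(mask.dtype)

def reshape_mask(mask: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Replace the mask with a random rectangle that still covers its bounding
    box, decorrelating the mask outline from the removed object's outline."""
    ys, xs = np.nonzero(mask)
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    h, w = mask.shape
    my, mx = rng.integers(0, 5, size=2)  # random margins on each side
    out = np.zeros_like(mask)
    out[max(0, y0 - my):min(h, y1 + my), max(0, x0 - mx):min(w, x1 + mx)] = 1
    return out

def consistency_loss(out_orig, out_dilated, out_reshaped, mask):
    """L2 consistency between perturbed-branch outputs and the original-mask
    output, measured inside the original masked region."""
    m = mask.astype(float)[..., None]
    return float((((out_dilated - out_orig) * m) ** 2).mean()
                 + (((out_reshaped - out_orig) * m) ** 2).mean())
```

In training, this consistency term would be added to the usual inpainting objective; the loss is zero when all three branches agree on the removed region.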
Related papers
- MaskAnyNet: Rethinking Masked Image Regions as Valuable Information in Supervised Learning [7.222969785370652]
MaskAnyNet combines masking with a relearning mechanism to exploit both visible and masked information. Experiments on CNN and Transformer backbones show consistent gains across multiple benchmarks.
arXiv Detail & Related papers (2025-11-16T07:11:33Z)
- SmartEraser: Remove Anything from Images using Masked-Region Guidance [114.36809682798784]
SmartEraser is built with a new removing paradigm called Masked-Region Guidance. Masked-Region Guidance retains the masked region in the input, using it as guidance for the removal process. We present Syn4Removal, a large-scale object removal dataset.
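The input construction implied by Masked-Region Guidance can be contrasted with conventional erase-then-fill inputs. A minimal sketch (illustrative only, with hypothetical function names; the paper's actual model is diffusion-based):

```python
import numpy as np

def guided_input(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Masked-Region Guidance style input: keep the masked pixels visible and
    append the mask as an extra channel, so the model sees what to remove."""
    m = mask.astype(image.dtype)[..., None]      # (H, W, 1)
    return np.concatenate([image, m], axis=-1)   # (H, W, C+1)

def erased_input(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Conventional alternative: zero out the masked pixels before concatenating."""
    m = mask.astype(image.dtype)[..., None]
    return np.concatenate([image * (1 - m), m], axis=-1)
```

Keeping the region visible lets the model condition on the object it is asked to remove rather than on a blank hole.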
arXiv Detail & Related papers (2025-01-14T17:55:12Z)
- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate our strategy's superiority in downstream tasks compared to random masking.
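Generating binary masks by filtering random noise can be illustrated in the spirit of this approach (a sketch with assumed parameters, not ColorMAE's actual filters): low-pass-filter white noise so nearby values correlate, then threshold to keep the top fraction as the mask.

```python
import numpy as np

def noise_filtered_mask(shape, mask_ratio=0.75, kernel=4, seed=0):
    """Binary mask from box-filtered random noise: smoothing clusters the mask
    spatially; thresholding keeps roughly `mask_ratio` of pixels masked."""
    rng = np.random.default_rng(seed)
    noise = rng.random(shape)
    # Simple box filter via shifted sums (wrap-around padding).
    pad = np.pad(noise, kernel, mode="wrap")
    smooth = np.zeros(shape)
    for dy in range(-kernel, kernel + 1):
        for dx in range(-kernel, kernel + 1):
            smooth += pad[kernel + dy : kernel + dy + shape[0],
                          kernel + dx : kernel + dx + shape[1]]
    thresh = np.quantile(smooth, 1 - mask_ratio)
    return (smooth >= thresh).astype(np.uint8)
```

The choice of filter shapes the mask pattern: a wider kernel yields larger, smoother masked blobs, while no filtering reduces to ordinary random masking.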
arXiv Detail & Related papers (2024-07-17T22:04:00Z)
- MP-Former: Mask-Piloted Transformer for Image Segmentation [16.620469868310288]
Mask2Former suffers from inconsistent mask predictions between decoder layers.
We propose a mask-piloted training approach, which feeds noised ground-truth masks in masked-attention and trains the model to reconstruct the original ones.
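The noised-mask training signal can be sketched as follows. This is an illustrative simplification with hypothetical function names (the actual method injects noised masks into masked attention inside the decoder): perturb the ground-truth mask, then train the model to reconstruct the clean one.

```python
import numpy as np

def noise_gt_mask(gt_mask, flip_prob=0.1, seed=0):
    """Perturb a ground-truth binary mask by randomly flipping a fraction of
    pixels, mimicking the noised masks fed to the model during training."""
    rng = np.random.default_rng(seed)
    flips = rng.random(gt_mask.shape) < flip_prob
    return np.where(flips, 1 - gt_mask, gt_mask)

def reconstruction_loss(pred_mask, gt_mask):
    """Binary cross-entropy between the predicted mask (probabilities in [0,1])
    and the clean ground-truth mask."""
    p = np.clip(pred_mask, 1e-6, 1 - 1e-6)
    return float(-(gt_mask * np.log(p) + (1 - gt_mask) * np.log(1 - p)).mean())
```

Training against the clean target while conditioning on noised masks discourages the layer-to-layer prediction drift described above.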
arXiv Detail & Related papers (2023-03-13T17:57:59Z)
- MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency [120.9499803967496]
We propose a novel informative-preserved reconstruction, which explores local statistics to discover and preserve the representative structured points.
Our method can concentrate on modeling regional geometry and enjoy less ambiguity for masked reconstruction.
By combining informative-preserved reconstruction on masked areas and consistency self-distillation from unmasked areas, a unified framework called MM-3DScene is yielded.
arXiv Detail & Related papers (2022-12-20T01:53:40Z)
- Towards Improved Input Masking for Convolutional Neural Networks [66.99060157800403]
We propose a new masking method for CNNs we call layer masking.
We show that our method is able to eliminate or minimize the influence of the mask shape or color on the output of the model.
We also demonstrate how the shape of the mask may leak information about the class, thus affecting estimates of model reliance on class-relevant features.
arXiv Detail & Related papers (2022-11-26T19:31:49Z)
- MixMask: Revisiting Masking Strategy for Siamese ConvNets [23.946791390657875]
This work introduces a novel filling-based masking approach, termed MixMask.
The proposed method replaces erased areas with content from a different image, effectively countering the information depletion seen in traditional masking methods.
We empirically validate our framework's enhanced performance in areas such as linear probing, semi-supervised and supervised finetuning, object detection and segmentation.
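The filling-based replacement itself is a one-line composite. A minimal sketch (illustrative, not the paper's full framework, which also adjusts the Siamese training objective):

```python
import numpy as np

def mixmask(image_a, image_b, mask):
    """Filling-based masking: erased regions of image_a are filled with the
    corresponding pixels of image_b instead of zeros."""
    m = mask.astype(image_a.dtype)[..., None]   # (H, W, 1), broadcasts over channels
    return image_a * (1 - m) + image_b * m
```

Unlike zero-filling, every pixel of the composite carries natural-image statistics, which is the "countering information depletion" point above.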
arXiv Detail & Related papers (2022-10-20T17:54:03Z)
- Masked Face Inpainting Through Residual Attention UNet [0.7868449549351486]
This paper proposes a blind masked-face inpainting method using a residual attention UNet.
A residual block passes information to the next layer and directly to layers about two hops away, mitigating the vanishing-gradient problem.
Experiments on the publicly available CelebA dataset show the feasibility and robustness of our proposed model.
arXiv Detail & Related papers (2022-09-19T08:49:53Z)
- A Unified Framework for Masked and Mask-Free Face Recognition via Feature Rectification [19.417191498842044]
We propose a unified framework, named Face Feature Rectification Network (FFR-Net), for recognizing both masked and mask-free faces alike.
We introduce rectification blocks to rectify features extracted by a state-of-the-art recognition model, in both spatial and channel dimensions.
Experiments show that our framework can learn a rectified feature space for recognizing both masked and mask-free faces effectively.
arXiv Detail & Related papers (2022-02-15T12:37:59Z)
- Image Inpainting by End-to-End Cascaded Refinement with Mask Awareness [66.55719330810547]
Inpainting arbitrary missing regions is challenging because learning valid features for various masked regions is nontrivial.
We propose a novel mask-aware inpainting solution that learns multi-scale features for missing regions in the encoding phase.
Our framework is validated both quantitatively and qualitatively via extensive experiments on three public datasets.
arXiv Detail & Related papers (2021-04-28T13:17:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences arising from its use.