Mask Guided Gated Convolution for Amodal Content Completion
- URL: http://arxiv.org/abs/2407.15203v1
- Date: Sun, 21 Jul 2024 15:51:29 GMT
- Title: Mask Guided Gated Convolution for Amodal Content Completion
- Authors: Kaziwa Saleh, Sándor Szénási, Zoltán Vámossy
- Abstract summary: We present a model to reconstruct partially visible objects.
The model takes a mask as input, which we call a weighted mask.
By attending more to the visible region, our model can predict the invisible patch more effectively than the baseline models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We present a model to reconstruct partially visible objects. The model takes a mask as input, which we call a weighted mask. The mask is utilized by gated convolutions to assign more weight to the visible pixels of the occluded instance than to the background, while ignoring the features of the invisible pixels. By attending more to the visible region, our model can predict the invisible patch more effectively than the baseline models, especially for instances with uniform texture. The model is trained on the COCOA dataset and two of its subsets in a self-supervised manner. The results demonstrate that our model generates higher-quality and more texture-rich outputs than the baseline models. Code is available at: https://github.com/KaziwaSaleh/mask-guided.
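The abstract describes the mechanism only at a high level. Below is a minimal PyTorch sketch of a gated convolution that receives a weighted mask as an extra input so the gate can emphasize visible pixels of the occluded instance; the layer structure, mask weights, and the way the mask enters the gate are illustrative assumptions, not the authors' exact implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

class MaskGuidedGatedConv2d(nn.Module):
    """Gated convolution conditioned on a 'weighted mask' (illustrative sketch).

    The weighted mask is assumed to hold, e.g., a high weight for visible pixels
    of the occluded instance, a lower weight for background, and zero for the
    invisible (occluded) region, so the gate learns to favour visible evidence.
    """

    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        # Feature branch sees the image features only.
        self.feature_conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)
        # Gating branch additionally sees the weighted mask (one extra channel).
        self.gating_conv = nn.Conv2d(in_ch + 1, out_ch, kernel_size, padding=padding)
        self.act = nn.ELU()

    def forward(self, x, weighted_mask):
        # x: (B, C, H, W) features; weighted_mask: (B, 1, H, W) with values in [0, 1].
        # Zero out features in the invisible region so they are ignored.
        x = x * (weighted_mask > 0).float()
        feat = self.act(self.feature_conv(x))
        gate = torch.sigmoid(self.gating_conv(torch.cat([x, weighted_mask], dim=1)))
        return feat * gate


# Toy usage: visible instance pixels weighted 1.0, background 0.5, occluded 0.0.
layer = MaskGuidedGatedConv2d(in_ch=3, out_ch=16)
image = torch.randn(1, 3, 64, 64)
mask = torch.full((1, 1, 64, 64), 0.5)
mask[:, :, 16:48, 16:48] = 1.0   # visible region of the instance
mask[:, :, 24:40, 24:40] = 0.0   # occluded (invisible) patch to complete
out = layer(image, mask)          # (1, 16, 64, 64)
```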
Related papers
- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate our strategy's superiority in downstream tasks compared to random masking.
arXiv Detail & Related papers (2024-07-17T22:04:00Z)
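The ColorMAE entry above describes generating binary mask patterns by filtering random noise rather than learning them from data. A minimal sketch of that idea, assuming a Gaussian (low-pass) filter and a fixed masking ratio; the paper's actual filters and selection rule may differ:

```python
import torch
import torch.nn.functional as F

def filtered_noise_mask(grid=14, ratio=0.75, kernel_size=5, sigma=1.5):
    """Binary patch mask from low-pass-filtered random noise (illustrative).

    Returns a (grid, grid) bool tensor where True marks masked patches.
    """
    noise = torch.rand(1, 1, grid, grid)
    # Build a Gaussian kernel and low-pass filter the noise, so masked patches
    # form smoother, spatially correlated blobs instead of i.i.d. speckle.
    ax = torch.arange(kernel_size, dtype=torch.float32) - (kernel_size - 1) / 2
    g = torch.exp(-ax**2 / (2 * sigma**2))
    kernel = (g[:, None] * g[None, :])
    kernel = (kernel / kernel.sum()).view(1, 1, kernel_size, kernel_size)
    smooth = F.conv2d(noise, kernel, padding=kernel_size // 2).flatten()
    # Mask the patches with the largest filtered-noise values.
    k = int(ratio * grid * grid)
    mask = torch.zeros(grid * grid, dtype=torch.bool)
    mask[smooth.topk(k).indices] = True
    return mask.view(grid, grid)

mask = filtered_noise_mask()        # e.g. for 14x14 = 196 ViT patches
print(mask.float().mean())          # ~0.75 masking ratio
```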
- Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data.
We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process.
In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks.
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
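The AutoMAE entry above mentions a Gumbel-Softmax link between a learned mask generator and the masked image modeling objective. Below is a small, hedged sketch of how per-patch masking logits could be sampled with Gumbel noise and a straight-through estimator so gradients reach the generator; AutoMAE's actual generator, adversarial training, and relaxation details are not reproduced here:

```python
import torch

def gumbel_topk_mask(logits, mask_ratio=0.75, tau=1.0):
    """Differentiable hard selection of patches to mask (straight-through sketch).

    logits: (B, N) per-patch masking scores from a mask generator.
    Returns a (B, N) mask of {0, 1} values whose gradient follows the soft scores.
    """
    B, N = logits.shape
    k = int(mask_ratio * N)
    # Perturb logits with Gumbel noise, then soften with a temperature.
    gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-9) + 1e-9)
    soft = torch.softmax((logits + gumbel) / tau, dim=-1)
    # Hard top-k selection of patches to mask.
    hard = torch.zeros_like(soft)
    hard.scatter_(1, soft.topk(k, dim=-1).indices, 1.0)
    # Straight-through: forward pass uses `hard`, backward pass uses `soft`.
    return hard + soft - soft.detach()

logits = torch.randn(2, 196, requires_grad=True)  # e.g. 14x14 patches
mask = gumbel_topk_mask(logits)
mask.sum().backward()                              # gradients flow to the generator
print(mask.shape, mask[0].sum())                   # torch.Size([2, 196]), ~147 masked
```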
- Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling [23.164631160130092]
We extend the success of BERT-style pre-training, i.e., masked image modeling, to convolutional networks (convnets).
We treat unmasked pixels as sparse voxels of 3D point clouds and use sparse convolution to encode them.
This is the first use of sparse convolution for 2D masked modeling.
arXiv Detail & Related papers (2023-01-09T18:59:50Z)
- Towards Improved Input Masking for Convolutional Neural Networks [66.99060157800403]
We propose a new masking method for CNNs, which we call layer masking.
We show that our method is able to eliminate or minimize the influence of the mask shape or color on the output of the model.
We also demonstrate how the shape of the mask may leak information about the class, thus affecting estimates of model reliance on class-relevant features.
arXiv Detail & Related papers (2022-11-26T19:31:49Z)
- Stare at What You See: Masked Image Modeling without Reconstruction [154.74533119863864]
Masked Autoencoders (MAE) have become a prevailing paradigm for large-scale vision representation pre-training.
Recent approaches apply semantic-rich teacher models to extract image features as the reconstruction target, leading to better performance.
We argue the features extracted by powerful teacher models already encode rich semantic correlation across regions in an intact image.
arXiv Detail & Related papers (2022-11-16T12:48:52Z)
- AISFormer: Amodal Instance Segmentation with Transformer [9.042737643989561]
Amodal Instance Segmentation (AIS) aims to segment both the visible and possibly occluded parts of an object instance.
We present AISFormer, an AIS framework with a Transformer-based mask head.
arXiv Detail & Related papers (2022-10-12T15:42:40Z)
- What You See is What You Classify: Black Box Attributions [61.998683569022006]
We train a deep network, the Explainer, to predict attributions for a pre-trained black-box classifier, the Explanandum.
Unlike most existing approaches, ours is capable of directly generating very distinct class-specific masks.
We show that our attributions are superior to established methods both visually and quantitatively.
arXiv Detail & Related papers (2022-05-23T12:30:04Z)
- RePaint: Inpainting using Denoising Diffusion Probabilistic Models [161.74792336127345]
Free-form inpainting is the task of adding new content to an image in the regions specified by an arbitrary binary mask.
We propose RePaint: a Denoising Diffusion Probabilistic Model (DDPM) based inpainting approach that is applicable even to extreme masks.
We validate our method for both faces and general-purpose image inpainting using standard and extreme masks.
arXiv Detail & Related papers (2022-01-24T18:40:15Z)
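The RePaint entry above describes conditioning a pretrained DDPM on an arbitrary binary mask at sampling time. A minimal sketch of the key step, in which the known region is re-noised from the original image and only the masked region comes from the model's reverse step; RePaint's resampling schedule is omitted, and `reverse_step` is a placeholder for a pretrained DDPM sampler:

```python
import torch

def repaint_step(x_t, x0_known, mask, t, alphas_cumprod, reverse_step):
    """One mask-conditioned reverse-diffusion step (RePaint-style sketch).

    x_t:          current noisy sample, (B, C, H, W)
    x0_known:     original image (valid only where mask == 1)
    mask:         1 for known pixels, 0 for the region to inpaint
    reverse_step: callable (x_t, t) -> x_{t-1}, one step of a pretrained DDPM sampler
    """
    a_prev = alphas_cumprod[t - 1]
    # Known region: forward-diffuse the ground truth to the matching noise level.
    noise = torch.randn_like(x0_known)
    x_known = a_prev.sqrt() * x0_known + (1 - a_prev).sqrt() * noise
    # Unknown region: let the generative model denoise one step.
    x_unknown = reverse_step(x_t, t)
    # Stitch the two regions together with the mask.
    return mask * x_known + (1 - mask) * x_unknown

# Toy usage with a dummy sampler that merely scales its input.
T = 1000
alphas_cumprod = torch.linspace(0.999, 0.001, T)
dummy_reverse = lambda x, t: 0.99 * x
x_t = torch.randn(1, 3, 64, 64)
x0 = torch.randn(1, 3, 64, 64)
mask = torch.ones(1, 1, 64, 64)
mask[:, :, 20:44, 20:44] = 0            # hole to inpaint
x_prev = repaint_step(x_t, x0, mask, t=500,
                      alphas_cumprod=alphas_cumprod, reverse_step=dummy_reverse)
```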