SaiNet: Stereo aware inpainting behind objects with generative networks
- URL: http://arxiv.org/abs/2205.07014v1
- Date: Sat, 14 May 2022 09:07:30 GMT
- Title: SaiNet: Stereo aware inpainting behind objects with generative networks
- Authors: Violeta Menéndez González, Andrew Gilbert, Graeme Phillipson, Stephen Jolly, Simon Hadfield
- Abstract summary: We present an end-to-end network for stereo-consistent image inpainting with the objective of inpainting large missing regions behind objects.
The proposed model consists of an edge-guided UNet-like network using Partial Convolutions.
- Score: 21.35917056958527
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we present an end-to-end network for stereo-consistent image
inpainting with the objective of inpainting large missing regions behind
objects. The proposed model consists of an edge-guided UNet-like network using
Partial Convolutions. We enforce multi-view stereo consistency by introducing a
disparity loss. More importantly, we develop a training scheme where the model
is learned from realistic stereo masks representing object occlusions, instead
of the more common random masks. The technique is trained in a supervised way.
Our evaluation shows competitive results compared to previous state-of-the-art
techniques.
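As an illustration of the Partial Convolutions mentioned in the abstract, here is a minimal NumPy sketch of a single partial-convolution step. It assumes the commonly cited formulation (convolve only over valid pixels, rescale by the fraction of valid pixels in the window, then mark the output pixel as valid if any input pixel was). The function name `partial_conv2d`, the single-channel setup, and the absence of bias and padding are simplifications chosen for this example; they are not taken from the paper, whose exact layer configuration is not described on this page.

```python
import numpy as np


def partial_conv2d(image, mask, kernel):
    """Single-channel partial convolution (stride 1, no padding, no bias).

    image:  (H, W) float array.
    mask:   (H, W) array, 1 = valid pixel, 0 = hole.
    kernel: (kh, kw) float array, applied as cross-correlation.

    Returns (output, updated_mask). This is an illustrative sketch,
    not the SaiNet implementation.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    updated_mask = np.zeros((out_h, out_w))
    window_size = kh * kw

    for i in range(out_h):
        for j in range(out_w):
            m = mask[i:i + kh, j:j + kw]
            valid = m.sum()
            if valid > 0:
                x = image[i:i + kh, j:j + kw]
                # Use only valid pixels, then rescale to compensate
                # for the pixels that were masked out.
                output[i, j] = (kernel * x * m).sum() * (window_size / valid)
                # The output pixel becomes valid for the next layer.
                updated_mask[i, j] = 1.0
            # Windows with no valid pixel keep output 0 and mask 0.
    return output, updated_mask


if __name__ == "__main__":
    image = np.ones((5, 5))
    mask = np.ones((5, 5))
    mask[1:4, 1:4] = 0.0          # 3x3 hole in the centre
    kernel = np.full((3, 3), 1.0 / 9.0)
    out, new_mask = partial_conv2d(image, mask, kernel)
    print(out)
    print(new_mask)
```

Stacking such layers shrinks the hole step by step, because each layer marks as valid every output pixel whose window touched at least one valid input pixel.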
Related papers
- MaDis-Stereo: Enhanced Stereo Matching via Distilled Masked Image Modeling [18.02254687807291]
Although Transformer-based stereo models have been studied recently, their performance still lags behind CNN-based stereo models due to the inherent data scarcity issue in the stereo matching task.
We propose a Masked Image Modeling Distilled Stereo matching model, termed MaDis-Stereo, that enhances locality inductive bias by leveraging Masked Image Modeling (MIM) in training Transformer-based stereo models.
arXiv Detail & Related papers (2024-09-04T16:17:45Z)
- LAC-Net: Linear-Fusion Attention-Guided Convolutional Network for Accurate Robotic Grasping Under the Occlusion [79.22197702626542]
This paper introduces a framework that explores amodal segmentation for robotic grasping in cluttered scenes.
We propose a Linear-fusion Attention-guided Convolutional Network (LAC-Net).
The results on different datasets show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-08-06T14:50:48Z)
- Asymmetric Mask Scheme for Self-Supervised Real Image Denoising [14.18283674891189]
We propose a single mask scheme for self-supervised denoising training, which eliminates the need for blind spot operation.
Our method, featuring the asymmetric mask scheme in training and inference, achieves state-of-the-art performance on existing real noisy image datasets.
arXiv Detail & Related papers (2024-07-09T03:01:28Z)
- SyntStereo2Real: Edge-Aware GAN for Remote Sensing Image-to-Image Translation while Maintaining Stereo Constraint [1.8749305679160366]
Current methods involve combining two networks, an unpaired image-to-image translation network and a stereo-matching network.
We propose an edge-aware GAN-based network that effectively tackles both tasks simultaneously.
We demonstrate that our model produces results that are qualitatively and quantitatively superior to those of existing models, and that its applicability extends to diverse domains.
arXiv Detail & Related papers (2024-04-14T14:58:52Z)
- Masking Improves Contrastive Self-Supervised Learning for ConvNets, and Saliency Tells You Where [63.61248884015162]
We aim to alleviate the burden of incorporating the masking operation into the contrastive-learning framework for convolutional neural networks.
We propose to explicitly take a saliency constraint into account, so that the masked regions are more evenly distributed between the foreground and background.
arXiv Detail & Related papers (2023-09-22T09:58:38Z)
- Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data.
We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process.
In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks.
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
- Learning Prior Feature and Attention Enhanced Image Inpainting [63.21231753407192]
This paper incorporates the pre-training based Masked AutoEncoder (MAE) into the inpainting model.
We propose to use attention priors from MAE to make the inpainting model learn more long-distance dependencies between masked and unmasked regions.
arXiv Detail & Related papers (2022-08-03T04:32:53Z)
- Contextual Attention Mechanism, SRGAN Based Inpainting System for Eliminating Interruptions from Images [2.894944733573589]
We propose an end-to-end pipeline for inpainting images using a complete Machine Learning approach.
We first use the YOLO model to automatically identify and localize the object we wish to remove from the image.
After this, we provide the masked image and original image to the GAN model which uses the Contextual Attention method to fill in the region.
arXiv Detail & Related papers (2022-04-06T05:51:04Z)
- Revisiting Domain Generalized Stereo Matching Networks from a Feature Consistency Perspective [65.37571681370096]
We propose a simple pixel-wise contrastive learning scheme across viewpoints.
A stereo selective whitening loss is introduced to better preserve the stereo feature consistency across domains.
Our method achieves superior performance over several state-of-the-art networks.
arXiv Detail & Related papers (2022-03-21T11:21:41Z)
- Reversing the cycle: self-supervised deep stereo through enhanced monocular distillation [51.714092199995044]
In many fields, self-supervised learning solutions are rapidly evolving and filling the gap with supervised approaches.
We propose a novel self-supervised paradigm reversing the link between the two.
In order to train deep stereo networks, we distill knowledge through a monocular completion network.
arXiv Detail & Related papers (2020-08-17T07:40:22Z)
- R-MNet: A Perceptual Adversarial Network for Image Inpainting [5.471225956329675]
We propose a Wasserstein GAN combined with a new reverse mask operator, namely Reverse Masking Network (R-MNet), a perceptual adversarial network for image inpainting.
We show that our method generalizes to high-resolution inpainting tasks and produces more realistic outputs that are plausible to the human visual system.
arXiv Detail & Related papers (2020-08-11T10:58:10Z)