A Lightweight Neural Network for Monocular View Generation with Occlusion Handling
- URL: http://arxiv.org/abs/2007.12577v1
- Date: Fri, 24 Jul 2020 15:29:01 GMT
- Title: A Lightweight Neural Network for Monocular View Generation with Occlusion Handling
- Authors: Simon Evain and Christine Guillemot
- Abstract summary: We present a very lightweight neural network architecture, trained on stereo data pairs, which performs view synthesis from one single image.
The method outperforms state-of-the-art approaches both visually and in terms of metrics on the challenging KITTI dataset.
- Score: 46.74874316127603
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this article, we present a very lightweight neural network architecture,
trained on stereo data pairs, which performs view synthesis from one single
image. With the growing success of multi-view formats, this problem is indeed
increasingly relevant. The network returns a prediction built from disparity
estimation, which fills in wrongly predicted regions using an occlusion handling
technique. To do so, during training, the network learns to estimate the
left-right consistency structural constraint on the pair of stereo input
images, to be able to replicate it at test time from one single image. The
method is built upon the idea of blending two predictions: a prediction based
on disparity estimation, and a prediction based on direct minimization in
occluded regions. The network is also able to identify these occluded areas at
training and at test time by checking the pixelwise left-right consistency of
the produced disparity maps. At test time, the approach can thus generate a
left-side and a right-side view from one input image, as well as a depth map
and a pixelwise confidence measure in the prediction. The work outperforms
state-of-the-art approaches both visually and in terms of metrics on the
challenging KITTI dataset, while reducing the required number of parameters by
a factor of 5 to 10, down to 6.5 M.
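To make the mechanism described above more concrete, the following is a minimal NumPy sketch of the general idea: warp a view with a predicted disparity map, flag occluded pixels through a pixelwise left-right consistency check, and fall back on a second prediction in those pixels. This is not the paper's implementation; the function names, the nearest-neighbour warping, and the 1-pixel threshold are illustrative assumptions.
```python
# Illustrative sketch of disparity warping, left-right consistency checking,
# and occlusion-aware blending. Not the paper's code; names are hypothetical.
import numpy as np

def warp_with_disparity(image, disparity):
    """Horizontally warp `image` (H, W, C) by a per-pixel disparity map (H, W)."""
    h, w = disparity.shape
    cols = np.tile(np.arange(w), (h, 1))                       # target x-coordinates
    src_x = np.clip(np.rint(cols - disparity).astype(int), 0, w - 1)
    rows = np.tile(np.arange(h)[:, None], (1, w))
    return image[rows, src_x]                                   # gather source pixels

def lr_consistency_mask(disp_left, disp_right, threshold=1.0):
    """Flag pixels whose left and right disparity maps disagree as occluded."""
    # Sample the right-view disparity at the location each left-view pixel points to.
    disp_right_warped = warp_with_disparity(disp_right[..., None], disp_left)[..., 0]
    return np.abs(disp_left - disp_right_warped) > threshold    # True = occluded

def blend_predictions(disparity_based, fallback, occluded):
    """Keep the disparity-based view where it is consistent, the fallback elsewhere."""
    return np.where(occluded[..., None], fallback, disparity_based)
```
In the paper, the disparity maps and both predictions are produced by the trained network from a single input image at test time; the sketch only illustrates how the consistency check and the blending fit together.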
Related papers
- Left-right Discrepancy for Adversarial Attack on Stereo Networks [8.420135490466851]
We introduce a novel adversarial attack approach that generates perturbation noise specifically designed to maximize the discrepancy between left and right image features.
Experiments demonstrate the superior capability of our method to induce larger prediction errors in stereo neural networks.
arXiv Detail & Related papers (2024-01-14T02:30:38Z)
- Uncertainty Quantification via Neural Posterior Principal Components [26.26693707762823]
Uncertainty quantification is crucial for the deployment of image restoration models in safety-critical domains.
We present a method for predicting the PCs of the posterior distribution for any input image, in a single forward pass of a neural network.
Our method reliably conveys instance-adaptive uncertainty directions, achieving uncertainty quantification comparable with posterior samplers.
arXiv Detail & Related papers (2023-09-27T09:51:29Z)
- APRF: Anti-Aliasing Projection Representation Field for Inverse Problem in Imaging [74.9262846410559]
Sparse-view Computed Tomography (SVCT) reconstruction is an ill-posed inverse problem in imaging.
Recent works use Implicit Neural Representations (INRs) to build the coordinate-based mapping between sinograms and CT images.
We propose a self-supervised SVCT reconstruction method, the Anti-Aliasing Projection Representation Field (APRF).
APRF builds a continuous representation between adjacent projection views via spatial constraints.
arXiv Detail & Related papers (2023-07-11T14:04:12Z)
- Unsupervised Light Field Depth Estimation via Multi-view Feature Matching with Occlusion Prediction [15.421219881815956]
It is costly to obtain sufficient depth labels for supervised training.
In this paper, we propose an unsupervised framework to estimate depth from light field (LF) images.
arXiv Detail & Related papers (2023-01-20T06:11:17Z)
- Decoupled Mixup for Generalized Visual Recognition [71.13734761715472]
We propose a novel "Decoupled-Mixup" method to train CNN models for visual recognition.
Our method decouples each image into discriminative and noise-prone regions, and then heterogeneously combines these regions to train CNN models.
Experimental results show the high generalization performance of our method on test data composed of unseen contexts.
arXiv Detail & Related papers (2022-10-26T15:21:39Z)
- A Novel Hand Gesture Detection and Recognition system based on ensemble-based Convolutional Neural Network [3.5665681694253903]
Detection of the hand region has become a challenging task in the computer vision and pattern recognition communities.
Deep learning architectures such as convolutional neural networks (CNNs) have become a very popular choice for classification tasks.
In this paper, an ensemble of CNN-based approaches is presented to overcome problems such as high variance during prediction, overfitting, and prediction errors.
arXiv Detail & Related papers (2022-02-25T06:46:58Z)
- Deep Learning based Novel View Synthesis [18.363945964373553]
We propose a deep convolutional neural network (CNN) which learns to predict novel views of a scene from a given collection of images.
In comparison to prior deep-learning-based approaches, which can handle only a fixed number of input images to predict a novel view, the proposed approach works with varying numbers of input images.
arXiv Detail & Related papers (2021-07-14T16:15:36Z)
- CAMERAS: Enhanced Resolution And Sanity preserving Class Activation Mapping for image saliency [61.40511574314069]
Backpropagation image saliency aims at explaining model predictions by estimating model-centric importance of individual pixels in the input.
We propose CAMERAS, a technique to compute high-fidelity backpropagation saliency maps without requiring any external priors.
arXiv Detail & Related papers (2021-06-20T08:20:56Z)
- Compressive sensing with un-trained neural networks: Gradient descent finds the smoothest approximation [60.80172153614544]
Un-trained convolutional neural networks have emerged as highly successful tools for image recovery and restoration.
We show that an un-trained convolutional neural network can approximately reconstruct signals and images that are sufficiently structured, from a near minimal number of random measurements.
arXiv Detail & Related papers (2020-05-07T15:57:25Z)
- Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation [51.17232267143098]
We propose a novel system named Disp R-CNN for 3D object detection from stereo images.
We use a statistical shape model to generate dense disparity pseudo-ground-truth without the need for LiDAR point clouds.
Experiments on the KITTI dataset show that, even when LiDAR ground-truth is not available at training time, Disp R-CNN achieves competitive performance and outperforms previous state-of-the-art methods by 20% in terms of average precision.
arXiv Detail & Related papers (2020-04-07T17:48:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.