Deep Learning based Novel View Synthesis
- URL: http://arxiv.org/abs/2107.06812v1
- Date: Wed, 14 Jul 2021 16:15:36 GMT
- Title: Deep Learning based Novel View Synthesis
- Authors: Amit More and Subhasis Chaudhuri
- Abstract summary: We propose a deep convolutional neural network (CNN) which learns to predict novel views of a scene from a given collection of images.
In contrast to prior deep learning based approaches, which can handle only a fixed number of input images when predicting a novel view, the proposed approach works with a varying number of input images.
- Score: 18.363945964373553
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Predicting novel views of a scene from real-world images has always been a
challenging task. In this work, we propose a deep convolutional neural network
(CNN) which learns to predict novel views of a scene from a given collection of
images. In contrast to prior deep learning based approaches, which can handle
only a fixed number of input images when predicting a novel view, the proposed
approach works with a varying number of input images. The proposed model
explicitly performs feature extraction and matching on a given pair of input
images and estimates, at each pixel, a probability distribution (pdf) over the
possible depth levels in the scene. This pdf is then used to estimate the novel
view. From the given image collection, the model produces multiple predictions
of the novel view, one estimate per input image pair. The model also estimates
an occlusion mask and combines the multiple novel-view estimates into a single
optimal prediction. The finite number of depth levels used in the analysis may
cause occasional blurriness in the estimated view; we mitigate this issue with
a simple multi-resolution analysis which improves the quality of the estimates.
We evaluate the method on different datasets and show competitive performance.
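To make the pipeline concrete, the sketch below shows the two operations the abstract describes: blending depth-plane warps of a source image under the per-pixel depth pdf, and fusing the per-pair novel-view estimates with weights derived from an occlusion mask. This is a minimal PyTorch illustration under assumed tensor shapes; the function names, the softmax weighting, and the precomputed `warps` volume are assumptions, not the authors' implementation.

```python
import torch

def expected_novel_view(depth_logits, warps):
    """Blend per-depth warps of a source image under a per-pixel depth pdf.

    depth_logits: (B, D, H, W) per-pixel scores over D depth levels.
    warps:        (B, D, 3, H, W) source image warped into the target view
                  for each depth hypothesis (e.g., plane-sweep homographies).
    Returns the probability-weighted novel-view estimate, (B, 3, H, W).
    """
    pdf = torch.softmax(depth_logits, dim=1)       # per-pixel pdf over depth
    return (pdf.unsqueeze(2) * warps).sum(dim=1)   # expectation over depth levels

def fuse_pair_estimates(views, occ_logits):
    """Combine one novel-view estimate per input pair into a single prediction.

    views:      (B, P, 3, H, W) per-pair novel-view estimates.
    occ_logits: (B, P, H, W) per-pair occlusion/confidence scores; a softmax
                over pairs lets occluded pixels defer to the other estimates.
    """
    w = torch.softmax(occ_logits, dim=1).unsqueeze(2)  # (B, P, 1, H, W) weights
    return (w * views).sum(dim=1)                      # fused view, (B, 3, H, W)

# Usage with random tensors, just to confirm the shapes line up.
B, D, P, H, W = 1, 32, 3, 64, 64
view = expected_novel_view(torch.randn(B, D, H, W), torch.randn(B, D, 3, H, W))
fused = fuse_pair_estimates(view.unsqueeze(1).repeat(1, P, 1, 1, 1),
                            torch.randn(B, P, H, W))
print(fused.shape)  # torch.Size([1, 3, 64, 64])
```

The multi-resolution refinement mentioned above would, under the same reading, repeat this synthesis at a coarse scale and place finer depth levels near the coarse estimate; that step is omitted here.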
Related papers
- Breaking the Frame: Visual Place Recognition by Overlap Prediction [53.17564423756082]
We propose a novel visual place recognition approach based on overlap prediction, called VOP.
VOP processes co-visible image sections by obtaining patch-level embeddings using a Vision Transformer backbone.
Our approach uses a voting mechanism to assess overlap scores for potential database images.
arXiv Detail & Related papers (2024-06-23T20:00:20Z)
- VaLID: Variable-Length Input Diffusion for Novel View Synthesis [36.57742242154048]
Novel View Synthesis (NVS), which tries to produce a realistic image at the target view given source view images and their corresponding poses, is a fundamental problem in 3D Vision.
We process each pose-image pair separately and then fuse them into a unified visual representation, which is injected into the model.
The Multi-view Cross Former module is proposed, which maps variable-length input data to fixed-size output data.
arXiv Detail & Related papers (2023-12-14T12:52:53Z)
- Learning Robust Multi-Scale Representation for Neural Radiance Fields from Unposed Images [65.41966114373373]
We present an improved solution to the neural image-based rendering problem in computer vision.
The proposed approach can synthesize a realistic image of the scene from a novel viewpoint at test time.
arXiv Detail & Related papers (2023-11-08T08:18:23Z)
- Break-A-Scene: Extracting Multiple Concepts from a Single Image [80.47666266017207]
We introduce the task of textual scene decomposition.
We propose augmenting the input image with masks that indicate the presence of target concepts.
We then present a novel two-phase customization process.
arXiv Detail & Related papers (2023-05-25T17:59:04Z)
- im2nerf: Image to Neural Radiance Field in the Wild [47.18702901448768]
im2nerf is a learning framework that predicts a continuous neural object representation given a single input image in the wild.
We show that im2nerf achieves state-of-the-art performance for novel view synthesis from a single unposed image in the wild.
arXiv Detail & Related papers (2022-09-08T23:28:56Z)
- Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry [25.003116148843525]
We propose MaGNet, a framework for fusing single-view depth probability with multi-view geometry.
MaGNet achieves state-of-the-art performance on ScanNet, 7-Scenes and KITTI.
arXiv Detail & Related papers (2021-12-15T14:56:53Z)
- Self-Supervised Visibility Learning for Novel View Synthesis [79.53158728483375]
Conventional rendering methods estimate scene geometry and synthesize novel views in two separate steps.
We propose an end-to-end NVS framework to eliminate the error propagation issue.
Our network is trained in an end-to-end self-supervised fashion, thus significantly alleviating error accumulation in view synthesis.
arXiv Detail & Related papers (2021-03-29T08:11:25Z)
- A Lightweight Neural Network for Monocular View Generation with Occlusion Handling [46.74874316127603]
We present a very lightweight neural network architecture, trained on stereo data pairs, which performs view synthesis from a single image.
The method visually and quantitatively outperforms state-of-the-art approaches on the challenging KITTI dataset.
arXiv Detail & Related papers (2020-07-24T15:29:01Z)
- Sequential View Synthesis with Transformer [13.200139959163574]
We introduce a sequential rendering decoder to predict an image sequence, including the target view, based on the learned representations.
We evaluate our model on various challenging datasets and demonstrate that it not only gives consistent predictions but also does not require any retraining for fine-tuning.
arXiv Detail & Related papers (2020-04-09T14:15:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.