EpipolarNVS: leveraging on Epipolar geometry for single-image Novel View Synthesis
- URL: http://arxiv.org/abs/2210.13077v1
- Date: Mon, 24 Oct 2022 09:54:20 GMT
- Title: EpipolarNVS: leveraging on Epipolar geometry for single-image Novel View Synthesis
- Authors: Gaétan Landreau and Mohamed Tamaazousti
- Abstract summary: Novel-view synthesis (NVS) can be tackled through different approaches, depending on the general setting.
The most challenging scenario, the one we address in this work, considers only a single source image from which to generate a novel one from another viewpoint.
We introduce an innovative method that encodes the viewpoint transformation as a 2D feature image.
- Score: 6.103988053817792
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Novel-view synthesis (NVS) can be tackled through different approaches,
depending on the general setting: from a single source image to a short video
sequence, with exact or noisy camera pose information, 3D-based information such
as point clouds, etc. The most challenging scenario, the one we address in this
work, considers only a single source image from which to generate a novel one
from another viewpoint. However, in such a tricky situation, the latest learning-based
solutions often struggle to integrate the camera viewpoint transformation.
Indeed, the extrinsic information is often passed as-is, through a
low-dimensional vector. It might even occur that such a camera pose, when
parametrized as Euler angles, is quantized through a one-hot representation.
This vanilla encoding choice prevents the learnt architecture from inferring
novel views on a continuous basis (from a camera pose perspective). We claim that
there exists an elegant way to better encode the relative camera pose, by leveraging
3D-related concepts such as the epipolar constraint. We, therefore, introduce
an innovative method that encodes the viewpoint transformation as a 2D feature
image. Such a camera encoding strategy gives meaningful insights to the network
regarding how the camera has moved in space between the two views. By encoding
the camera pose information as a finite number of coloured epipolar lines, we
demonstrate through our experiments that our strategy outperforms vanilla
encoding.
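The abstract describes the pose encoding only at a high level. Below is a minimal, illustrative sketch of how a coloured-epipolar-line encoding of a relative camera pose could be built: it assumes shared pinhole intrinsics K, a source-to-target relative pose (R, t), sample points placed along the source-image diagonal, and OpenCV for line rasterisation. These choices, and all function names, are assumptions made for illustration, not the authors' implementation.

import numpy as np
import cv2

def fundamental_from_pose(K, R, t):
    # F maps a homogeneous source pixel x to its epipolar line l' = F @ x in the target view.
    tx = np.array([[0.0, -t[2], t[1]],
                   [t[2], 0.0, -t[0]],
                   [-t[1], t[0], 0.0]])        # skew-symmetric cross-product matrix [t]_x
    E = tx @ R                                 # essential matrix for the relative pose (R, t)
    K_inv = np.linalg.inv(K)
    return K_inv.T @ E @ K_inv                 # fundamental matrix (shared intrinsics assumed)

def encode_pose_as_epipolar_image(K, R, t, height, width, n_lines=16):
    # Rasterise a finite number of coloured epipolar lines into an H x W x 3 feature image.
    F = fundamental_from_pose(K, R, t)
    encoding = np.zeros((height, width, 3), dtype=np.uint8)
    us = np.linspace(0, width - 1, n_lines)    # sample points along the source-image diagonal
    vs = np.linspace(0, height - 1, n_lines)   # (an arbitrary, illustrative sampling choice)
    rng = np.random.default_rng(0)             # fixed palette: one colour per sampled point
    for u, v in zip(us, vs):
        a, b, c = F @ np.array([u, v, 1.0])    # epipolar line a*x + b*y + c = 0 in the target view
        colour = tuple(int(x) for x in rng.integers(64, 255, size=3))
        if abs(b) > 1e-8:                      # intersect the line with the left/right image borders
            p0 = (0, int(round(-c / b)))
            p1 = (width - 1, int(round(-(c + a * (width - 1)) / b)))
        else:                                  # (near-)vertical, non-degenerate line
            p0 = (int(round(-c / a)), 0)
            p1 = (int(round(-c / a)), height - 1)
        cv2.line(encoding, p0, p1, colour, thickness=1)
    return encoding  # e.g. concatenated channel-wise with the source image before feeding the network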
Related papers
- Generating 3D-Consistent Videos from Unposed Internet Photos [68.944029293283]
We train a scalable, 3D-aware video model without any 3D annotations such as camera parameters.
Our results suggest that we can scale up scene-level 3D learning using only 2D data such as videos and multiview internet photos.
arXiv Detail & Related papers (2024-11-20T18:58:31Z)
- DUSt3R: Geometric 3D Vision Made Easy [8.471330244002564]
We introduce DUSt3R, a novel paradigm for Dense and Unconstrained Stereo 3D Reconstruction of arbitrary image collections.
We show that this formulation smoothly unifies the monocular and binocular reconstruction cases.
Our formulation directly provides a 3D model of the scene as well as depth information; interestingly, we can also seamlessly recover pixel matches and relative and absolute camera poses from it.
arXiv Detail & Related papers (2023-12-21T18:52:14Z)
- Free3D: Consistent Novel View Synthesis without 3D Representation [63.931920010054064]
Free3D is a simple, accurate method for monocular open-set novel view synthesis (NVS).
Compared to other works that took a similar approach, we obtain significant improvements without resorting to an explicit 3D representation.
arXiv Detail & Related papers (2023-12-07T18:59:18Z)
- RUST: Latent Neural Scene Representations from Unposed Imagery [21.433079925439234]
Inferring the structure of 3D scenes from 2D observations is a fundamental challenge in computer vision.
Recently popularized approaches based on neural scene representations have achieved tremendous impact.
RUST (Really Unposed Scene representation Transformer) is a pose-free approach to novel view synthesis trained on RGB images alone.
arXiv Detail & Related papers (2022-11-25T18:59:10Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- Virtual Correspondence: Humans as a Cue for Extreme-View Geometry [104.09449367670318]
We present a novel concept called virtual correspondences (VCs).
VCs conform with epipolar geometry; unlike classic correspondences, VCs do not need to be co-visible across views.
We show how VCs can be seamlessly integrated with classic bundle adjustment to recover camera poses across extreme views.
arXiv Detail & Related papers (2022-06-16T17:59:42Z)
- ViewFormer: NeRF-free Neural Rendering from Few Images Using Transformers [34.4824364161812]
Novel view synthesis is a problem where we are given only a few context views sparsely covering a scene or an object.
The goal is to predict novel viewpoints in the scene, which requires learning priors.
We propose a 2D-only method that maps multiple context views and a query pose to a new image in a single pass of a neural network.
arXiv Detail & Related papers (2022-03-18T21:08:23Z)
- Learning Neural Representation of Camera Pose with Matrix Representation of Pose Shift via View Synthesis [105.37072293076767]
How to effectively represent camera pose is an essential problem in 3D computer vision.
We propose an approach to learn neural representations of camera poses and 3D scenes.
We conduct extensive experiments on synthetic and real datasets.
arXiv Detail & Related papers (2021-04-04T00:40:53Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.