EpipolarNVS: leveraging on Epipolar geometry for single-image Novel View Synthesis
- URL: http://arxiv.org/abs/2210.13077v1
- Date: Mon, 24 Oct 2022 09:54:20 GMT
- Title: EpipolarNVS: leveraging on Epipolar geometry for single-image Novel View Synthesis
- Authors: Gaétan Landreau and Mohamed Tamaazousti
- Abstract summary: Novel-view synthesis (NVS) can be tackled through different approaches, depending on the general setting.
The most challenging scenario, the one addressed in this work, considers only a single source image from which a novel view is generated from another viewpoint.
We introduce an innovative method that encodes the viewpoint transformation as a 2D feature image.
- Score: 6.103988053817792
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Novel-view synthesis (NVS) can be tackled through different approaches,
depending on the general setting: from a single source image to a short video
sequence, with exact or noisy camera pose information, or with 3D-based
information such as point clouds. The most challenging scenario, the one we
address in this work, considers only a single source image from which a novel
view must be generated from another viewpoint. However, in such a difficult
setting, the latest learning-based
solutions often struggle to integrate the camera viewpoint transformation.
Indeed, the extrinsic information is often passed as-is, through a
low-dimensional vector. It might even occur that such a camera pose, when
parametrized as Euler angles, is quantized through a one-hot representation.
This vanilla encoding choice prevents the learnt architecture from inferring
novel views on a continuous basis (from a camera pose perspective). We claim
that there exists an elegant way to better encode the relative camera pose, by
leveraging 3D-related concepts such as the epipolar constraint. We therefore introduce
an innovative method that encodes the viewpoint transformation as a 2D feature
image. Such a camera encoding strategy gives meaningful insights to the network
regarding how the camera has moved in space between the two views. By encoding
the camera pose information as a finite number of coloured epipolar lines, we
demonstrate through our experiments that our strategy outperforms vanilla
encoding.
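To make this encoding concrete, below is a minimal sketch (not the authors' implementation) of how a relative camera pose could be rasterised into a 2D feature image of coloured epipolar lines. It assumes shared intrinsics K for both views and a relative pose (R, t) mapping source to target coordinates; the grid of sampled source pixels and the colouring scheme are illustrative assumptions, not details taken from the paper.

import numpy as np

def skew(t):
    # 3x3 cross-product (skew-symmetric) matrix of a 3-vector t.
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_matrix(K, R, t):
    # F maps a source pixel x (homogeneous) to its epipolar line l' = F x in the target view.
    E = skew(t) @ R                 # essential matrix E = [t]_x R
    K_inv = np.linalg.inv(K)
    return K_inv.T @ E @ K_inv      # F = K^-T E K^-1 (same intrinsics assumed for both views)

def epipolar_encoding(K, R, t, height, width, n_lines=8):
    # Rasterise n_lines coloured epipolar lines into an (H, W, 3) feature image.
    F = fundamental_matrix(K, R, t)
    feat = np.zeros((height, width, 3), dtype=np.float32)
    # Sample source pixels along the image diagonal (an illustrative choice).
    xs = np.linspace(0, width - 1, n_lines)
    ys = np.linspace(0, height - 1, n_lines)
    # One distinct colour per sampled line.
    colours = np.stack([np.linspace(0.0, 1.0, n_lines),
                        np.linspace(1.0, 0.0, n_lines),
                        np.full(n_lines, 0.5)], axis=1)
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    for i, (x, y) in enumerate(zip(xs, ys)):
        a, b, c = F @ np.array([x, y, 1.0])   # line a*u + b*v + c = 0 in the target image
        norm = np.hypot(a, b)
        if norm < 1e-8:
            continue
        dist = np.abs(a * u + b * v + c) / norm
        feat[dist < 0.5] = colours[i]          # paint target pixels within half a pixel of the line
    return feat

One plausible way to use such a map is to concatenate it with the source image as extra input channels, giving the generator a spatial, continuously varying description of the viewpoint change rather than a low-dimensional pose vector.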
Related papers
- DUSt3R: Geometric 3D Vision Made Easy [8.471330244002564]
We introduce DUSt3R, a novel paradigm for Dense and Unconstrained Stereo 3D Reconstruction of arbitrary image collections.
We show that this formulation smoothly unifies the monocular and binocular reconstruction cases.
Our formulation directly provides a 3D model of the scene as well as depth information; interestingly, we can seamlessly recover from it pixel matches as well as relative and absolute camera poses.
arXiv Detail & Related papers (2023-12-21T18:52:14Z)
- Free3D: Consistent Novel View Synthesis without 3D Representation [63.931920010054064]
Free3D is a simple, accurate method for monocular open-set novel view synthesis (NVS).
Compared to other works that took a similar approach, we obtain significant improvements without resorting to an explicit 3D representation.
arXiv Detail & Related papers (2023-12-07T18:59:18Z)
- RUST: Latent Neural Scene Representations from Unposed Imagery [21.433079925439234]
Inferring structure of 3D scenes from 2D observations is a fundamental challenge in computer vision.
Recent popularized approaches based on neural scene representations have achieved tremendous impact.
RUST (Really Unposed Scene representation Transformer) is a pose-free approach to novel view synthesis trained on RGB images alone.
arXiv Detail & Related papers (2022-11-25T18:59:10Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- Virtual Correspondence: Humans as a Cue for Extreme-View Geometry [104.09449367670318]
We present a novel concept called virtual correspondences (VCs)
VCs conform with epipolar geometry; unlike classic correspondences, VCs do not need to be co-visible across views.
We show how VCs can be seamlessly integrated with classic bundle adjustment to recover camera poses across extreme views.
arXiv Detail & Related papers (2022-06-16T17:59:42Z)
- ViewFormer: NeRF-free Neural Rendering from Few Images Using Transformers [34.4824364161812]
Novel view synthesis is a problem where we are given only a few context views sparsely covering a scene or an object.
The goal is to predict novel viewpoints in the scene, which requires learning priors.
We propose a 2D-only method that maps multiple context views and a query pose to a new image in a single pass of a neural network.
arXiv Detail & Related papers (2022-03-18T21:08:23Z)
- Learning Neural Representation of Camera Pose with Matrix Representation of Pose Shift via View Synthesis [105.37072293076767]
How to effectively represent camera pose is an essential problem in 3D computer vision.
We propose an approach to learn neural representations of camera poses and 3D scenes.
We conduct extensive experiments on synthetic and real datasets.
arXiv Detail & Related papers (2021-04-04T00:40:53Z)
- Shape and Viewpoint without Keypoints [63.26977130704171]
We present a learning framework that learns to recover the 3D shape, pose and texture from a single image.
We train on an image collection without any ground-truth 3D shape, multi-view, camera-viewpoint or keypoint supervision.
We obtain state-of-the-art camera prediction results and show that we can learn to predict diverse shapes and textures across objects.
arXiv Detail & Related papers (2020-07-21T17:58:28Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.