Geometry-biased Transformers for Novel View Synthesis
- URL: http://arxiv.org/abs/2301.04650v1
- Date: Wed, 11 Jan 2023 18:59:56 GMT
- Title: Geometry-biased Transformers for Novel View Synthesis
- Authors: Naveen Venkat, Mayank Agarwal, Maneesh Singh, Shubham Tulsiani
- Abstract summary: We tackle the task of synthesizing novel views of an object given a few input images and associated camera viewpoints.
Our work is inspired by recent 'geometry-free' approaches where multi-view images are encoded as a (global) set-latent representation.
We propose 'Geometry-biased Transformers' (GBTs) that incorporate geometric inductive biases in the set-latent representation-based inference.
- Score: 36.11342728319563
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We tackle the task of synthesizing novel views of an object given a few input
images and associated camera viewpoints. Our work is inspired by recent
'geometry-free' approaches where multi-view images are encoded as a (global)
set-latent representation, which is then used to predict the color for
arbitrary query rays. While this representation yields (coarsely) accurate
images corresponding to novel viewpoints, the lack of geometric reasoning
limits the quality of these outputs. To overcome this limitation, we propose
'Geometry-biased Transformers' (GBTs) that incorporate geometric inductive
biases in the set-latent representation-based inference to encourage multi-view
geometric consistency. We induce the geometric bias by augmenting the
dot-product attention mechanism to also incorporate 3D distances between rays
associated with tokens as a learnable bias. We find that this, along with
camera-aware embeddings as input, allows our models to generate significantly
more accurate outputs. We validate our approach on the real-world CO3D dataset,
where we train our system over 10 categories and evaluate its view-synthesis
ability for novel objects as well as unseen categories. We empirically validate
the benefits of the proposed geometric biases and show that our approach
significantly improves over prior works.
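The geometric bias described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes (none of this is stated in the abstract) that each token carries one ray given by an origin and a direction, that rays are treated as infinite lines when measuring distance, and that the bias is subtracted from the attention logits as `gamma**2 * distance`. Here `gamma` is a fixed scalar standing in for a parameter that would be learned. The function names `ray_distances` and `geometry_biased_attention` are hypothetical.

```python
import numpy as np


def ray_distances(origins, dirs, eps=1e-8):
    """Pairwise shortest distance between the 3D lines o_i + t * d_i.

    Assumption: rays are treated as infinite lines (t is unbounded).
    origins, dirs: arrays of shape (N, 3). Returns an (N, N) matrix.
    """
    dirs = dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)
    diff = origins[None, :, :] - origins[:, None, :]       # o_j - o_i
    cross = np.cross(dirs[:, None, :], dirs[None, :, :])   # d_i x d_j
    cross_norm = np.linalg.norm(cross, axis=-1)
    # Non-parallel lines: project the offset onto the common normal.
    skew = np.abs(np.sum(diff * cross, axis=-1)) / np.maximum(cross_norm, eps)
    # Parallel lines: distance from o_j to the line through o_i.
    parallel = np.linalg.norm(np.cross(diff, dirs[:, None, :]), axis=-1)
    return np.where(cross_norm > eps, skew, parallel)


def geometry_biased_attention(q, k, v, origins, dirs, gamma=1.0):
    """Scaled dot-product attention with a ray-distance penalty on the logits.

    Assumption: the bias has the form -gamma**2 * distance, so tokens whose
    rays pass close to each other in 3D receive more attention.
    q, k: (N, d); v: (N, d_v). Returns (output, attention_weights).
    """
    logits = q @ k.T / np.sqrt(q.shape[-1])
    logits = logits - gamma**2 * ray_distances(origins, dirs)
    logits = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_tokens, dim = 5, 8
    q, k, v = (rng.normal(size=(n_tokens, dim)) for _ in range(3))
    origins = rng.normal(size=(n_tokens, 3))
    dirs = rng.normal(size=(n_tokens, 3))
    out, weights = geometry_biased_attention(q, k, v, origins, dirs, gamma=0.5)
    print(out.shape, weights.shape)
```

With `gamma=0` the bias vanishes and the function reduces to ordinary dot-product attention, which is one way to see the geometric term as an additive inductive bias rather than a hard constraint.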
Related papers
- G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images [45.66479596827045]
We propose a Geometry-enhanced NeRF (G-NeRF), which seeks to enhance the geometry priors by a geometry-guided multi-view synthesis approach.
To tackle the absence of multi-view supervision for single-view images, we design the depth-aware training approach.
arXiv Detail & Related papers (2024-04-11T04:58:18Z)
- GTA: A Geometry-Aware Attention Mechanism for Multi-View Transformers [63.41460219156508]
We argue that existing positional encoding schemes are suboptimal for 3D vision tasks.
We propose a geometry-aware attention mechanism that encodes the geometric structure of tokens as relative transformation.
We show that our attention, called Geometric Transform Attention (GTA), improves learning efficiency and performance of state-of-the-art transformer-based NVS models.
arXiv Detail & Related papers (2023-10-16T13:16:09Z)
- Explicit Correspondence Matching for Generalizable Neural Radiance Fields [49.49773108695526]
We present a new NeRF method that is able to generalize to new unseen scenarios and perform novel view synthesis with as few as two source views.
The explicit correspondence matching is quantified with the cosine similarity between image features sampled at the 2D projections of a 3D point on different views.
Our method achieves state-of-the-art results on different evaluation settings, with the experiments showing a strong correlation between our learned cosine feature similarity and volume density.
arXiv Detail & Related papers (2023-04-24T17:46:01Z)
- Learning to Render Novel Views from Wide-Baseline Stereo Pairs [26.528667940013598]
We introduce a method for novel view synthesis given only a single wide-baseline stereo image pair.
Existing approaches to novel view synthesis from sparse observations fail due to recovering incorrect 3D geometry.
We propose an efficient, image-space epipolar line sampling scheme to assemble image features for a target ray.
arXiv Detail & Related papers (2023-04-17T17:40:52Z)
- GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images [79.39247661907397]
We introduce an effective framework Generalizable Model-based Neural Radiance Fields to synthesize free-viewpoint images.
Specifically, we propose a geometry-guided attention mechanism to register the appearance code from multi-view 2D images to a geometry proxy.
arXiv Detail & Related papers (2023-03-24T03:32:02Z)
- High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- Geometry-Free View Synthesis: Transformers and no 3D Priors [16.86600007830682]
We show that a transformer-based model can synthesize entirely novel views without any hand-engineered 3D biases.
This is achieved by (i) a global attention mechanism for implicitly learning long-range 3D correspondences between source and target views.
arXiv Detail & Related papers (2021-04-15T17:58:05Z)
- Nothing But Geometric Constraints: A Model-Free Method for Articulated Object Pose Estimation [89.82169646672872]
We propose an unsupervised vision-based system to estimate the joint configurations of the robot arm from a sequence of RGB or RGB-D images without knowing the model a priori.
We combine a classical geometric formulation with deep learning and extend the use of epipolar multi-rigid-body constraints to solve this task.
arXiv Detail & Related papers (2020-11-30T20:46:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.