Continuous Object Representation Networks: Novel View Synthesis without Target View Supervision
- URL: http://arxiv.org/abs/2007.15627v2
- Date: Fri, 23 Oct 2020 15:19:10 GMT
- Title: Continuous Object Representation Networks: Novel View Synthesis without Target View Supervision
- Authors: Nicolai Häni, Selim Engin, Jun-Jee Chao and Volkan Isler
- Abstract summary: Continuous Object Representation Networks (CORN) is a conditional architecture that encodes an input image's geometry and appearance and maps them to a 3D-consistent scene representation.
CORN performs well on challenging tasks such as novel view synthesis and single-view 3D reconstruction, achieving performance comparable to state-of-the-art approaches that use direct supervision.
- Score: 26.885846254261626
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Novel View Synthesis (NVS) is concerned with synthesizing views under camera
viewpoint transformations from one or multiple input images. NVS requires
explicit reasoning about 3D object structure and unseen parts of the scene to
synthesize convincing results. As a result, current approaches typically rely
on supervised training with either ground truth 3D models or multiple target
images. We propose Continuous Object Representation Networks (CORN), a
conditional architecture that encodes an input image's geometry and appearance
and maps them to a 3D-consistent scene representation. We can train CORN with only
two source images per object by combining our model with a neural renderer. A
key feature of CORN is that it requires no ground truth 3D models or target
view supervision. Regardless, CORN performs well on challenging tasks such as
novel view synthesis and single-view 3D reconstruction and achieves performance
comparable to state-of-the-art approaches that use direct supervision. For
up-to-date information, data, and code, please see our project page:
https://nicolaihaeni.github.io/corn/.
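As a rough illustration of the training setup described in the abstract, the sketch below encodes one source image into geometry and appearance codes, renders them at the second source view's pose, and uses the second image itself as the only supervision signal. The module names (`Encoder`, `NeuralRenderer`), network shapes, and L1 photometric loss are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of training with two source views and no target-view
# supervision, loosely following the abstract; all names and shapes are assumptions.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps an RGB image to latent geometry and appearance codes."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.to_geometry = nn.Linear(128, latent_dim)
        self.to_appearance = nn.Linear(128, latent_dim)

    def forward(self, image):
        feat = self.backbone(image)
        return self.to_geometry(feat), self.to_appearance(feat)

class NeuralRenderer(nn.Module):
    """Decodes latent codes plus a camera pose into an RGB image (placeholder)."""
    def __init__(self, latent_dim=256, image_size=64):
        super().__init__()
        self.image_size = image_size
        self.decoder = nn.Sequential(
            nn.Linear(2 * latent_dim + 12, 1024), nn.ReLU(),
            nn.Linear(1024, 3 * image_size * image_size),
        )

    def forward(self, geometry, appearance, pose):
        # pose: flattened 3x4 camera extrinsics
        z = torch.cat([geometry, appearance, pose.flatten(1)], dim=-1)
        rgb = self.decoder(z).view(-1, 3, self.image_size, self.image_size)
        return torch.sigmoid(rgb)

def training_step(encoder, renderer, img_a, img_b, pose_a, pose_b):
    """Cross-render each source view into the other's pose; the other source
    image is the supervision, so no held-out target views are needed."""
    geom_a, app_a = encoder(img_a)
    geom_b, app_b = encoder(img_b)
    loss_ab = nn.functional.l1_loss(renderer(geom_a, app_a, pose_b), img_b)
    loss_ba = nn.functional.l1_loss(renderer(geom_b, app_b, pose_a), img_a)
    return loss_ab + loss_ba
```

The essential point of this reading is that both input images serve as each other's supervision rather than as held-out target views.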
Related papers
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z) - Denoising Diffusion via Image-Based Rendering [54.20828696348574]
We introduce the first diffusion model able to perform fast, detailed reconstruction and generation of real-world 3D scenes.
First, we introduce a new neural scene representation, IB-planes, that can efficiently and accurately represent large 3D scenes.
Second, we propose a denoising-diffusion framework to learn a prior over this novel 3D scene representation, using only 2D images.
arXiv Detail & Related papers (2024-02-05T19:00:45Z) - Free3D: Consistent Novel View Synthesis without 3D Representation [63.931920010054064]
Free3D is a simple, accurate method for monocular open-set novel view synthesis (NVS).
Compared to other works that take a similar approach, we obtain significant improvements without resorting to an explicit 3D representation.
arXiv Detail & Related papers (2023-12-07T18:59:18Z) - Viewpoint Textual Inversion: Discovering Scene Representations and 3D View Control in 2D Diffusion Models [4.036372578802888]
We show that certain 3D scene representations are encoded in the text embedding space of models like Stable Diffusion.
We exploit the 3D scene representations for 3D vision tasks, namely view-controlled text-to-image generation and novel view synthesis from a single image.
arXiv Detail & Related papers (2023-09-14T18:52:16Z) - One-Shot Neural Fields for 3D Object Understanding [112.32255680399399]
We present a unified and compact scene representation for robotics.
Each object in the scene is depicted by a latent code capturing geometry and appearance.
This representation can be decoded for various tasks such as novel view rendering, 3D reconstruction, and stable grasp prediction.
arXiv Detail & Related papers (2022-10-21T17:33:14Z) - Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering (see the sketch after this list).
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z) - ViewFormer: NeRF-free Neural Rendering from Few Images Using Transformers [34.4824364161812]
Novel view synthesis is a problem where we are given only a few context views sparsely covering a scene or an object.
The goal is to predict novel viewpoints in the scene, which requires learning priors.
We propose a 2D-only method that maps multiple context views and a query pose to a new image in a single pass of a neural network.
arXiv Detail & Related papers (2022-03-18T21:08:23Z) - Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans [56.63912568777483]
This paper addresses the challenge of novel view synthesis for a human performer from a very sparse set of camera views.
We propose Neural Body, a new human body representation which assumes that the learned neural representations at different frames share the same set of latent codes anchored to a deformable mesh.
Experiments on ZJU-MoCap show that our approach outperforms prior works by a large margin in terms of novel view synthesis quality.
arXiv Detail & Related papers (2020-12-31T18:55:38Z) - AUTO3D: Novel view synthesis through unsupervisely learned variational viewpoint and global 3D representation [27.163052958878776]
This paper targets learning-based novel view synthesis from a single or a limited number of 2D images without pose supervision.
We construct an end-to-end trainable conditional variational framework to disentangle the unsupervisedly learned relative pose/rotation from the implicit global 3D representation.
Our system can achieve implicit 3D understanding without explicit 3D reconstruction.
arXiv Detail & Related papers (2020-07-13T18:51:27Z)
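The Vision Transformer entry above mentions training a multilayer perceptron conditioned on a learned 3D representation to perform volume rendering. The sketch below is a minimal, generic version of that idea (NeRF-style alpha compositing over samples along a ray); the layer sizes, sampling scheme, and function names are illustrative assumptions, not that paper's implementation.

```python
# Minimal sketch of volume rendering with an MLP conditioned on a latent 3D
# representation (standard alpha compositing); all sizes are assumptions.
import torch
import torch.nn as nn

class ConditionedFieldMLP(nn.Module):
    """Predicts density and color for 3D points, conditioned on a latent feature."""
    def __init__(self, latent_dim=256, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # (density, r, g, b)
        )

    def forward(self, points, latent):
        # points: (N, S, 3); latent: (N, latent_dim) broadcast to every sample
        latent = latent[:, None, :].expand(-1, points.shape[1], -1)
        out = self.net(torch.cat([points, latent], dim=-1))
        sigma = torch.relu(out[..., 0])      # non-negative density
        rgb = torch.sigmoid(out[..., 1:])    # colors in [0, 1]
        return sigma, rgb

def render_rays(field, rays_o, rays_d, latent, near=0.5, far=2.5, n_samples=64):
    """Alpha-composite samples along each ray into a single RGB value."""
    t = torch.linspace(near, far, n_samples, device=rays_o.device)        # (S,)
    points = rays_o[:, None, :] + t[None, :, None] * rays_d[:, None, :]   # (N, S, 3)
    sigma, rgb = field(points, latent)
    delta = (far - near) / n_samples
    alpha = 1.0 - torch.exp(-sigma * delta)                               # (N, S)
    # transmittance: probability the ray reaches each sample unoccluded
    trans = torch.cumprod(torch.cat(
        [torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=-1), dim=-1)[:, :-1]
    weights = alpha * trans
    return (weights[..., None] * rgb).sum(dim=1)                          # (N, 3)
```

In such a pipeline, a feed-forward image encoder (CNN or vision transformer) would supply `latent` from the single input image, and training would minimize a photometric loss between rendered and observed pixels.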
This list is automatically generated from the titles and abstracts of the papers on this site.