im2nerf: Image to Neural Radiance Field in the Wild
- URL: http://arxiv.org/abs/2209.04061v1
- Date: Thu, 8 Sep 2022 23:28:56 GMT
- Title: im2nerf: Image to Neural Radiance Field in the Wild
- Authors: Lu Mi, Abhijit Kundu, David Ross, Frank Dellaert, Noah Snavely,
Alireza Fathi
- Abstract summary: im2nerf is a learning framework that predicts a continuous neural object representation given a single input image in the wild.
We show that im2nerf achieves state-of-the-art performance for novel view synthesis from a single-view unposed image in the wild.
- Score: 47.18702901448768
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose im2nerf, a learning framework that predicts a continuous neural
object representation given a single input image in the wild, supervised by
only segmentation output from off-the-shelf recognition methods. The standard
approach to constructing neural radiance fields takes advantage of multi-view
consistency and requires many calibrated views of a scene, a requirement that
cannot be satisfied when learning on large-scale image data in the wild. We
take a step towards addressing this shortcoming by introducing a model that
encodes the input image into a disentangled object representation that contains
a code for object shape, a code for object appearance, and an estimated camera
pose from which the object image is captured. Our model conditions a NeRF on
the predicted object representation and uses volume rendering to generate
images from novel views. We train the model end-to-end on a large collection of
input images. As the model is only provided with single-view images, the
problem is highly under-constrained. Therefore, in addition to using a
reconstruction loss on the synthesized input view, we use an auxiliary
adversarial loss on the novel rendered views. Furthermore, we leverage object
symmetry and cycle camera pose consistency. We conduct extensive quantitative
and qualitative experiments on the ShapeNet dataset as well as qualitative
experiments on the Open Images dataset. We show that in all cases, im2nerf
achieves state-of-the-art performance for novel view synthesis from a
single-view unposed image in the wild.
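
The abstract outlines a concrete pipeline: an encoder maps a single unposed image to a shape code, an appearance code, and an estimated camera pose; a NeRF conditioned on those codes is volume-rendered to produce images. The following is a minimal sketch of that structure in PyTorch, assuming hypothetical module names, code dimensions, and a simplified alpha-compositing renderer; it is illustrative only, not the authors' implementation, and it omits the paper's adversarial, symmetry, and cycle-pose-consistency losses.

```python
# Minimal, hypothetical PyTorch sketch of the pipeline described in the abstract.
# All module names, dimensions, and the simple renderer are assumptions for
# illustration; they are not the authors' released code.
import torch
import torch.nn as nn

class ImageEncoder(nn.Module):
    """Encode a single unposed image into a disentangled object representation:
    a shape code, an appearance code, and an estimated camera pose."""
    def __init__(self, feat_dim=256, code_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(            # stand-in for a real CNN/ViT backbone
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU())
        self.shape_head = nn.Linear(feat_dim, code_dim)
        self.appearance_head = nn.Linear(feat_dim, code_dim)
        self.pose_head = nn.Linear(feat_dim, 6)   # e.g. axis-angle rotation + translation

    def forward(self, image):
        f = self.backbone(image)
        return self.shape_head(f), self.appearance_head(f), self.pose_head(f)

class ConditionalNeRF(nn.Module):
    """MLP that predicts density and color at 3D points, conditioned on the
    predicted shape and appearance codes."""
    def __init__(self, code_dim=128, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + 2 * code_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4))                 # outputs (density, R, G, B)

    def forward(self, points, shape_code, appearance_code):
        codes = torch.cat([shape_code, appearance_code], dim=-1)
        codes = codes.unsqueeze(1).expand(-1, points.shape[1], -1)
        out = self.mlp(torch.cat([points, codes], dim=-1))
        return torch.relu(out[..., :1]), torch.sigmoid(out[..., 1:])  # sigma, rgb

def render_rays(nerf, shape_code, appearance_code, origins, dirs,
                near=0.5, far=2.5, n_samples=32):
    """Standard volume rendering: sample depths along each ray, query the
    conditioned NeRF, and alpha-composite the colors."""
    B, R, _ = origins.shape
    t = torch.linspace(near, far, n_samples, device=origins.device)                 # (S,)
    pts = origins[:, :, None, :] + t[None, None, :, None] * dirs[:, :, None, :]     # (B, R, S, 3)
    sigma, rgb = nerf(pts.reshape(B, R * n_samples, 3), shape_code, appearance_code)
    sigma = sigma.reshape(B, R, n_samples)
    rgb = rgb.reshape(B, R, n_samples, 3)
    alpha = 1.0 - torch.exp(-sigma * (far - near) / n_samples)
    trans = torch.cumprod(torch.cat([torch.ones_like(alpha[..., :1]),
                                     1.0 - alpha + 1e-10], dim=-1), dim=-1)[..., :-1]
    weights = alpha * trans
    return (weights[..., None] * rgb).sum(dim=-2)                                   # (B, R, 3)

# Example forward pass on random data: predict codes and pose from one image,
# then render a handful of rays (in the paper, rays would come from the
# predicted camera pose or from a sampled novel viewpoint).
image = torch.rand(1, 3, 128, 128)
encoder, nerf = ImageEncoder(), ConditionalNeRF()
shape_code, appearance_code, pose = encoder(image)
origins = torch.zeros(1, 64, 3)
dirs = nn.functional.normalize(torch.randn(1, 64, 3), dim=-1)
pixels = render_rays(nerf, shape_code, appearance_code, origins, dirs)  # (1, 64, 3)
```

During training, as described in the abstract, the rendered input view would be compared against the image's segmentation output (reconstruction loss), while renders from sampled novel poses would feed a discriminator for the auxiliary adversarial loss.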
Related papers
- MetaCap: Meta-learning Priors from Multi-View Imagery for Sparse-view Human Performance Capture and Rendering [91.76893697171117]
We propose a method for efficient and high-quality geometry recovery and novel view synthesis given very sparse or even a single view of the human.
Our key idea is to meta-learn the radiance field weights solely from potentially sparse multi-view videos.
We collect a new dataset, WildDynaCap, which contains subjects captured in both a dense camera dome and in-the-wild sparse camera rigs.
arXiv Detail & Related papers (2024-03-27T17:59:54Z)
- UpFusion: Novel View Diffusion from Unposed Sparse View Observations [66.36092764694502]
UpFusion can perform novel view synthesis and infer 3D representations for an object given a sparse set of reference images.
We show that this mechanism allows generating high-fidelity novel views while improving the synthesis quality given additional (unposed) images.
arXiv Detail & Related papers (2023-12-11T18:59:55Z)
- Multi-View Unsupervised Image Generation with Cross Attention Guidance [23.07929124170851]
This paper introduces a novel pipeline for unsupervised training of a pose-conditioned diffusion model on single-category datasets.
We identify object poses by clustering the dataset based on the visibility and locations of specific object parts.
Our model, MIRAGE, surpasses prior work in novel view synthesis on real images.
arXiv Detail & Related papers (2023-12-07T14:55:13Z)
- Learning Robust Multi-Scale Representation for Neural Radiance Fields from Unposed Images [65.41966114373373]
We present an improved solution to the neural image-based rendering problem in computer vision.
The proposed approach can synthesize a realistic image of the scene from a novel viewpoint at test time.
arXiv Detail & Related papers (2023-11-08T08:18:23Z)
- Zero-1-to-3: Zero-shot One Image to 3D Object [30.455300183998247]
We introduce Zero-1-to-3, a framework for changing the camera viewpoint of an object given just a single RGB image.
Our conditional diffusion model uses a synthetic dataset to learn controls of the relative camera viewpoint.
Our method significantly outperforms state-of-the-art single-view 3D reconstruction and novel view synthesis models by leveraging Internet-scale pre-training.
arXiv Detail & Related papers (2023-03-20T17:59:50Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- Neural Rendering of Humans in Novel View and Pose from Monocular Video [68.37767099240236]
We introduce a new method that generates photo-realistic humans under novel views and poses given a monocular video as input.
Our method significantly outperforms existing approaches under unseen poses and novel views given monocular videos as input.
arXiv Detail & Related papers (2022-04-04T03:09:20Z)
- Unsupervised Novel View Synthesis from a Single Image [47.37120753568042]
Novel view synthesis from a single image aims at generating novel views from a single input image of an object.
This work relaxes the need for multi-view supervision at training time, enabling training of a conditional generative model for novel view synthesis in a completely unsupervised manner.
arXiv Detail & Related papers (2021-02-05T16:56:04Z)
- Sequential View Synthesis with Transformer [13.200139959163574]
We introduce a sequential rendering decoder to predict an image sequence, including the target view, based on the learned representations.
We evaluate our model on various challenging datasets and demonstrate that it not only gives consistent predictions but also does not require any retraining for fine-tuning.
arXiv Detail & Related papers (2020-04-09T14:15:27Z)