Pix2NeRF: Unsupervised Conditional $\pi$-GAN for Single Image to Neural
Radiance Fields Translation
- URL: http://arxiv.org/abs/2202.13162v1
- Date: Sat, 26 Feb 2022 15:28:05 GMT
- Title: Pix2NeRF: Unsupervised Conditional $\pi$-GAN for Single Image to Neural
Radiance Fields Translation
- Authors: Shengqu Cai and Anton Obukhov and Dengxin Dai and Luc Van Gool
- Abstract summary: We propose a pipeline to generate Neural Radiance Fields (NeRF) of an object or a scene of a specific class, conditioned on a single input image.
Our method is based on $\pi$-GAN, a generative model for unconditional 3D-aware image synthesis.
- Score: 93.77693306391059
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We propose a pipeline to generate Neural Radiance Fields~(NeRF) of an object
or a scene of a specific class, conditioned on a single input image. This is a
challenging task, as training NeRF requires multiple views of the same scene,
coupled with corresponding poses, which are hard to obtain. Our method is based
on $\pi$-GAN, a generative model for unconditional 3D-aware image synthesis,
which maps random latent codes to radiance fields of a class of objects. We
jointly optimize (1) the $\pi$-GAN objective to utilize its high-fidelity
3D-aware generation and (2) a carefully designed reconstruction objective. The
latter includes an encoder coupled with the $\pi$-GAN generator to form an
auto-encoder. Unlike previous few-shot NeRF approaches, our pipeline is
unsupervised, capable of being trained with independent images without 3D,
multi-view, or pose supervision. Applications of our pipeline include 3d avatar
generation, object-centric novel view synthesis with a single input image, and
3d-aware super-resolution, to name a few.
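The joint optimization described in the abstract can be illustrated with a short training-step sketch. This is a minimal PyTorch-style sketch, assuming tiny stand-in modules and hypothetical dimensions; a real $\pi$-GAN generator renders a SIREN-based radiance field rather than mapping latents to pixels directly, and the paper's "carefully designed reconstruction objective" is richer than the single MSE term shown here.

```python
# Minimal sketch of the joint objective: a GAN loss on images generated from
# random latents, plus a reconstruction loss in which the encoder E and the
# generator G together act as an auto-encoder. All modules and sizes are
# simplified stand-ins, not the authors' architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

IMG, LATENT, POSE, B = 32, 64, 2, 8      # image size, latent/pose dims, batch

class Encoder(nn.Module):                # image -> latent code
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * IMG * IMG, LATENT))
    def forward(self, x):
        return self.net(x)

class Generator(nn.Module):              # (latent, pose) -> rendered image
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(LATENT + POSE, 3 * IMG * IMG)
    def forward(self, z, pose):
        out = torch.tanh(self.net(torch.cat([z, pose], dim=-1)))
        return out.view(-1, 3, IMG, IMG)

class Discriminator(nn.Module):          # image -> real/fake logit
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * IMG * IMG, 1))
    def forward(self, x):
        return self.net(x)

E, G, D = Encoder(), Generator(), Discriminator()
opt_eg = torch.optim.Adam(list(E.parameters()) + list(G.parameters()), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.rand(B, 3, IMG, IMG)        # independent single images
pose = torch.randn(B, POSE)              # poses are sampled, not supervised

# Discriminator step: the standard adversarial objective.
fake = G(torch.randn(B, LATENT), pose)
ones, zeros = torch.ones(B, 1), torch.zeros(B, 1)
d_loss = (F.binary_cross_entropy_with_logits(D(real), ones)
          + F.binary_cross_entropy_with_logits(D(fake.detach()), zeros))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Encoder/generator step: adversarial term + reconstruction term,
# the latter making E followed by G behave as an auto-encoder.
g_adv = F.binary_cross_entropy_with_logits(D(fake), ones)
recon = F.mse_loss(G(E(real), pose), real)
opt_eg.zero_grad(); (g_adv + recon).backward(); opt_eg.step()
```

The key design point visible even in this toy version is that the encoder is trained through the generator, so no poses or multiple views of any one object are ever required.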
Related papers
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
- ZIGNeRF: Zero-shot 3D Scene Representation with Invertible Generative Neural Radiance Fields [2.458437232470188]
We introduce ZIGNeRF, an innovative model that performs zero-shot Generative Adversarial Network (GAN) inversion to generate multi-view images from a single out-of-domain image.
ZIGNeRF is capable of disentangling the object from the background and executing 3D operations such as 360-degree rotation or depth and horizontal translation.
arXiv Detail & Related papers (2023-06-05T09:41:51Z)
- NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion [107.67277084886929]
Novel view synthesis from a single image requires inferring occluded regions of objects and scenes whilst simultaneously maintaining semantic and physical consistency with the input.
We propose NerfDiff, which addresses this issue by distilling the knowledge of a 3D-aware conditional diffusion model (CDM) into NeRF through synthesizing and refining a set of virtual views at test time.
We further propose a novel NeRF-guided distillation algorithm that simultaneously generates 3D consistent virtual views from the CDM samples, and finetunes the NeRF based on the improved virtual views.
arXiv Detail & Related papers (2023-02-20T17:12:00Z)
- Class-Continuous Conditional Generative Neural Radiance Field [4.036530158875673]
We introduce a novel model, called Class-Continuous Conditional Generative NeRF ($\text{C}^3$G-NeRF), which can synthesize conditionally manipulated 3D-consistent images.
Our model shows strong 3D consistency with fine details and smooth conditional feature manipulation.
arXiv Detail & Related papers (2023-01-03T05:10:37Z)
- 3D-Aware Encoding for Style-based Neural Radiance Fields [50.118687869198716]
We learn an inversion function to project an input image to the latent space of a NeRF generator and then synthesize novel views of the original image based on the latent code.
Compared with GAN inversion for 2D generative models, NeRF inversion needs to not only 1) preserve the identity of the input image, but also 2) ensure 3D consistency in the generated novel views.
We propose a two-stage encoder for style-based NeRF inversion.
arXiv Detail & Related papers (2022-11-12T06:14:12Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering (a compositing sketch follows this list).
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- 3D-aware Image Synthesis via Learning Structural and Textural Representations [39.681030539374994]
We propose VolumeGAN for high-fidelity 3D-aware image synthesis, which explicitly learns a structural representation and a textural representation.
Our approach achieves substantially higher image quality and better 3D control than previous methods.
arXiv Detail & Related papers (2021-12-20T18:59:40Z)
- Decomposing 3D Scenes into Objects via Unsupervised Volume Segmentation [26.868351498722884]
We present ObSuRF, a method which turns a single image of a scene into a 3D model represented as a set of Neural Radiance Fields (NeRFs).
We make learning more computationally efficient by deriving a novel loss, which allows training NeRFs on RGB-D inputs without explicit ray marching.
arXiv Detail & Related papers (2021-04-02T16:59:29Z)
- pixelNeRF: Neural Radiance Fields from One or Few Images [20.607712035278315]
pixelNeRF is a learning framework that predicts a continuous neural scene representation conditioned on one or few input images (a minimal sketch of this image-conditioned query follows this list).
We conduct experiments on ShapeNet benchmarks for single image novel view synthesis tasks with held-out objects.
In all cases, pixelNeRF outperforms current state-of-the-art baselines for novel view synthesis and single image 3D reconstruction.
arXiv Detail & Related papers (2020-12-03T18:59:54Z)
- Pix2Vox++: Multi-scale Context-aware 3D Object Reconstruction from Single and Multiple Images [56.652027072552606]
We propose a novel framework for single-view and multi-view 3D object reconstruction, named Pix2Vox++.
By using a well-designed encoder-decoder, it generates a coarse 3D volume from each input image.
A multi-scale context-aware fusion module is then introduced to adaptively select high-quality reconstructions for different parts from all coarse 3D volumes to obtain a fused 3D volume.
arXiv Detail & Related papers (2020-06-22T13:48:09Z)
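For the pixelNeRF entry above, here is a minimal sketch of an image-conditioned radiance-field query: each 3D sample point is projected into the input view, a pixel-aligned feature is gathered there, and a small MLP predicts color and density from the point's positional encoding plus that feature. The module name `ConditionedNeRF`, the feature dimension, and the layer sizes are illustrative assumptions, not the published architecture.

```python
# Image-conditioned NeRF query: gather a pixel-aligned feature per 3D point
# and feed it to the MLP alongside the point's positional encoding.
import torch
import torch.nn as nn
import torch.nn.functional as F

def positional_encoding(x, n_freqs=6):
    """Standard NeRF sinusoidal encoding of 3D points -> (..., 3 + 6*n_freqs)."""
    out = [x]
    for i in range(n_freqs):
        out += [torch.sin((2.0 ** i) * x), torch.cos((2.0 ** i) * x)]
    return torch.cat(out, dim=-1)

class ConditionedNeRF(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(39 + feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 4),                    # (r, g, b, sigma)
        )
    def forward(self, points, feat_map, uv):
        # points: (N, 3) world-space samples; feat_map: (1, C, H, W) image
        # features; uv: (N, 2) projections into the input view, in [-1, 1].
        feats = F.grid_sample(                    # bilinear pixel-aligned lookup
            feat_map, uv.view(1, -1, 1, 2), align_corners=True
        ).view(feat_map.shape[1], -1).t()         # (N, C)
        h = self.mlp(torch.cat([positional_encoding(points), feats], dim=-1))
        return torch.sigmoid(h[:, :3]), F.relu(h[:, 3])   # color, density

# Usage with random stand-in inputs: 1024 sample points against one view.
model = ConditionedNeRF()
rgb, sigma = model(torch.randn(1024, 3),
                   torch.randn(1, 64, 32, 32),
                   torch.rand(1024, 2) * 2 - 1)
```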
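And for the volume-rendering step mentioned in the Vision Transformer entry, a sketch of the standard NeRF compositing rule: per-sample colors and densities along each ray are alpha-composited using accumulated transmittance. Ray counts, sample counts, and the uniform sampling here are illustrative.

```python
# Classic NeRF quadrature: composite per-sample (rgb, sigma) into pixel colors.
import torch

def render_rays(rgb, sigma, t_vals):
    """rgb: (R, S, 3), sigma: (R, S), t_vals: (R, S) sample depths per ray."""
    # Distances between adjacent samples; pad the final interval.
    deltas = t_vals[:, 1:] - t_vals[:, :-1]
    deltas = torch.cat([deltas, torch.full_like(deltas[:, :1], 1e10)], dim=-1)
    alpha = 1.0 - torch.exp(-sigma * deltas)      # opacity of each sample
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=-1),
        dim=-1,
    )[:, :-1]
    weights = alpha * trans                       # (R, S) compositing weights
    return (weights[..., None] * rgb).sum(dim=1)  # (R, 3) pixel colors

# Usage with random stand-in predictions: 4 rays x 16 uniform samples.
pixels = render_rays(torch.rand(4, 16, 3),
                     torch.rand(4, 16),
                     torch.linspace(0.1, 4.0, 16).expand(4, 16))
```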
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.