3D-Aware Encoding for Style-based Neural Radiance Fields
- URL: http://arxiv.org/abs/2211.06583v1
- Date: Sat, 12 Nov 2022 06:14:12 GMT
- Title: 3D-Aware Encoding for Style-based Neural Radiance Fields
- Authors: Yu-Jhe Li, Tao Xu, Bichen Wu, Ningyuan Zheng, Xiaoliang Dai, Albert
Pumarola, Peizhao Zhang, Peter Vajda, Kris Kitani
- Abstract summary: We learn an inversion function to project an input image to the latent space of a NeRF generator and then synthesize novel views of the original image based on the latent code.
Compared with GAN inversion for 2D generative models, NeRF inversion not only needs to 1) preserve the identity of the input image, but also 2) ensure 3D consistency in generated novel views.
We propose a two-stage encoder for style-based NeRF inversion.
- Score: 50.118687869198716
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We tackle the task of NeRF inversion for style-based neural radiance fields
(e.g., StyleNeRF). In this task, we aim to learn an inversion function to
project an input image to the latent space of a NeRF generator and then
synthesize novel views of the original image based on the latent code. Compared
with GAN inversion for 2D generative models, NeRF inversion not only needs to
1) preserve the identity of the input image, but also 2) ensure 3D consistency
in generated novel views. This requires the latent code obtained from the
single-view image to be invariant across multiple views. To address this new
challenge, we propose a two-stage encoder for style-based NeRF inversion. In
the first stage, we introduce a base encoder that converts the input image to a
latent code. To ensure the latent code is view-invariant and is able to
synthesize 3D consistent novel view images, we utilize identity contrastive
learning to train the base encoder. Second, to better preserve the identity of
the input image, we introduce a refining encoder to refine the latent code and
add finer details to the output image. Importantly, the novelty of this model
lies in the design of its first-stage encoder, which produces the closest
latent code lying on the latent manifold, so that the refinement in the second
stage stays close to the NeRF manifold. Through extensive
experiments, we demonstrate that our proposed two-stage encoder qualitatively
and quantitatively exhibits superiority over the existing encoders for
inversion in both image reconstruction and novel-view rendering.
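To make the two-stage design concrete, below is a minimal PyTorch sketch of how a base encoder trained with identity contrastive learning and a residual refining encoder could be wired together. The module definitions, the InfoNCE-style loss, and the nerf_generator interface are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch of a two-stage inversion encoder as described in the abstract.
# Module names, dimensions, and the InfoNCE-style loss are illustrative
# assumptions, not the authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BaseEncoder(nn.Module):
    """Stage 1: map an input image to a view-invariant latent code w."""
    def __init__(self, latent_dim=512):
        super().__init__()
        self.backbone = nn.Sequential(              # placeholder CNN backbone
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, latent_dim),
        )

    def forward(self, img):
        return self.backbone(img)                   # (B, latent_dim)

class RefiningEncoder(nn.Module):
    """Stage 2: predict a residual that adds finer, identity-preserving detail."""
    def __init__(self, latent_dim=512):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(6, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, latent_dim),
        )

    def forward(self, img, recon):
        # Conditions on the input image and the stage-1 reconstruction.
        return self.backbone(torch.cat([img, recon], dim=1))

def identity_contrastive_loss(w_view_a, w_view_b, temperature=0.07):
    """InfoNCE-style loss: codes predicted from two views of the same identity
    are positives; codes from other identities in the batch are negatives."""
    a = F.normalize(w_view_a, dim=-1)
    b = F.normalize(w_view_b, dim=-1)
    logits = a @ b.t() / temperature                 # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)

def invert(img, base_enc, refine_enc, nerf_generator, cam_pose):
    """Two-stage inversion: coarse view-invariant code, then a refinement."""
    w = base_enc(img)
    recon = nerf_generator(w, cam_pose)              # stage-1 reconstruction
    w_refined = w + refine_enc(img, recon)           # residual refinement
    return w_refined
```

During training, identity_contrastive_loss would be applied to latent codes predicted from different rendered views of the same identity, pushing the stage-1 code toward view-invariance; the refining encoder then only needs to add small, identity-preserving corrections to a code that already lies near the latent manifold.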
Related papers
- LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias [50.13457154615262]
We propose a transformer-based approach for scalable and generalizable novel view synthesis from sparse-view inputs.
We introduce two architectures: (1) an encoder-decoder LVSM, which encodes input image tokens into a fixed number of 1D latent tokens; and (2) a decoder-only LVSM, which directly maps input images to novel-view outputs.
arXiv Detail & Related papers (2024-10-22T17:58:28Z)
- Dual Encoder GAN Inversion for High-Fidelity 3D Head Reconstruction from Single Images [8.558093666229553]
3D GAN inversion aims to project a single image into the latent space of a 3D Generative Adversarial Network (GAN).
There exist encoders that achieve good results in 3D GAN inversion, but they are predominantly built on EG3D.
We propose a novel framework built on PanoHead, which excels in synthesizing images from a 360-degree perspective.
arXiv Detail & Related papers (2024-09-30T17:30:23Z)
- Triple-View Knowledge Distillation for Semi-Supervised Semantic Segmentation [54.23510028456082]
We propose a Triple-view Knowledge Distillation framework, termed TriKD, for semi-supervised semantic segmentation.
The framework includes the triple-view encoder and the dual-frequency decoder.
arXiv Detail & Related papers (2023-09-22T01:02:21Z)
- Meta-Auxiliary Network for 3D GAN Inversion [18.777352198191004]
In this work, we present a novel meta-auxiliary framework, while leveraging the newly developed 3D GANs as generator.
In the first stage, we invert the input image to an editable latent code using off-the-shelf inversion techniques.
The auxiliary network is proposed to refine the generator parameters with the given image as input, which both predicts offsets for weights of convolutional layers and sampling positions of volume rendering.
In the second stage, we perform meta-learning to fast adapt the auxiliary network to the input image, then the final reconstructed image is synthesized via the meta-learned auxiliary network.
arXiv Detail & Related papers (2023-05-18T11:26:27Z)
- High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z)
- Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields [49.41982694533966]
We introduce a new task, Semantic-to-NeRF translation, conditioned on one single-view semantic mask as input.
In particular, Sem2NeRF addresses the highly challenging task by encoding the semantic mask into the latent code that controls the 3D scene representation of a pretrained decoder.
We verify the efficacy of the proposed Sem2NeRF and demonstrate it outperforms several strong baselines on two benchmark datasets.
arXiv Detail & Related papers (2022-03-21T09:15:58Z)
- Pix2NeRF: Unsupervised Conditional $\pi$-GAN for Single Image to Neural Radiance Fields Translation [93.77693306391059]
We propose a pipeline to generate Neural Radiance Fields (NeRF) of an object or a scene of a specific class, conditioned on a single input image.
Our method is based on $\pi$-GAN, a generative model for unconditional 3D-aware image synthesis.
arXiv Detail & Related papers (2022-02-26T15:28:05Z)