Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields
- URL: http://arxiv.org/abs/2203.10821v1
- Date: Mon, 21 Mar 2022 09:15:58 GMT
- Title: Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields
- Authors: Yuedong Chen, Qianyi Wu, Chuanxia Zheng, Tat-Jen Cham and Jianfei Cai
- Abstract summary: We introduce a new task, Semantic-to-NeRF translation, conditioned on a single-view semantic mask as input.
In particular, Sem2NeRF addresses the highly challenging task by encoding the semantic mask into the latent code that controls the 3D scene representation of a pretrained decoder.
We verify the efficacy of the proposed Sem2NeRF and demonstrate that it outperforms several strong baselines on two benchmark datasets.
- Score: 49.41982694533966
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Image translation and manipulation have gained increasing attention
along with the rapid development of deep generative models. Although existing
approaches have produced impressive results, they mainly operate in 2D space.
In light of
recent advances in NeRF-based 3D-aware generative models, we introduce a new
task, Semantic-to-NeRF translation, that aims to reconstruct a 3D scene
modelled by NeRF, conditioned on a single-view semantic mask as input. To
kick off this novel task, we propose the Sem2NeRF framework. In particular,
Sem2NeRF addresses the highly challenging task by encoding the semantic mask
into the latent code that controls the 3D scene representation of a pretrained
decoder. To further improve the accuracy of the mapping, we integrate a new
region-aware learning strategy into the design of both the encoder and the
decoder. We verify the efficacy of the proposed Sem2NeRF and demonstrate that
it outperforms several strong baselines on two benchmark datasets.
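As a rough illustration of the pipeline the abstract describes, here is a minimal PyTorch sketch: a convolutional encoder maps a one-hot semantic mask to a latent code, and a frozen, pretrained 3D-aware NeRF decoder renders the scene from that code. The names `MaskEncoder` and `training_step`, all layer sizes, and the `decoder(z, pose)` interface are illustrative assumptions, not the paper's actual architecture, and the region-aware learning strategy is omitted.

```python
import torch
import torch.nn as nn

class MaskEncoder(nn.Module):
    """Illustrative encoder: one-hot semantic mask -> latent code."""

    def __init__(self, num_classes: int, latent_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_classes, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, mask_onehot: torch.Tensor) -> torch.Tensor:
        # mask_onehot: (B, num_classes, H, W) one-hot semantic labels
        return self.net(mask_onehot)  # (B, latent_dim)


def training_step(encoder, decoder, mask_onehot, camera_pose, target_rgb):
    """One optimisation step in which only the encoder is trained."""
    # Freeze the pretrained 3D-aware decoder; gradients still flow
    # through it back into the encoder.
    for p in decoder.parameters():
        p.requires_grad_(False)
    z = encoder(mask_onehot)            # semantic mask -> latent code
    rendered = decoder(z, camera_pose)  # assumed interface: decoder(z, pose) -> RGB
    loss = nn.functional.mse_loss(rendered, target_rgb)
    loss.backward()
    return loss
```

Keeping the decoder frozen is what makes the task a mapping problem: the encoder only has to find the latent code whose 3D scene, once rendered, matches the input mask's layout.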
Related papers
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
- GenN2N: Generative NeRF2NeRF Translation [53.20986183316661]
GenN2N is a unified NeRF-to-NeRF translation framework for various NeRF translation tasks.
It employs a plug-and-play image-to-image translator to perform edits in the 2D domain and lifts those edits into 3D NeRF space.
arXiv Detail & Related papers (2024-04-03T14:56:06Z)
- NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields [57.617972778377215]
We show how to generate effective 3D representations from posed RGB images.
We pretrain this representation at scale on our proposed curated posed-RGB data, totaling over 1.8 million images.
Our novel self-supervised pretraining for NeRFs, NeRF-MAE, scales remarkably well and improves performance on various challenging 3D tasks.
arXiv Detail & Related papers (2024-04-01T17:59:55Z)
- FeatureNeRF: Learning Generalizable NeRFs by Distilling Foundation Models [21.523836478458524]
Recent works on generalizable NeRFs have shown promising results on novel view synthesis from single or few images.
We propose a novel framework named FeatureNeRF to learn generalizable NeRFs by distilling pre-trained vision models.
Our experiments demonstrate the effectiveness of FeatureNeRF as a generalizable 3D semantic feature extractor.
arXiv Detail & Related papers (2023-03-22T17:57:01Z)
- 3D-Aware Encoding for Style-based Neural Radiance Fields [50.118687869198716]
We learn an inversion function to project an input image to the latent space of a NeRF generator and then synthesize novel views of the original image based on the latent code.
Compared with GAN inversion for 2D generative models, NeRF inversion must not only (1) preserve the identity of the input image, but also (2) ensure 3D consistency in the generated novel views.
We propose a two-stage encoder for style-based NeRF inversion.
arXiv Detail & Related papers (2022-11-12T06:14:12Z)
- CLONeR: Camera-Lidar Fusion for Occupancy Grid-aided Neural Representations [77.90883737693325]
This paper proposes CLONeR, which significantly improves upon NeRF by allowing it to model large outdoor driving scenes observed from sparse input sensor views.
This is achieved by decoupling occupancy and color learning within the NeRF framework into separate Multi-Layer Perceptrons (MLPs), trained with LiDAR and camera data respectively (see the sketch after this list).
In addition, the paper proposes a novel method to build differentiable 3D Occupancy Grid Maps (OGMs) alongside the NeRF model, and leverages this occupancy grid to improve the sampling of points along each ray when rendering in metric space.
arXiv Detail & Related papers (2022-09-02T17:44:50Z)
- 3D-aware Image Synthesis via Learning Structural and Textural Representations [39.681030539374994]
We propose VolumeGAN for high-fidelity 3D-aware image synthesis, which explicitly learns a structural representation and a textural representation.
Our approach achieves substantially higher image quality and better 3D control than previous methods.
arXiv Detail & Related papers (2021-12-20T18:59:40Z)
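As referenced in the CLONeR entry above, the following is a minimal toy sketch of its core idea: two decoupled MLPs, one for occupancy (density) supervised by LiDAR depth and one for colour supervised by camera RGB, composited with NeRF-style volume rendering. The `mlp` and `render_ray` helpers, network sizes, sampling scheme, and loss terms are illustrative assumptions, not CLONeR's released implementation.

```python
import torch
import torch.nn as nn

def mlp(in_dim: int, out_dim: int, hidden: int = 128, depth: int = 4) -> nn.Sequential:
    """Small fully-connected network used for both heads."""
    layers, d = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(d, hidden), nn.ReLU(inplace=True)]
        d = hidden
    return nn.Sequential(*layers, nn.Linear(d, out_dim))

# Decoupled heads: occupancy depends on 3D position only (LiDAR-supervised);
# colour depends on position + view direction (camera-supervised).
occupancy_mlp = mlp(3, 1)
color_mlp = mlp(6, 3)

def render_ray(origin, direction, t_vals):
    """NeRF-style volume rendering of one ray -> (colour, expected depth)."""
    points = origin + t_vals[:, None] * direction              # (N, 3)
    sigma = torch.relu(occupancy_mlp(points)).squeeze(-1)      # (N,)
    rgb = torch.sigmoid(
        color_mlp(torch.cat([points, direction.expand_as(points)], dim=-1)))
    delta = torch.cat([t_vals[1:] - t_vals[:-1], t_vals.new_ones(1)])
    alpha = 1.0 - torch.exp(-sigma * delta)
    trans = torch.cumprod(
        torch.cat([alpha.new_ones(1), 1.0 - alpha + 1e-10]), dim=0)[:-1]
    w = trans * alpha                                          # per-sample weight
    return (w[:, None] * rgb).sum(dim=0), (w * t_vals).sum()

# Toy usage on a single ray. In the paper the two MLPs are trained on separate
# LiDAR and camera rays; here one ray carries both losses for brevity.
origin = torch.zeros(3)
direction = torch.tensor([0.0, 0.0, 1.0])
t_vals = torch.linspace(0.1, 10.0, 64)
color, depth = render_ray(origin, direction, t_vals)
loss = (nn.functional.mse_loss(color, torch.tensor([0.5, 0.5, 0.5]))  # camera RGB target
        + nn.functional.l1_loss(depth, torch.tensor(4.2)))            # LiDAR depth target
loss.backward()
```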