Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance
Fields
- URL: http://arxiv.org/abs/2203.10821v1
- Date: Mon, 21 Mar 2022 09:15:58 GMT
- Title: Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance
Fields
- Authors: Yuedong Chen, Qianyi Wu, Chuanxia Zheng, Tat-Jen Cham and Jianfei Cai
- Abstract summary: We introduce a new task, Semantic-to-NeRF translation, conditioned on one single-view semantic mask as input.
In particular, Sem2NeRF addresses the highly challenging task by encoding the semantic mask into the latent code that controls the 3D scene representation of a pretrained decoder.
We verify the efficacy of the proposed Sem2NeRF and demonstrate it outperforms several strong baselines on two benchmark datasets.
- Score: 49.41982694533966
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Image translation and manipulation have gain increasing attention along with
the rapid development of deep generative models. Although existing approaches
have brought impressive results, they mainly operated in 2D space. In light of
recent advances in NeRF-based 3D-aware generative models, we introduce a new
task, Semantic-to-NeRF translation, that aims to reconstruct a 3D scene
modelled by NeRF, conditioned on one single-view semantic mask as input. To
kick-off this novel task, we propose the Sem2NeRF framework. In particular,
Sem2NeRF addresses the highly challenging task by encoding the semantic mask
into the latent code that controls the 3D scene representation of a pretrained
decoder. To further improve the accuracy of the mapping, we integrate a new
region-aware learning strategy into the design of both the encoder and the
decoder. We verify the efficacy of the proposed Sem2NeRF and demonstrate that
it outperforms several strong baselines on two benchmark datasets.
Related papers
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving.
It predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
It is trained self-supervised with differentiable rendering to reconstruct RGB, depth, or feature images.
arXiv Detail & Related papers (2024-06-17T21:15:13Z) - GenN2N: Generative NeRF2NeRF Translation [53.20986183316661]
GenN2N is a unified NeRF-to-NeRF translation framework for various NeRF translation tasks.
It employs a plug-and-play image-to-image translator to perform editing in the 2D domain and lifting 2D edits into the 3D NeRF space.
arXiv Detail & Related papers (2024-04-03T14:56:06Z) - NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields [57.617972778377215]
We show how to generate effective 3D representations from posed RGB images.
We pretrain this representation at scale on our proposed curated posed-RGB data, totaling over 1.8 million images.
Our novel self-supervised pretraining for NeRFs, NeRF-MAE, scales remarkably well and improves performance on various challenging 3D tasks.
arXiv Detail & Related papers (2024-04-01T17:59:55Z) - FeatureNeRF: Learning Generalizable NeRFs by Distilling Foundation
Models [21.523836478458524]
Recent works on generalizable NeRFs have shown promising results on novel view synthesis from single or few images.
We propose a novel framework named FeatureNeRF to learn generalizable NeRFs by distilling pre-trained vision models.
Our experiments demonstrate the effectiveness of FeatureNeRF as a generalizable 3D semantic feature extractor.
arXiv Detail & Related papers (2023-03-22T17:57:01Z) - HyperNeRFGAN: Hypernetwork approach to 3D NeRF GAN [3.479254848034425]
We propose a generative model called HyperNeRFGAN, which uses hypernetworks paradigm to produce 3D objects represented by NeRF.
Our architecture produces 2D images, but we use 3D-aware NeRF representation, which forces the model to produce correct 3D objects.
arXiv Detail & Related papers (2023-01-27T10:21:18Z) - 3D-Aware Encoding for Style-based Neural Radiance Fields [50.118687869198716]
We learn an inversion function to project an input image to the latent space of a NeRF generator and then synthesize novel views of the original image based on the latent code.
Compared with GAN inversion for 2D generative models, NeRF inversion not only needs to 1) preserve the identity of the input image, but also 2) ensure 3D consistency in generated novel views.
We propose a two-stage encoder for style-based NeRF inversion.
arXiv Detail & Related papers (2022-11-12T06:14:12Z) - CLONeR: Camera-Lidar Fusion for Occupancy Grid-aided Neural
Representations [77.90883737693325]
This paper proposes CLONeR, which significantly improves upon NeRF by allowing it to model large outdoor driving scenes observed from sparse input sensor views.
This is achieved by decoupling occupancy and color learning within the NeRF framework into separate Multi-Layer Perceptrons (MLPs) trained using LiDAR and camera data, respectively.
In addition, this paper proposes a novel method to build differentiable 3D Occupancy Grid Maps (OGM) alongside the NeRF model, and leverage this occupancy grid for improved sampling of points along a ray for rendering in metric space.
arXiv Detail & Related papers (2022-09-02T17:44:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.