GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding
- URL: http://arxiv.org/abs/2311.11863v2
- Date: Sun, 7 Apr 2024 07:37:15 GMT
- Title: GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding
- Authors: Hao Li, Dingwen Zhang, Yalun Dai, Nian Liu, Lechao Cheng, Jingfeng Li, Jingdong Wang, Junwei Han
- Abstract summary: Generalized Perception NeRF (GP-NeRF) is a novel pipeline that makes the widely used segmentation model and NeRF work compatibly under a unified framework.
We propose two self-distillation mechanisms, i.e., the Semantic Distill Loss and the Depth-Guided Semantic Distill Loss, to enhance the discrimination and quality of the semantic field.
- Score: 101.32590239809113
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Applying NeRF to downstream perception tasks for scene understanding and representation is becoming increasingly popular. Most existing methods treat semantic prediction as an additional rendering task, i.e., the "label rendering" task, to build semantic NeRFs. However, by rendering semantic/instance labels per pixel without considering the contextual information of the rendered image, these methods usually suffer from unclear boundary segmentation and abnormal segmentation of pixels within an object. To solve this problem, we propose Generalized Perception NeRF (GP-NeRF), a novel pipeline that makes the widely used segmentation model and NeRF work compatibly under a unified framework, facilitating context-aware 3D scene perception. To accomplish this goal, we introduce transformers to jointly aggregate radiance and semantic embedding fields for novel views and to facilitate the joint volumetric rendering of both fields. In addition, we propose two self-distillation mechanisms, i.e., the Semantic Distill Loss and the Depth-Guided Semantic Distill Loss, to enhance the discrimination and quality of the semantic field and to maintain geometric consistency. In evaluation, we conduct experimental comparisons under two perception tasks (i.e., semantic and instance segmentation) using both synthetic and real-world datasets. Notably, our method outperforms SOTA approaches by 6.94%, 11.76%, and 8.47% on generalized semantic segmentation, finetuning semantic segmentation, and instance segmentation, respectively.
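The joint volumetric rendering idea above can be made concrete: a single set of ray weights alpha-composites both the radiance field and the semantic embedding field, and a self-distillation term in the spirit of the Semantic Distill Loss pulls the rendered semantics toward a 2D segmentation model's prediction on the same view. The following is a minimal PyTorch-style sketch under assumed tensor shapes; all function and variable names are illustrative and not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def render_joint_fields(sigma, rgb, sem_emb, z_vals):
    """Alpha-composite color and semantic embeddings with shared ray weights.

    sigma:   (R, S)    densities for R rays with S samples each
    rgb:     (R, S, 3) per-sample colors
    sem_emb: (R, S, C) per-sample semantic embeddings
    z_vals:  (R, S)    sample depths along each ray
    """
    deltas = z_vals[:, 1:] - z_vals[:, :-1]
    deltas = torch.cat([deltas, torch.full_like(deltas[:, :1], 1e10)], dim=-1)
    alpha = 1.0 - torch.exp(-F.relu(sigma) * deltas)               # (R, S)
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)
    trans = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=-1)
    weights = alpha * trans                                         # (R, S)
    # One set of weights renders both fields, keeping the rendered
    # semantics geometrically aligned with the rendered color.
    color = (weights[..., None] * rgb).sum(dim=-2)                  # (R, 3)
    semantics = (weights[..., None] * sem_emb).sum(dim=-2)          # (R, C)
    depth = (weights * z_vals).sum(dim=-1)                          # (R,)
    return color, semantics, depth

def semantic_distill_loss(rendered_logits, teacher_logits, reliable=None):
    """Self-distillation sketch: match rendered semantics to a 2D
    segmentation model's prediction on the same view."""
    log_p = F.log_softmax(rendered_logits, dim=-1)
    q = F.softmax(teacher_logits, dim=-1)
    kl = F.kl_div(log_p, q, reduction="none").sum(dim=-1)  # per-pixel KL
    if reliable is not None:
        kl = kl[reliable]  # restrict to geometrically trustworthy pixels
    return kl.mean()
```

In the paper's depth-guided variant, geometric reliability is derived from the rendered depth; here a boolean `reliable` mask simply stands in for that check.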
Related papers
- Semantic Is Enough: Only Semantic Information For NeRF Reconstruction [12.156617601347769] (2024-03-24)
This research extends the Semantic Neural Radiance Fields (Semantic-NeRF) model by focusing solely on semantic output.
We reformulate the model and its training procedure to leverage only the cross-entropy loss between the model's semantic output and the ground-truth semantic images (see the loss sketch after this list).
- OV-NeRF: Open-vocabulary Neural Radiance Fields with Vision and Language Foundation Models for 3D Semantic Understanding [9.25233177676278] (2024-02-07)
OV-NeRF exploits the potential of pre-trained vision and language foundation models to enhance semantic field learning.
Our approach achieves a significant improvement of 20.31% and 18.42% in the mIoU metric on Replica and ScanNet, respectively.
- Edge Guided GANs with Multi-Scale Contrastive Learning for Semantic Image Synthesis [139.2216271759332] (2023-07-22)
We propose a novel ECGAN for the challenging semantic image synthesis task.
Semantic labels do not provide detailed structural information, making it challenging to synthesize local details and structures.
Widely adopted CNN operations such as convolution, down-sampling, and normalization usually cause spatial resolution loss.
We propose a novel contrastive learning method that enforces pixel embeddings belonging to the same semantic class to generate more similar image content (see the contrastive sketch after this list).
- JacobiNeRF: NeRF Shaping with Mutual Information Gradients [24.024577264160154] (2023-04-01)
We propose a method that trains a neural radiance field (NeRF) to encode semantic correlations between scene points, regions, or entities.
Our experiments show that JacobiNeRF is more efficient at propagating annotations among 2D pixels and 3D points than NeRFs without mutual information shaping.
- Unsupervised Multi-View Object Segmentation Using Radiance Field Propagation [55.9577535403381] (2022-10-02)
We present a novel approach to segmenting objects in 3D during reconstruction, given only unlabeled multi-view images of a scene.
The core of our method is a novel propagation strategy for individual objects' radiance fields with a bidirectional photometric loss.
To the best of our knowledge, RFP is the first unsupervised approach to tackle 3D scene object segmentation for neural radiance fields (NeRF).
- PeRFception: Perception using Radiance Fields [72.99583614735545] (2022-08-24)
We create the first large-scale implicit-representation datasets for perception tasks, called PeRFception.
It achieves a significant memory compression rate (96.4%) relative to the original dataset while containing both 2D and 3D information in a unified form.
We construct classification and segmentation models that directly take this implicit format as input and propose a novel augmentation technique to avoid overfitting on image backgrounds.
- A Unified Architecture of Semantic Segmentation and Hierarchical Generative Adversarial Networks for Expression Manipulation [52.911307452212256] (2021-12-08)
We develop a unified architecture of semantic segmentation and hierarchical GANs.
A unique advantage of our framework is that, on the forward pass, the semantic segmentation network conditions the generative model.
We evaluate our method on two challenging facial expression translation benchmarks, AffectNet and RaFD, and a semantic segmentation benchmark, CelebAMask-HQ.
- In-Place Scene Labelling and Understanding with Implicit Scene Representation [39.73806072862176] (2021-03-29)
We extend neural radiance fields (NeRF) to jointly encode semantics with appearance and geometry.
We show the benefit of this approach when labels are either sparse or very noisy in room-scale scenes.
- Edge Guided GANs with Contrastive Learning for Semantic Image Synthesis [194.1452124186117] (2020-03-31)
We propose a novel ECGAN for the challenging semantic image synthesis task.
Our ECGAN achieves significantly better results than state-of-the-art methods.
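For the "Semantic Is Enough" entry above, the reformulated objective reduces to a per-ray cross-entropy between volume-rendered class logits and ground-truth labels. A minimal sketch under that reading (names are illustrative, not the paper's code):

```python
import torch.nn.functional as F

def semantic_only_loss(rendered_logits, gt_labels):
    """rendered_logits: (R, C) volume-rendered class logits per ray
    gt_labels:          (R,)   ground-truth class index per pixel/ray"""
    # In the semantic-only reformulation, this single term replaces the
    # usual photometric (RGB) rendering loss during training.
    return F.cross_entropy(rendered_logits, gt_labels)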
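For the "Edge Guided GANs with Multi-Scale Contrastive Learning" entry, the stated idea is to make pixel embeddings of the same semantic class more similar. An InfoNCE-style sketch of such a class-wise contrastive loss (a plausible instantiation, not ECGAN's exact formulation):

```python
import torch
import torch.nn.functional as F

def class_contrastive_loss(pix_emb, labels, tau=0.07):
    """Pull same-class pixel embeddings together, push classes apart.

    pix_emb: (N, D) sampled pixel embeddings
    labels:  (N,)   semantic class index per pixel
    """
    z = F.normalize(pix_emb, dim=-1)
    sim = z @ z.t() / tau                                    # (N, N)
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos = labels[:, None].eq(labels[None, :]) & ~eye         # positive pairs
    sim = sim.masked_fill(eye, float("-inf"))                # drop self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=-1, keepdim=True)
    pos_log = log_prob.masked_fill(~pos, 0.0)                # keep positives only
    pos_counts = pos.sum(dim=-1)
    valid = pos_counts > 0                                   # anchors with a positive
    loss = -pos_log.sum(dim=-1)[valid] / pos_counts[valid]
    return loss.mean()
```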