Empowering Sparse-Input Neural Radiance Fields with Dual-Level Semantic Guidance from Dense Novel Views
- URL: http://arxiv.org/abs/2503.02230v1
- Date: Tue, 04 Mar 2025 03:13:44 GMT
- Title: Empowering Sparse-Input Neural Radiance Fields with Dual-Level Semantic Guidance from Dense Novel Views
- Authors: Yingji Zhong, Kaichen Zhou, Zhihao Li, Lanqing Hong, Zhenguo Li, Dan Xu
- Abstract summary: We show that rendered semantics can be treated as a more robust form of augmented data than rendered RGB. Our method enhances NeRF's performance by incorporating guidance derived from the rendered semantics.
- Score: 66.1245505423179
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural Radiance Fields (NeRF) have shown remarkable capabilities for photorealistic novel view synthesis. One major deficiency of NeRF is that dense inputs are typically required, and the rendering quality drops drastically given sparse inputs. In this paper, we highlight the effectiveness of rendered semantics from dense novel views, and show that rendered semantics can be treated as a more robust form of augmented data than rendered RGB. Our method enhances NeRF's performance by incorporating guidance derived from the rendered semantics. The rendered semantic guidance operates at two levels: the supervision level and the feature level. The supervision-level guidance incorporates a bi-directional verification module that decides the validity of each rendered semantic label, while the feature-level guidance integrates a learnable codebook that encodes semantic-aware information and is queried by each point via an attention mechanism to obtain semantic-relevant predictions. The overall semantic guidance is embedded into a self-improved pipeline. We also introduce a more challenging sparse-input indoor benchmark, where the number of inputs is limited to as few as 6. Experiments demonstrate the effectiveness of our method, which exhibits superior performance compared to existing approaches.
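The abstract names two concrete mechanisms: a supervision-level bi-directional verification module that keeps only trustworthy rendered semantic labels, and a feature-level learnable codebook that each 3D point queries via attention. The paper's implementation details are not given here, so the following PyTorch sketch is only one plausible reading of those two ideas; every name and shape (e.g. `bidirectional_verification`, `SemanticCodebook`, `codebook_size`, `feat_dim`) is a hypothetical illustration, not the authors' code.

```python
import torch
import torch.nn as nn

def bidirectional_verification(sem_render, sem_pseudo):
    """Hypothetical supervision-level check: a rendered semantic label at a
    dense novel view is accepted only if it agrees with an independent
    prediction for the same pixel (e.g., a 2D segmenter applied to the
    rendered image). Both inputs are per-pixel logits of shape
    (H, W, num_classes); returns labels and a per-pixel validity mask."""
    labels = sem_render.argmax(dim=-1)
    valid = labels == sem_pseudo.argmax(dim=-1)  # mutual agreement -> trusted label
    return labels, valid

class SemanticCodebook(nn.Module):
    """Hypothetical feature-level module: a learnable codebook of
    semantic-aware entries that each point queries via attention."""
    def __init__(self, codebook_size=64, feat_dim=128, num_classes=21):
        super().__init__()
        self.codebook = nn.Parameter(torch.randn(codebook_size, feat_dim))
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, point_feats):
        # point_feats: (N, feat_dim) per-point features from the NeRF backbone.
        scores = point_feats @ self.codebook.t() / self.codebook.shape[-1] ** 0.5
        attn = scores.softmax(dim=-1)     # (N, codebook_size) attention over entries
        sem_feats = attn @ self.codebook  # attention-pooled semantic features
        return self.head(sem_feats)       # semantic-relevant per-point predictions
```

Under this reading, the per-point predictions would be volume-rendered to the image plane and supervised only where the verification mask holds, which is how the two guidance levels could feed the self-improved training pipeline the abstract describes.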
Related papers
- "Principal Components" Enable A New Language of Images [79.45806370905775]
We introduce a novel visual tokenization framework that embeds a provable PCA-like structure into the latent token space.
Our approach achieves state-of-the-art reconstruction performance and enables better interpretability to align with the human vision system.
arXiv Detail & Related papers (2025-03-11T17:59:41Z)
- DepthMaster: Taming Diffusion Models for Monocular Depth Estimation [41.81343543266191]
We propose a single-step diffusion model designed to adapt generative features for the discriminative depth estimation task.
We adopt a two-stage training strategy to fully leverage the potential of the two modules.
Our model achieves state-of-the-art performance in terms of generalization and detail preservation, outperforming other diffusion-based methods across various datasets.
arXiv Detail & Related papers (2025-01-05T15:18:32Z)
- Enhancing Hyperspectral Image Prediction with Contrastive Learning in Low-Label Regime [0.810304644344495]
Self-supervised contrastive learning is an effective approach for addressing the challenge of limited labelled data.
We evaluate the method's performance for both the single-label and multi-label classification tasks.
arXiv Detail & Related papers (2024-10-10T10:20:16Z)
- SEER-ZSL: Semantic Encoder-Enhanced Representations for Generalized Zero-Shot Learning [0.6792605600335813]
Zero-Shot Learning (ZSL) presents the challenge of identifying categories not seen during training. We introduce Semantic Encoder-Enhanced Representations for Zero-Shot Learning (SEER-ZSL). First, we distill meaningful semantic information using a probabilistic encoder, enhancing semantic consistency and robustness. Second, we distill the visual space by exploiting the learned data distribution through an adversarially trained generator. Third, we align the distilled information, enabling a mapping of unseen categories onto the true data manifold.
arXiv Detail & Related papers (2023-12-20T15:18:51Z)
- 2D Feature Distillation for Weakly- and Semi-Supervised 3D Semantic Segmentation [92.17700318483745]
We propose an image-guidance network (IGNet) that builds upon the idea of distilling high-level feature information from a domain-adapted, synthetically trained 2D semantic segmentation network.
IGNet achieves state-of-the-art results for weakly-supervised LiDAR semantic segmentation on ScribbleKITTI, boasting up to 98% relative performance to fully supervised training with only 8% labeled points.
arXiv Detail & Related papers (2023-11-27T07:57:29Z)
- GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding [101.32590239809113]
Generalized Perception NeRF (GP-NeRF) is a novel pipeline that makes widely used segmentation models and NeRF work compatibly under a unified framework.
We propose two self-distillation mechanisms, i.e., the Semantic Distill Loss and the Depth-Guided Semantic Distill Loss, to enhance the discrimination and quality of the semantic field.
arXiv Detail & Related papers (2023-11-20T15:59:41Z)
- DARF: Depth-Aware Generalizable Neural Radiance Field [51.29437249009986]
We propose the Depth-Aware Generalizable Neural Radiance Field (DARF) with a Depth-Aware Dynamic Sampling (DADS) strategy. Our framework infers unseen scenes at both the pixel level and the geometry level with only a few input images. Compared with state-of-the-art generalizable NeRF methods, DARF reduces samples by 50% while improving rendering quality and depth estimation.
arXiv Detail & Related papers (2022-12-05T14:00:59Z)
- Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection [54.92703325989853]
We propose a two-stage Activation-to-Saliency (A2S) framework that effectively generates high-quality saliency cues.
No human annotations are involved in our framework during the whole training process.
Our framework achieves significant performance gains compared with existing USOD methods.
arXiv Detail & Related papers (2021-12-07T11:54:06Z)
- An audiovisual and contextual approach for categorical and continuous emotion recognition in-the-wild [27.943550651941166]
We tackle the task of video-based audio-visual emotion recognition within the premises of the 2nd Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW).
Standard methodologies that rely solely on the extraction of facial features often fall short of accurate emotion prediction in cases where the aforementioned source of affective information is inaccessible due to head/body orientation, low resolution and poor illumination.
We aspire to alleviate this problem by leveraging bodily as well as contextual features, as part of a broader emotion recognition framework.
arXiv Detail & Related papers (2021-07-07T20:13:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the information and is not responsible for any consequences of its use.