NeRF Director: Revisiting View Selection in Neural Volume Rendering
- URL: http://arxiv.org/abs/2406.08839v1
- Date: Thu, 13 Jun 2024 06:04:19 GMT
- Title: NeRF Director: Revisiting View Selection in Neural Volume Rendering
- Authors: Wenhui Xiao, Rodrigo Santa Cruz, David Ahmedt-Aristizabal, Olivier Salvado, Clinton Fookes, Leo Lebrat
- Abstract summary: We introduce a unified framework for view selection methods and devise a benchmark to assess its impact.
We show that high-quality renderings can be achieved faster by using fewer views.
We conduct extensive experiments on both synthetic datasets and realistic data to demonstrate the effectiveness of our proposed method.
- Score: 21.03892888687864
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural Rendering representations have significantly contributed to the field of 3D computer vision. Given their potential, considerable efforts have been invested to improve their performance. Nonetheless, the essential question of selecting training views is yet to be thoroughly investigated. This key aspect plays a vital role in achieving high-quality results and aligns with the well-known tenet of deep learning: "garbage in, garbage out". In this paper, we first illustrate the importance of view selection by demonstrating how a simple rotation of the test views within the most pervasive NeRF dataset can lead to consequential shifts in the performance rankings of state-of-the-art techniques. To address this challenge, we introduce a unified framework for view selection methods and devise a thorough benchmark to assess its impact. Significant improvements can be achieved without leveraging error or uncertainty estimation but focusing on uniform view coverage of the reconstructed object, resulting in a training-free approach. Using this technique, we show that high-quality renderings can be achieved faster by using fewer views. We conduct extensive experiments on both synthetic datasets and realistic data to demonstrate the effectiveness of our proposed method compared with random, conventional error-based, and uncertainty-guided view selection.
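The abstract's idea of selecting training views for uniform coverage, without error or uncertainty estimation, can be illustrated with a small sketch. The code below uses greedy farthest point sampling over candidate camera positions; this is one common way to spread views evenly and is an assumption made for illustration, not a method taken from the listing above. The function name `farthest_view_selection`, its parameters, and the toy camera positions are all hypothetical.

```python
import math


def farthest_view_selection(camera_positions, num_views, start_index=0):
    """Greedily pick view indices whose camera positions are spread far apart.

    Illustrative sketch only: plain farthest point sampling with Euclidean
    distance. It needs no trained model, error map, or uncertainty estimate.
    """
    total = len(camera_positions)
    if not 0 < num_views <= total:
        raise ValueError("num_views must be between 1 and the number of candidates")

    selected = [start_index]
    # min_dist[i]: distance from candidate i to its nearest selected view.
    min_dist = [math.dist(p, camera_positions[start_index]) for p in camera_positions]
    min_dist[start_index] = -1.0  # mark as taken so it is never picked again

    while len(selected) < num_views:
        # The next view is the candidate farthest from everything chosen so far.
        next_index = max(range(total), key=min_dist.__getitem__)
        selected.append(next_index)
        min_dist[next_index] = -1.0
        for i, p in enumerate(camera_positions):
            if min_dist[i] >= 0.0:
                min_dist[i] = min(min_dist[i], math.dist(p, camera_positions[next_index]))
    return selected


# Hypothetical candidate cameras placed on a ring around an object.
candidates = [
    (1.0, 0.0, 0.0), (0.7, 0.7, 0.0), (0.0, 1.0, 0.0), (-0.7, 0.7, 0.0),
    (-1.0, 0.0, 0.0), (-0.7, -0.7, 0.0), (0.0, -1.0, 0.0), (0.7, -0.7, 0.0),
]
selected = farthest_view_selection(candidates, num_views=4)
print(sorted(selected))
```

With four views requested from the eight ring positions, the sketch keeps the four cameras that are a quarter turn apart, which is the evenly spread subset one would expect from a coverage-driven selection.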
Related papers
- Debiasing Graph Representation Learning based on Information Bottleneck [18.35405511009332]
We present the design and implementation of GRAFair, a new framework based on a variational graph auto-encoder.
The crux of GRAFair is the Conditional Fairness Bottleneck, where the objective is to capture the trade-off between the utility of representations and sensitive information of interest.
Experiments on various real-world datasets demonstrate the effectiveness of our proposed method in terms of fairness, utility, robustness, and stability.
arXiv Detail & Related papers (2024-09-02T16:45:23Z) - Improving Neural Surface Reconstruction with Feature Priors from Multi-View Image [87.00660347447494]
Recent advancements in Neural Surface Reconstruction (NSR) have significantly improved multi-view reconstruction when coupled with volume rendering.
We propose an investigation into feature-level consistent loss, aiming to harness valuable feature priors from diverse pretext visual tasks.
Our results, analyzed on DTU and EPFL, reveal that feature priors from image matching and multi-view stereo datasets outperform other pretext tasks.
arXiv Detail & Related papers (2024-08-04T16:09:46Z) - Debiasing Multimodal Large Language Models [61.6896704217147]
Large Vision-Language Models (LVLMs) have become indispensable tools in computer vision and natural language processing.
Our investigation reveals a noteworthy bias in the generated content, where the output is primarily influenced by the prior of the underlying Large Language Models (LLMs) rather than by the input image.
To rectify these biases and redirect the model's focus toward vision information, we introduce two simple, training-free strategies.
arXiv Detail & Related papers (2024-03-08T12:35:07Z) - RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z) - Towards Robust and Accurate Visual Prompting [11.918195429308035]
We study whether a visual prompt derived from a robust model can inherit the robustness while suffering from the generalization performance decline.
We introduce a novel technique named Prompt Boundary Loose (PBL) to effectively mitigate the suboptimal results of visual prompt on standard accuracy.
Our findings are universal and demonstrate the significant benefits of our proposed method.
arXiv Detail & Related papers (2023-11-18T07:00:56Z) - Leveraging Neural Radiance Fields for Uncertainty-Aware Visual Localization [56.95046107046027]
We propose to leverage Neural Radiance Fields (NeRF) to generate training samples for scene coordinate regression.
Despite NeRF's efficiency in rendering, much of the rendered data is polluted by artifacts or contains only minimal information gain.
arXiv Detail & Related papers (2023-10-10T20:11:13Z) - Stable and Causal Inference for Discriminative Self-supervised Deep Visual Representations [10.41003719027387]
We analyze discriminative self-supervised methods from a causal perspective to explain unstable behaviors.
Our solutions involve tempering a linear transformation with controlled synthetic data.
arXiv Detail & Related papers (2023-08-16T12:30:17Z) - Improving Viewpoint Robustness for Visual Recognition via Adversarial Training [26.824940629150362]
We propose Viewpoint-Invariant Adversarial Training (VIAT) to improve the viewpoint robustness of image classifiers.
We show that VIAT significantly improves the viewpoint robustness of various image classifiers based on the diversity of adversarial viewpoints generated by GMVFool.
arXiv Detail & Related papers (2023-07-21T12:18:35Z) - Efficient Self-supervised Vision Transformers for Representation Learning [86.57557009109411]
We show that multi-stage architectures with sparse self-attentions can significantly reduce modeling complexity.
We propose a new pre-training task of region matching which allows the model to capture fine-grained region dependencies.
Our results show that, by combining the two techniques, EsViT achieves 81.3% top-1 accuracy on the ImageNet linear probe evaluation.
arXiv Detail & Related papers (2021-06-17T19:57:33Z) - Interpretable Detail-Fidelity Attention Network for Single Image Super-Resolution [89.1947690981471]
We propose a purposeful and interpretable detail-fidelity attention network to progressively process smoothes and details in a divide-and-conquer manner.
Particularly, we propose a Hessian filtering for interpretable feature representation which is high-profile for detail inference.
Experiments demonstrate that the proposed methods achieve superior performances over the state-of-the-art methods.
arXiv Detail & Related papers (2020-09-28T08:31:23Z)