LLaNA: Large Language and NeRF Assistant
- URL: http://arxiv.org/abs/2406.11840v2
- Date: Fri, 22 Nov 2024 10:38:42 GMT
- Title: LLaNA: Large Language and NeRF Assistant
- Authors: Andrea Amaduzzi, Pierluigi Zama Ramirez, Giuseppe Lisanti, Samuele Salti, Luigi Di Stefano
- Abstract summary: We create LLaNA, the first general-purpose NeRF-language assistant capable of performing new tasks such as NeRF captioning.
We build a dataset of NeRFs with text annotations for various NeRF-language tasks with no human intervention.
Results show that processing NeRF weights performs favourably against extracting 2D or 3D representations from NeRFs.
- Score: 17.774826745566784
- License:
- Abstract: Multimodal Large Language Models (MLLMs) have demonstrated an excellent understanding of images and 3D data. However, both modalities have shortcomings in holistically capturing the appearance and geometry of objects. Meanwhile, Neural Radiance Fields (NeRFs), which encode information within the weights of a simple Multi-Layer Perceptron (MLP), have emerged as an increasingly widespread modality that simultaneously encodes the geometry and photorealistic appearance of objects. This paper investigates the feasibility and effectiveness of ingesting NeRFs into MLLMs. We create LLaNA, the first general-purpose NeRF-language assistant capable of performing new tasks such as NeRF captioning and Q&A. Notably, our method directly processes the weights of the NeRF's MLP to extract information about the represented objects without the need to render images or materialize 3D data structures. Moreover, we build a dataset of NeRFs with text annotations for various NeRF-language tasks with no human intervention. Based on this dataset, we develop a benchmark to evaluate the NeRF understanding capability of our method. Results show that processing NeRF weights performs favourably against extracting 2D or 3D representations from NeRFs.
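As a rough illustration of what it means to hand an MLP's weights to a model directly, rather than rendering images from it, the sketch below flattens a toy MLP into a sequence of fixed-width tokens. It is not LLaNA's encoder: the functions `make_toy_mlp` and `weights_to_tokens`, the layer sizes, and the one-token-per-neuron layout are all assumptions made for illustration.

```python
import random


def make_toy_mlp(layer_sizes, seed=0):
    """Build a toy MLP as a list of (weights, biases) pairs with random values.

    Hypothetical stand-in for a trained NeRF MLP; no training happens here.
    """
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers


def weights_to_tokens(layers):
    """Turn each output neuron (its weight row plus its bias) into one token.

    Tokens are zero-padded to a common width so that a sequence model could
    consume them. This layout is an assumption, not the paper's design.
    """
    width = max(len(weights[0]) for weights, _ in layers) + 1
    tokens = []
    for weights, biases in layers:
        for row, bias in zip(weights, biases):
            token = row + [bias]
            tokens.append(token + [0.0] * (width - len(token)))
    return tokens


mlp = make_toy_mlp([3, 8, 8, 4])
tokens = weights_to_tokens(mlp)
print(len(tokens), len(tokens[0]))  # 20 tokens, each of width 9
```

The point of the sketch is only that the network's parameters themselves become the input sequence, so no image rendering or 3D data structure is needed at this stage.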
Related papers
- NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields [57.617972778377215]
We show how to generate effective 3D representations from posed RGB images.
We pretrain this representation at scale on our proposed curated posed-RGB data, totaling over 1.8 million images.
Our novel self-supervised pretraining for NeRFs, NeRF-MAE, scales remarkably well and improves performance on various challenging 3D tasks.
arXiv Detail & Related papers (2024-04-01T17:59:55Z) - Obj-NeRF: Extract Object NeRFs from Multi-view Images [7.669778218573394]
We propose Obj-NeRF, a comprehensive pipeline that recovers the 3D geometry of a specific object from multi-view images using a single prompt.
We also apply Obj-NeRF to various applications, including object removal, rotation, replacement, and recoloring.
arXiv Detail & Related papers (2023-11-26T13:15:37Z) - RePaint-NeRF: NeRF Editting via Semantic Masks and Diffusion Models [36.236190350126826]
We propose a novel framework that can take RGB images as input and alter the 3D content in neural scenes.
Specifically, we semantically select the target object and a pre-trained diffusion model will guide the NeRF model to generate new 3D objects.
Experiment results show that our algorithm is effective for editing 3D objects in NeRF under different text prompts.
arXiv Detail & Related papers (2023-06-09T04:49:31Z) - MultiPlaneNeRF: Neural Radiance Field with Non-Trainable Representation [11.049528513775968]
NeRF is a popular model that efficiently represents 3D objects from 2D images.
We present MultiPlaneNeRF -- a model that simultaneously solves the above problems.
arXiv Detail & Related papers (2023-05-17T21:27:27Z) - Multi-Space Neural Radiance Fields [74.46513422075438]
Existing Neural Radiance Fields (NeRF) methods struggle in the presence of reflective objects.
We propose a multi-space neural radiance field (MS-NeRF) that represents the scene using a group of feature fields in parallel sub-spaces.
Our approach significantly outperforms the existing single-space NeRF methods for rendering high-quality scenes.
arXiv Detail & Related papers (2023-05-07T13:11:07Z) - FeatureNeRF: Learning Generalizable NeRFs by Distilling Foundation Models [21.523836478458524]
Recent works on generalizable NeRFs have shown promising results on novel view synthesis from single or few images.
We propose a novel framework named FeatureNeRF to learn generalizable NeRFs by distilling pre-trained vision models.
Our experiments demonstrate the effectiveness of FeatureNeRF as a generalizable 3D semantic feature extractor.
arXiv Detail & Related papers (2023-03-22T17:57:01Z) - AligNeRF: High-Fidelity Neural Radiance Fields via Alignment-Aware Training [100.33713282611448]
We conduct the first pilot study on training NeRF with high-resolution data.
We propose the corresponding solutions, including marrying the multilayer perceptron with convolutional layers.
Our approach is nearly free, introducing no obvious training or testing costs.
arXiv Detail & Related papers (2022-11-17T17:22:28Z) - NeRF-Loc: Transformer-Based Object Localization Within Neural Radiance Fields [62.89785701659139]
We propose a transformer-based framework, NeRF-Loc, to extract 3D bounding boxes of objects in NeRF scenes.
NeRF-Loc takes a pre-trained NeRF model and camera view as input and produces labeled, oriented 3D bounding boxes of objects as output.
arXiv Detail & Related papers (2022-09-24T18:34:22Z) - Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields [49.41982694533966]
We introduce a new task, Semantic-to-NeRF translation, conditioned on one single-view semantic mask as input.
In particular, Sem2NeRF addresses the highly challenging task by encoding the semantic mask into the latent code that controls the 3D scene representation of a pretrained decoder.
We verify the efficacy of the proposed Sem2NeRF and demonstrate it outperforms several strong baselines on two benchmark datasets.
arXiv Detail & Related papers (2022-03-21T09:15:58Z) - iNeRF: Inverting Neural Radiance Fields for Pose Estimation [68.91325516370013]
We present iNeRF, a framework that performs mesh-free pose estimation by "inverting" a Neural Radiance Field (NeRF).
NeRFs have been shown to be remarkably effective for the task of view synthesis.
arXiv Detail & Related papers (2020-12-10T18:36:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.