Scaling LLaNA: Advancing NeRF-Language Understanding Through Large-Scale Training
- URL: http://arxiv.org/abs/2504.13995v1
- Date: Fri, 18 Apr 2025 18:00:00 GMT
- Title: Scaling LLaNA: Advancing NeRF-Language Understanding Through Large-Scale Training
- Authors: Andrea Amaduzzi, Pierluigi Zama Ramirez, Giuseppe Lisanti, Samuele Salti, Luigi Di Stefano
- Abstract summary: We introduce LLaNA, the first MLLM able to perform new tasks such as NeRF captioning and Q&A. We build the first large-scale NeRF-language dataset, composed of more than 300K NeRFs trained on ShapeNet and Objaverse. We develop a benchmark to evaluate the NeRF understanding capability of our method.
- Score: 17.774826745566784
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in Multimodal Large Language Models (MLLMs) have shown remarkable capabilities in understanding both images and 3D data, yet these modalities face inherent limitations in comprehensively representing object geometry and appearance. Neural Radiance Fields (NeRFs) have emerged as a promising alternative, encoding both geometric and photorealistic properties within the weights of a simple Multi-Layer Perceptron (MLP). This work investigates the feasibility and effectiveness of ingesting NeRFs into an MLLM. We introduce LLaNA, the first MLLM able to perform new tasks such as NeRF captioning and Q&A, by directly processing the weights of a NeRF's MLP. Notably, LLaNA is able to extract information about the represented objects without the need to render images or materialize 3D data structures. In addition, we build the first large-scale NeRF-language dataset, composed of more than 300K NeRFs trained on ShapeNet and Objaverse, with paired textual annotations that enable various NeRF-language tasks. Based on this dataset, we develop a benchmark to evaluate the NeRF understanding capability of our method. Results show that directly processing NeRF weights leads to better performance on NeRF-language tasks compared to approaches that rely on either 2D or 3D representations derived from NeRFs.
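The core idea of the abstract, treating a NeRF's MLP weights themselves as the input to a language model, can be sketched in a toy example. Everything below is an illustrative assumption, not the paper's architecture: the layer sizes are arbitrary, and the flatten-and-chunk "tokenizer" is a far simpler stand-in for the learned weight encoder an MLLM would actually use.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy stand-in for a NeRF's MLP: maps a 3D point to (RGB, density).
# Layer sizes are illustrative, not those of the paper's NeRFs.
layer_sizes = [(3, 64), (64, 64), (64, 4)]
weights = [rng.standard_normal(shape) * 0.1 for shape in layer_sizes]
biases = [np.zeros(cols) for _, cols in layer_sizes]

def nerf_mlp(points):
    """Evaluate the toy NeRF MLP at a batch of 3D points."""
    h = points
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ W + b, 0.0)           # ReLU hidden layers
    out = h @ weights[-1] + biases[-1]
    rgb = 1.0 / (1.0 + np.exp(-out[:, :3]))      # sigmoid -> colors in [0, 1]
    sigma = np.maximum(out[:, 3:], 0.0)          # non-negative density
    return rgb, sigma

def weights_to_tokens(weights, biases, token_dim=64):
    """Flatten all parameters and chunk them into fixed-size 'tokens' a
    language model's input projector could ingest. A hypothetical
    simplification: LLaNA relies on a learned meta-encoder instead."""
    flat = np.concatenate([p.ravel() for p in weights + biases])
    pad = (-len(flat)) % token_dim
    flat = np.pad(flat, (0, pad))
    return flat.reshape(-1, token_dim)

rgb, sigma = nerf_mlp(rng.standard_normal((5, 3)))
tokens = weights_to_tokens(weights, biases)
print(rgb.shape, sigma.shape, tokens.shape)  # (5, 3) (5, 1) (74, 64)
```

The point of the sketch: once the object is encoded in the MLP's parameters, those parameters form a fixed-size sequence that can be fed to a downstream model without rendering any images or extracting any 3D structure.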
Related papers
- LLaNA: Large Language and NeRF Assistant [17.774826745566784]
We create LLaNA, the first general-purpose NeRF-language assistant capable of performing new tasks such as NeRF captioning.
We build a dataset of NeRFs with text annotations for various NeRF-language tasks with no human intervention.
Results show that processing NeRF weights performs favourably against extracting 2D or 3D representations from NeRFs.
arXiv Detail & Related papers (2024-06-17T17:59:59Z) - NeRF-DetS: Enhanced Adaptive Spatial-wise Sampling and View-wise Fusion Strategies for NeRF-based Indoor Multi-view 3D Object Detection [17.631688089207724]
In indoor scenes, the diverse distribution of object locations and scales makes visual 3D perception a significant challenge. Previous works have demonstrated that implicit representations can benefit visual 3D perception. We propose a simple yet effective method, NeRF-DetS, to address these issues.
arXiv Detail & Related papers (2024-04-22T06:59:03Z) - NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields [57.617972778377215]
We show how to generate effective 3D representations from posed RGB images.
We pretrain this representation at scale on our proposed curated posed-RGB data, totaling over 1.8 million images.
Our novel self-supervised pretraining for NeRFs, NeRF-MAE, scales remarkably well and improves performance on various challenging 3D tasks.
arXiv Detail & Related papers (2024-04-01T17:59:55Z) - Obj-NeRF: Extract Object NeRFs from Multi-view Images [7.669778218573394]
We propose Obj-NeRF, a comprehensive pipeline that recovers the 3D geometry of a specific object from multi-view images using a single prompt.
We also apply Obj-NeRF to various applications, including object removal, rotation, replacement, and recoloring.
arXiv Detail & Related papers (2023-11-26T13:15:37Z) - Registering Neural Radiance Fields as 3D Density Images [55.64859832225061]
We propose to use universal pre-trained neural networks that can be trained and tested on different scenes.
We demonstrate that our method, as a global approach, can effectively register NeRF models.
arXiv Detail & Related papers (2023-05-22T09:08:46Z) - MultiPlaneNeRF: Neural Radiance Field with Non-Trainable Representation [6.860380947025009]
NeRF is a popular model that efficiently represents 3D objects from 2D images. We present MultiPlaneNeRF, a model that simultaneously solves the above problems.
arXiv Detail & Related papers (2023-05-17T21:27:27Z) - LiDAR-NeRF: Novel LiDAR View Synthesis via Neural Radiance Fields [112.62936571539232]
We introduce a new task, novel view synthesis for LiDAR sensors.
Traditional model-based LiDAR simulators combined with style-transfer neural networks can be applied to render novel views.
We use a neural radiance field (NeRF) to facilitate the joint learning of geometry and the attributes of 3D points.
arXiv Detail & Related papers (2023-04-20T15:44:37Z) - FeatureNeRF: Learning Generalizable NeRFs by Distilling Foundation Models [21.523836478458524]
Recent works on generalizable NeRFs have shown promising results on novel view synthesis from single or few images.
We propose a novel framework named FeatureNeRF to learn generalizable NeRFs by distilling pre-trained vision models.
Our experiments demonstrate the effectiveness of FeatureNeRF as a generalizable 3D semantic feature extractor.
arXiv Detail & Related papers (2023-03-22T17:57:01Z) - AligNeRF: High-Fidelity Neural Radiance Fields via Alignment-Aware Training [100.33713282611448]
We conduct the first pilot study on training NeRF with high-resolution data.
We propose the corresponding solutions, including marrying the multilayer perceptron with convolutional layers.
Our approach is nearly cost-free, introducing no obvious training or testing overhead.
arXiv Detail & Related papers (2022-11-17T17:22:28Z) - Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields [49.41982694533966]
We introduce a new task, Semantic-to-NeRF translation, conditioned on one single-view semantic mask as input.
In particular, Sem2NeRF addresses the highly challenging task by encoding the semantic mask into the latent code that controls the 3D scene representation of a pretrained decoder.
We verify the efficacy of the proposed Sem2NeRF and demonstrate it outperforms several strong baselines on two benchmark datasets.
arXiv Detail & Related papers (2022-03-21T09:15:58Z) - iNeRF: Inverting Neural Radiance Fields for Pose Estimation [68.91325516370013]
We present iNeRF, a framework that performs mesh-free pose estimation by "inverting" a Neural Radiance Field (NeRF).
NeRFs have been shown to be remarkably effective for the task of view synthesis.
arXiv Detail & Related papers (2020-12-10T18:36:40Z)
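The "inverting" idea behind iNeRF is analysis-by-synthesis: start from a pose guess, render from it, and adjust the pose so the rendering matches an observed image. The toy sketch below uses an invented `render` function, a 3-vector translation instead of a full SE(3) pose, and finite-difference gradients; iNeRF itself backpropagates through full NeRF volume rendering, so treat every name and number here as an illustrative assumption.

```python
import numpy as np

# Toy stand-in for a NeRF renderer: given a camera translation t (3,),
# produce a tiny 8x8 "image". A real NeRF renderer would ray-march the MLP.
def render(t):
    grid = np.stack(np.meshgrid(np.linspace(-1, 1, 8),
                                np.linspace(-1, 1, 8)), axis=-1)
    center = t[:2] / (1.0 + abs(t[2]))   # crude projection of the offset
    return np.exp(-np.sum((grid - center) ** 2, axis=-1))

target_pose = np.array([0.3, -0.2, 0.1])
observed = render(target_pose)           # the image whose pose we seek

def loss(t):
    """Photometric error between the rendering at t and the observation."""
    return np.mean((render(t) - observed) ** 2)

def grad(t, eps=1e-4):
    """Finite-difference gradient; iNeRF uses autodiff through the renderer."""
    g = np.zeros_like(t)
    for i in range(3):
        e = np.zeros(3)
        e[i] = eps
        g[i] = (loss(t + e) - loss(t - e)) / (2 * eps)
    return g

# Gradient descent on the pose, starting from the identity guess.
pose = np.zeros(3)
for step in range(300):
    pose -= 2.0 * grad(pose)

print(pose, loss(pose))
```

Because the photometric loss is differentiable with respect to the pose, descent drives the rendered image toward the observation; the same principle, scaled up to real NeRF rendering and 6-DoF poses, is what the paper's "inversion" refers to.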
This list is automatically generated from the titles and abstracts of the papers on this site.