Language-enhanced RNR-Map: Querying Renderable Neural Radiance Field
maps with natural language
- URL: http://arxiv.org/abs/2308.08854v1
- Date: Thu, 17 Aug 2023 08:27:01 GMT
- Title: Language-enhanced RNR-Map: Querying Renderable Neural Radiance Field
maps with natural language
- Authors: Francesco Taioli, Federico Cunico, Federico Girella, Riccardo Bologna,
Alessandro Farinelli, Marco Cristani
- Abstract summary: We present a Language-enhanced Renderable Neural Radiance map for Visual Navigation with natural language query prompts.
Le-RNR-Map employs a grid structure comprising latent codes positioned at each pixel.
We enhance RNR-Map with CLIP-based embedding latent codes, allowing natural language search without additional label data.
- Score: 51.805056586678184
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Le-RNR-Map, a Language-enhanced Renderable Neural Radiance map for
Visual Navigation with natural language query prompts. The recently proposed
RNR-Map employs a grid structure comprising latent codes positioned at each
pixel. These latent codes, which are derived from image observation, enable: i)
image rendering given a camera pose, since they are converted to Neural
Radiance Field; ii) image navigation and localization with astonishing
accuracy. On top of this, we enhance RNR-Map with CLIP-based embedding latent
codes, allowing natural language search without additional label data. We
evaluate the effectiveness of this map in single and multi-object searches. We
also investigate its compatibility with a Large Language Model as an
"affordance query resolver". Code and videos are available at
https://intelligolabs.github.io/Le-RNR-Map/
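The language query mechanism can be pictured as a similarity search over a grid of CLIP embeddings. Below is a minimal sketch, not the authors' released implementation: it assumes a hypothetical `clip_grid` tensor of shape (H, W, D) holding one L2-normalized CLIP image embedding per map pixel, and uses OpenAI's `clip` package to embed the prompt.

```python
# Minimal sketch (not the authors' code) of a CLIP-based language query:
# score every map cell against a text prompt and take the best match.
# Assumption: `clip_grid` is an (H, W, D) tensor of L2-normalized CLIP
# image embeddings, one per map pixel.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

def query_map(clip_grid: torch.Tensor, prompt: str) -> torch.Tensor:
    """Return an (H, W) cosine-similarity heatmap for a language prompt."""
    tokens = clip.tokenize([prompt]).to(device)
    with torch.no_grad():
        text_feat = model.encode_text(tokens).float()  # (1, D)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    # Dot product of unit vectors = cosine similarity, one score per cell.
    return torch.einsum("hwd,d->hw", clip_grid.float().to(device), text_feat[0])

# Usage: the hottest cell is a candidate goal for single-object search;
# multi-object search repeats the query with one prompt per target.
# heatmap = query_map(clip_grid, "a sofa near the window")
# row, col = divmod(int(heatmap.argmax()), heatmap.shape[1])
```

The "affordance query resolver" described in the abstract would sit in front of such a function, with an LLM mapping an indirect request (e.g. "where can I sit?") to concrete object prompts before querying the map.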
Related papers
- Tag Map: A Text-Based Map for Spatial Reasoning and Navigation with Large Language Models [15.454856838083511]
Large Language Models (LLMs) have emerged as a tool for robots to generate task plans using common sense reasoning.
Recent works have shifted from explicit maps with fixed semantic classes to implicit open vocabulary maps.
We propose an explicit text-based map that can represent thousands of semantic classes while easily integrating with LLMs.
arXiv Detail & Related papers (2024-09-23T18:26:19Z)
- DisPlacing Objects: Improving Dynamic Vehicle Detection via Visual Place Recognition under Adverse Conditions [29.828201168816243]
We investigate whether a prior map can be leveraged to aid in the detection of dynamic objects in a scene without the need for a 3D map.
We contribute an algorithm which refines an initial set of candidate object detections and produces a refined subset of highly accurate detections using a prior map.
arXiv Detail & Related papers (2023-06-30T10:46:51Z)
- SNAP: Self-Supervised Neural Maps for Visual Positioning and Semantic Understanding [57.108301842535894]
We introduce SNAP, a deep network that learns rich neural 2D maps from ground-level and overhead images.
We train our model to align neural maps estimated from different inputs, supervised only with camera poses over tens of millions of StreetView images.
SNAP can resolve the location of challenging image queries beyond the reach of traditional methods.
arXiv Detail & Related papers (2023-06-08T17:54:47Z)
- GeoVLN: Learning Geometry-Enhanced Visual Representation with Slot Attention for Vision-and-Language Navigation [52.65506307440127]
We propose GeoVLN, which learns Geometry-enhanced visual representation based on slot attention for robust Visual-and-Language Navigation.
We employ V&L BERT to learn a cross-modal representation that incorporates both language and vision information.
arXiv Detail & Related papers (2023-05-26T17:15:22Z)
- Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns a memory-efficient, dense 3D geometry, and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
- Renderable Neural Radiance Map for Visual Navigation [18.903118231531973]
We propose a novel type of map for visual navigation, a renderable neural radiance map (RNR-Map).
The RNR-Map has a grid form and consists of latent codes at each pixel.
The recorded latent codes implicitly contain visual information about the environment, which makes the RNR-Map visually descriptive.
arXiv Detail & Related papers (2023-03-01T08:00:46Z)
- HPointLoc: Point-based Indoor Place Recognition using Synthetic RGB-D Images [58.720142291102135]
We present a novel dataset named HPointLoc, specially designed for exploring the capabilities of visual place recognition in indoor environments.
The dataset is based on the popular Habitat simulator, which can generate indoor scenes using both its own sensor data and open datasets.
arXiv Detail & Related papers (2022-12-30T12:20:56Z)
- Visual Language Maps for Robot Navigation [30.33041779258644]
Grounding language to the visual observations of a navigating agent can be performed using off-the-shelf visual-language models pretrained on Internet-scale data.
We propose VLMaps, a spatial map representation that directly fuses pretrained visual-language features with a 3D reconstruction of the physical world.
arXiv Detail & Related papers (2022-10-11T18:13:20Z)
- Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding [75.03682706791389]
We introduce Room-Across-Room (RxR), a new Vision-and-Language Navigation (VLN) dataset.
RxR is multilingual (English, Hindi, and Telugu) and larger (more paths and instructions) than other VLN datasets.
It emphasizes the role of language in VLN by addressing known biases in paths and eliciting more references to visible entities.
arXiv Detail & Related papers (2020-10-15T18:01:15Z)
- Differentiable Mapping Networks: Learning Structured Map Representations for Sparse Visual Localization [28.696160266177806]
The Differentiable Mapping Network (DMN) learns effective map representations for visual localization.
We evaluate the DMN using simulated environments and a challenging real-world Street View dataset.
arXiv Detail & Related papers (2020-05-19T15:43:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.