Semantic Map-based Generation of Navigation Instructions
- URL: http://arxiv.org/abs/2403.19603v1
- Date: Thu, 28 Mar 2024 17:27:44 GMT
- Title: Semantic Map-based Generation of Navigation Instructions
- Authors: Chengzu Li, Chao Zhang, Simone Teufel, Rama Sanand Doddipatla, Svetlana Stoyanchev
- Abstract summary: We propose a new approach to navigation instruction generation by framing the problem as an image captioning task.
Conventional approaches employ a sequence of panorama images to generate navigation instructions.
We present a benchmark dataset for instruction generation using semantic maps, propose an initial model and ask human subjects to manually assess the quality of generated instructions.
- Score: 9.197756644049862
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We are interested in the generation of navigation instructions, either in their own right or as training material for robotic navigation tasks. In this paper, we propose a new approach to navigation instruction generation by framing the problem as an image captioning task using semantic maps as visual input. Conventional approaches employ a sequence of panorama images to generate navigation instructions. Semantic maps abstract away from visual details and fuse the information from multiple panorama images into a single top-down representation, thereby reducing the computational complexity of processing the input. We present a benchmark dataset for instruction generation from semantic maps, propose an initial model, and ask human subjects to manually assess the quality of the generated instructions. Our initial investigations show promise in using semantic maps for instruction generation in place of panorama image sequences, but there is vast scope for improvement. We release the code for data preparation and model training at https://github.com/chengzu-li/VLGen.
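To make the captioning framing concrete, below is a minimal PyTorch sketch of instruction generation over a top-down semantic map: each grid cell holds a semantic class ID, a small convolutional encoder summarizes the map into a context vector, and a GRU decoder emits instruction tokens. This is an illustrative sketch only, not the authors' VLGen implementation; the class count, vocabulary size, and all module choices are assumptions made for the example.

```python
# Minimal sketch (NOT the authors' VLGen code): navigation instruction
# generation framed as image captioning over a top-down semantic map.
# Assumption: the map is an (H, W) grid of semantic class IDs; all
# hyperparameters and module choices below are hypothetical.
import torch
import torch.nn as nn

NUM_CLASSES = 40   # semantic classes in the map (assumed)
VOCAB_SIZE = 1000  # instruction vocabulary size (assumed)
EMBED_DIM = 128

class MapCaptioner(nn.Module):
    def __init__(self):
        super().__init__()
        # Embed each grid cell's class ID, then encode the map like an image.
        self.cell_embed = nn.Embedding(NUM_CLASSES, EMBED_DIM)
        self.encoder = nn.Sequential(
            nn.Conv2d(EMBED_DIM, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.word_embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.decoder = nn.GRU(EMBED_DIM, 128, batch_first=True)
        self.out = nn.Linear(128, VOCAB_SIZE)

    def forward(self, sem_map, tokens):
        # sem_map: (B, H, W) int64 class IDs; tokens: (B, T) int64 word IDs.
        x = self.cell_embed(sem_map).permute(0, 3, 1, 2)  # (B, E, H, W)
        ctx = self.encoder(x).flatten(1)                  # (B, 128) map summary
        emb = self.word_embed(tokens)                     # (B, T, E)
        h0 = ctx.unsqueeze(0)                             # seed decoder with map context
        out, _ = self.decoder(emb, h0)
        return self.out(out)                              # (B, T, V) next-token logits

model = MapCaptioner()
sem_map = torch.randint(0, NUM_CLASSES, (2, 64, 64))  # fused top-down map
tokens = torch.randint(0, VOCAB_SIZE, (2, 12))        # teacher-forced prefix
logits = model(sem_map, tokens)
print(logits.shape)  # torch.Size([2, 12, 1000])
```

The point of the framing is visible in the shapes: the fused top-down map is a single (H, W) grid rather than a sequence of panorama images, so the visual encoder runs once per example.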
Related papers
- Interactive Semantic Map Representation for Skill-based Visual Object Navigation [43.71312386938849]
This paper introduces a new representation of a scene semantic map, formed during an embodied agent's interaction with the indoor environment.
We have implemented this representation into a full-fledged navigation approach called SkillTron.
The proposed approach makes it possible to form both intermediate goals for robot exploration and the final goal for object navigation.
arXiv Detail & Related papers (2023-11-07T16:30:12Z)
- Object Goal Navigation with Recursive Implicit Maps [92.6347010295396]
We propose an implicit spatial map for object goal navigation.
Our method significantly outperforms the state of the art on the challenging MP3D dataset.
We deploy our model on a real robot and achieve encouraging object goal navigation results in real scenes.
arXiv Detail & Related papers (2023-08-10T14:21:33Z)
- SNAP: Self-Supervised Neural Maps for Visual Positioning and Semantic Understanding [57.108301842535894]
We introduce SNAP, a deep network that learns rich neural 2D maps from ground-level and overhead images.
We train our model to align neural maps estimated from different inputs, supervised only with camera poses over tens of millions of StreetView images.
SNAP can resolve the location of challenging image queries beyond the reach of traditional methods.
arXiv Detail & Related papers (2023-06-08T17:54:47Z)
- Sketch-Guided Text-to-Image Diffusion Models [57.12095262189362]
We introduce a universal approach to guide a pretrained text-to-image diffusion model.
Our method does not require training a dedicated model or a specialized encoder for the task.
We focus in particular on the sketch-to-image translation task, revealing a robust and expressive way to generate images.
arXiv Detail & Related papers (2022-11-24T18:45:32Z)
- Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation [87.52136927091712]
We address the practical yet challenging problem of training robot agents to navigate an environment by following a path described by natural-language instructions.
To achieve accurate and efficient navigation, it is critical to build a map that accurately represents both spatial location and the semantic information of the environment objects.
We propose a multi-granularity map, which contains both fine-grained object details (e.g., color, texture) and semantic classes, to represent objects more comprehensively.
arXiv Detail & Related papers (2022-10-14T04:23:27Z)
- Lifelong Topological Visual Navigation [16.41858724205884]
We propose a learning-based visual navigation method with graph update strategies that improve lifelong navigation performance over time.
We take inspiration from sampling-based planning algorithms to build image-based topological graphs, resulting in sparser graphs with higher navigation performance than baseline methods.
Unlike controllers that learn from fixed training environments, we show that our model can be finetuned using a relatively small dataset from the real-world environment where the robot is deployed.
arXiv Detail & Related papers (2021-10-16T06:16:14Z)
- Semantic Image Alignment for Vehicle Localization [111.59616433224662]
We present a novel approach to vehicle localization in dense semantic maps using semantic segmentation from a monocular camera.
In contrast to existing visual localization approaches, the system does not require additional keypoint features, handcrafted localization landmark extractors or expensive LiDAR sensors.
arXiv Detail & Related papers (2021-10-08T14:40:15Z)
- Generating Landmark Navigation Instructions from Maps as a Graph-to-Text Problem [15.99072005190786]
We present a neural model that takes OpenStreetMap representations as input and learns to generate navigation instructions.
Our work is based on a novel dataset of 7,672 crowd-sourced instances that have been verified by human navigation in Street View.
arXiv Detail & Related papers (2020-12-30T21:22:04Z)
- Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation [143.6144560164782]
We introduce a learning-based approach for room navigation using semantic maps.
We train a model to generate amodal semantic top-down maps indicating beliefs about the location, size, and shape of rooms.
Next, we use these maps to predict a point that lies in the target room and train a policy to navigate to the point.
arXiv Detail & Related papers (2020-07-20T02:19:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.