Generating Landmark Navigation Instructions from Maps as a Graph-to-Text Problem
- URL: http://arxiv.org/abs/2012.15329v1
- Date: Wed, 30 Dec 2020 21:22:04 GMT
- Title: Generating Landmark Navigation Instructions from Maps as a Graph-to-Text Problem
- Authors: Raphael Schumann and Stefan Riezler
- Abstract summary: We present a neural model that takes OpenStreetMap representations as input and learns to generate navigation instructions.
Our work is based on a novel dataset of 7,672 crowd-sourced instances that have been verified by human navigation in Street View.
- Score: 15.99072005190786
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Car-focused navigation services are based on turns and distances of named
streets, whereas navigation instructions naturally used by humans are centered
around physical objects called landmarks. We present a neural model that takes
OpenStreetMap representations as input and learns to generate navigation
instructions that contain visible and salient landmarks from human natural
language instructions. Routes on the map are encoded in a location- and
rotation-invariant graph representation that is decoded into natural language
instructions. Our work is based on a novel dataset of 7,672 crowd-sourced
instances that have been verified by human navigation in Street View. Our
evaluation shows that the navigation instructions generated by our system have
properties similar to those of human-generated instructions, and lead to
successful human navigation in Street View.
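The abstract mentions a location- and rotation-invariant graph representation of routes. As an illustration of the general idea (not the paper's actual encoder), a route polyline can be made invariant to where it sits on the map and how it is oriented by describing it with segment lengths and relative turn angles instead of raw coordinates:

```python
import math

def invariant_route_features(points):
    """Encode a route polyline as (segment_length, turn_angle) pairs.

    Lengths and relative turn angles are unchanged when the whole route
    is translated or rotated, so the encoding is location- and
    rotation-invariant, unlike raw (x, y) coordinates.
    """
    features = []
    for i in range(1, len(points)):
        (x0, y0), (x1, y1) = points[i - 1], points[i]
        length = math.hypot(x1 - x0, y1 - y0)
        if i == 1:
            turn = 0.0  # first segment has no previous heading to turn from
        else:
            xp, yp = points[i - 2]
            prev_heading = math.atan2(y0 - yp, x0 - xp)
            heading = math.atan2(y1 - y0, x1 - x0)
            # wrap the turn angle into (-pi, pi]
            turn = (heading - prev_heading + math.pi) % (2 * math.pi) - math.pi
        features.append((length, turn))
    return features

route = [(0, 0), (0, 2), (1.5, 2)]      # head north, then turn right
print(invariant_route_features(route))  # lengths 2.0 and 1.5; right turn of ~ -pi/2
```

Rotating or shifting all input points by the same transform leaves the output unchanged, which is the property the abstract attributes to its graph encoding.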
Related papers
- CityNav: Language-Goal Aerial Navigation Dataset with Geographic Information [25.51740922661166]
Vision-and-language navigation (VLN) aims to guide autonomous agents through real-world environments by integrating visual and linguistic cues.
We introduce CityNav, a novel dataset explicitly designed for language-guided aerial navigation in 3D environments of real cities.
CityNav comprises 32k natural language descriptions paired with human demonstration trajectories, collected via a newly developed web-based 3D simulator.
arXiv Detail & Related papers (2024-06-20T12:08:27Z)
- Semantic Map-based Generation of Navigation Instructions [9.197756644049862]
We propose a new approach to navigation instruction generation by framing the problem as an image captioning task.
Conventional approaches employ a sequence of panorama images to generate navigation instructions.
We present a benchmark dataset for instruction generation using semantic maps, propose an initial model and ask human subjects to manually assess the quality of generated instructions.
arXiv Detail & Related papers (2024-03-28T17:27:44Z)
- Object Goal Navigation with Recursive Implicit Maps [92.6347010295396]
We propose an implicit spatial map for object goal navigation.
Our method significantly outperforms the state of the art on the challenging MP3D dataset.
We deploy our model on a real robot and achieve encouraging object goal navigation results in real scenes.
arXiv Detail & Related papers (2023-08-10T14:21:33Z)
- Learning Navigational Visual Representations with Semantic Map Supervision [85.91625020847358]
We propose a navigational-specific visual representation learning method by contrasting the agent's egocentric views and semantic maps.
Ego$2$-Map learning transfers the compact and rich information from a map, such as objects, structure and transition, to the agent's egocentric representations for navigation.
arXiv Detail & Related papers (2023-07-23T14:01:05Z)
- Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation [87.52136927091712]
We address a practical yet challenging problem of training robot agents to navigate in an environment following a path described by some language instructions.
To achieve accurate and efficient navigation, it is critical to build a map that accurately represents both spatial location and the semantic information of the environment objects.
We propose a multi-granularity map, which contains both object fine-grained details (e.g., color, texture) and semantic classes, to represent objects more comprehensively.
arXiv Detail & Related papers (2022-10-14T04:23:27Z)
- Gesture2Path: Imitation Learning for Gesture-aware Navigation [54.570943577423094]
We present Gesture2Path, a novel social navigation approach that combines image-based imitation learning with model-predictive control.
We deploy our method on real robots and showcase the effectiveness of our approach in four gesture-navigation scenarios.
arXiv Detail & Related papers (2022-09-19T23:05:36Z)
- Find a Way Forward: a Language-Guided Semantic Map Navigator [53.69229615952205]
This paper approaches the problem of language-guided navigation from a new perspective.
We use novel semantic navigation maps, which enable robots to carry out natural language instructions and move to a target position based on map observations.
The proposed approach has noticeable performance gains, especially in long-distance navigation cases.
arXiv Detail & Related papers (2022-03-07T07:40:33Z)
- Augmented reality navigation system for visual prosthesis [67.09251544230744]
We propose an augmented reality navigation system for visual prosthesis that incorporates reactive navigation and path planning software.
It consists of four steps: locating the subject on a map, planning the subject's trajectory, showing it to the subject, and re-planning to avoid obstacles.
Results show how our augmented navigation system helps navigation performance by reducing the time and distance needed to reach goals, and significantly reduces the number of obstacle collisions.
arXiv Detail & Related papers (2021-09-30T09:41:40Z)
- CrossMap Transformer: A Crossmodal Masked Path Transformer Using Double Back-Translation for Vision-and-Language Navigation [11.318892271652695]
Navigation guided by natural language instructions is particularly suitable for Domestic Service Robots that interact naturally with users.
This task involves the prediction of a sequence of actions that leads to a specified destination given a natural language navigation instruction.
We propose the CrossMap Transformer network, which encodes the linguistic and visual features to sequentially generate a path.
arXiv Detail & Related papers (2021-03-01T09:03:50Z)
- From Route Instructions to Landmark Graphs [0.30458514384586394]
Landmarks are central to how people navigate, but most navigation technologies do not incorporate them into their representations.
We propose the landmark graph generation task and introduce a fully end-to-end neural approach to generate these graphs.
arXiv Detail & Related papers (2020-02-05T22:05:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.