Generating Landmark Navigation Instructions from Maps as a Graph-to-Text Problem
- URL: http://arxiv.org/abs/2012.15329v1
- Date: Wed, 30 Dec 2020 21:22:04 GMT
- Title: Generating Landmark Navigation Instructions from Maps as a Graph-to-Text Problem
- Authors: Raphael Schumann and Stefan Riezler
- Abstract summary: We present a neural model that takes OpenStreetMap representations as input and learns to generate navigation instructions.
Our work is based on a novel dataset of 7,672 crowd-sourced instances that have been verified by human navigation in Street View.
- Score: 15.99072005190786
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Car-focused navigation services are based on turns and distances of named
streets, whereas navigation instructions naturally used by humans are centered
around physical objects called landmarks. We present a neural model that takes
OpenStreetMap representations as input and learns to generate navigation
instructions that contain visible and salient landmarks from human natural
language instructions. Routes on the map are encoded in a location- and
rotation-invariant graph representation that is decoded into natural language
instructions. Our work is based on a novel dataset of 7,672 crowd-sourced
instances that have been verified by human navigation in Street View. Our
evaluation shows that the navigation instructions generated by our system have
properties similar to those of human-generated instructions, and lead to
successful human navigation in Street View.
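The abstract mentions a location- and rotation-invariant graph representation of routes. As an illustration of the general idea (not the paper's actual encoder), a route polyline can be made invariant to where it sits on the map and how it is oriented by describing it with segment lengths and relative turn angles instead of raw coordinates:

```python
import math

def invariant_route_features(points):
    """Encode a route polyline as (segment_length, turn_angle) pairs.

    Lengths and relative turn angles are unchanged when the whole route
    is translated or rotated, so the encoding is location- and
    rotation-invariant, unlike raw (x, y) coordinates.
    """
    features = []
    for i in range(1, len(points)):
        (x0, y0), (x1, y1) = points[i - 1], points[i]
        length = math.hypot(x1 - x0, y1 - y0)
        if i == 1:
            turn = 0.0  # first segment has no previous heading to turn from
        else:
            xp, yp = points[i - 2]
            prev_heading = math.atan2(y0 - yp, x0 - xp)
            heading = math.atan2(y1 - y0, x1 - x0)
            # wrap the turn angle into (-pi, pi]
            turn = (heading - prev_heading + math.pi) % (2 * math.pi) - math.pi
        features.append((length, turn))
    return features

route = [(0, 0), (0, 2), (1.5, 2)]      # head north, then turn right
print(invariant_route_features(route))  # lengths 2.0 and 1.5; right turn of ~ -pi/2
```

Rotating or shifting all input points by the same transform leaves the output unchanged, which is the property the abstract attributes to its graph encoding.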
Related papers
- CityNav: Language-Goal Aerial Navigation Dataset with Geographic Information [25.51740922661166]
Vision-and-language navigation (VLN) aims to guide autonomous agents through real-world environments by integrating visual and linguistic cues.
We introduce CityNav, a novel dataset explicitly designed for language-guided aerial navigation in 3D environments of real cities.
CityNav comprises 32k natural language descriptions paired with human demonstration trajectories, collected via a newly developed web-based 3D simulator.
arXiv Detail & Related papers (2024-06-20T12:08:27Z)
- Semantic Map-based Generation of Navigation Instructions [9.197756644049862]
We propose a new approach to navigation instruction generation by framing the problem as an image captioning task.
Conventional approaches employ a sequence of panorama images to generate navigation instructions.
We present a benchmark dataset for instruction generation using semantic maps, propose an initial model and ask human subjects to manually assess the quality of generated instructions.
arXiv Detail & Related papers (2024-03-28T17:27:44Z)
- Object Goal Navigation with Recursive Implicit Maps [92.6347010295396]
We propose an implicit spatial map for object goal navigation.
Our method significantly outperforms the state of the art on the challenging MP3D dataset.
We deploy our model on a real robot and achieve encouraging object goal navigation results in real scenes.
arXiv Detail & Related papers (2023-08-10T14:21:33Z)
- Learning Navigational Visual Representations with Semantic Map Supervision [85.91625020847358]
We propose a navigational-specific visual representation learning method by contrasting the agent's egocentric views and semantic maps.
Ego$2$-Map learning transfers the compact and rich information from a map, such as objects, structure and transition, to the agent's egocentric representations for navigation.
arXiv Detail & Related papers (2023-07-23T14:01:05Z)
- Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation [87.52136927091712]
We address a practical yet challenging problem of training robot agents to navigate in an environment following a path described by some language instructions.
To achieve accurate and efficient navigation, it is critical to build a map that accurately represents both spatial location and the semantic information of the environment objects.
We propose a multi-granularity map, which contains both object fine-grained details (e.g., color, texture) and semantic classes, to represent objects more comprehensively.
arXiv Detail & Related papers (2022-10-14T04:23:27Z)
- Gesture2Path: Imitation Learning for Gesture-aware Navigation [54.570943577423094]
We present Gesture2Path, a novel social navigation approach that combines image-based imitation learning with model-predictive control.
We deploy our method on real robots and showcase the effectiveness of our approach in four gesture-navigation scenarios.
arXiv Detail & Related papers (2022-09-19T23:05:36Z)
- Find a Way Forward: a Language-Guided Semantic Map Navigator [53.69229615952205]
This paper approaches the problem of language-guided navigation from a new perspective.
We use novel semantic navigation maps, which enable robots to carry out natural language instructions and move to a target position based on map observations.
The proposed approach has noticeable performance gains, especially in long-distance navigation cases.
arXiv Detail & Related papers (2022-03-07T07:40:33Z)
- Augmented reality navigation system for visual prosthesis [67.09251544230744]
We propose an augmented reality navigation system for visual prosthesis that incorporates reactive navigation and path planning software.
It consists of four steps: locating the subject on a map, planning the subject's trajectory, showing it to the subject, and re-planning to avoid obstacles.
Results show how our augmented navigation system helps navigation performance by reducing the time and distance needed to reach goals, and significantly reduces the number of obstacle collisions.
arXiv Detail & Related papers (2021-09-30T09:41:40Z)
- CrossMap Transformer: A Crossmodal Masked Path Transformer Using Double Back-Translation for Vision-and-Language Navigation [11.318892271652695]
Navigation guided by natural language instructions is particularly suitable for Domestic Service Robots that interact naturally with users.
This task involves the prediction of a sequence of actions that leads to a specified destination given a natural language navigation instruction.
We propose the CrossMap Transformer network, which encodes the linguistic and visual features to sequentially generate a path.
arXiv Detail & Related papers (2021-03-01T09:03:50Z)
- From Route Instructions to Landmark Graphs [0.30458514384586394]
Landmarks are central to how people navigate, but most navigation technologies do not incorporate them into their representations.
We propose the landmark graph generation task and introduce a fully end-to-end neural approach to generate these graphs.
arXiv Detail & Related papers (2020-02-05T22:05:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.