From Route Instructions to Landmark Graphs
- URL: http://arxiv.org/abs/2002.02012v1
- Date: Wed, 5 Feb 2020 22:05:11 GMT
- Title: From Route Instructions to Landmark Graphs
- Authors: Christopher M Cervantes
- Abstract summary: Landmarks are central to how people navigate, but most navigation technologies do not incorporate them into their representations.
We propose the landmark graph generation task and introduce a fully end-to-end neural approach to generate these graphs.
- Score: 0.30458514384586394
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Landmarks are central to how people navigate, but most navigation
technologies do not incorporate them into their representations. We propose the
landmark graph generation task (creating landmark-based spatial representations
from natural language) and introduce a fully end-to-end neural approach to
generate these graphs. We evaluate our models on the SAIL route instruction
dataset, as well as on a small set of real-world delivery instructions that we
collected, and we show that our approach yields high quality results on both
our task and the related robotic navigation task.
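The abstract defines the landmark graph generation task only at a high level (creating landmark-based spatial representations from natural language). As a minimal sketch of what such a representation might look like, the snippet below models landmarks as nodes and spatial relations as directed, labeled edges; all class, field, and relation names here are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical landmark graph: nodes are landmark mentions, edges are
# (head, relation, tail) triples extracted from a route instruction.
from dataclasses import dataclass, field


@dataclass
class LandmarkGraph:
    landmarks: set = field(default_factory=set)    # graph nodes
    relations: list = field(default_factory=list)  # directed labeled edges

    def add_relation(self, head: str, relation: str, tail: str) -> None:
        """Record a spatial relation and register both endpoints as landmarks."""
        self.landmarks.update({head, tail})
        self.relations.append((head, relation, tail))


# An instruction like "walk past the chair, then stop at the lamp"
# might be parsed into two relations anchored on the navigator:
g = LandmarkGraph()
g.add_relation("you", "past", "chair")
g.add_relation("you", "at", "lamp")
print(sorted(g.landmarks))  # ['chair', 'lamp', 'you']
print(len(g.relations))     # 2
```

An end-to-end neural model for this task would map instruction text directly to such a set of triples, rather than building the graph with hand-written rules as this sketch does.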
Related papers
- PRET: Planning with Directed Fidelity Trajectory for Vision and Language Navigation [30.710806048991923]
Vision-and-language navigation is a task that requires an agent to navigate according to a natural language instruction.
Recent methods predict sub-goals on constructed topology map at each step to enable long-term action planning.
We propose an alternative method that facilitates navigation planning by considering the alignment between instructions and directed fidelity trajectories.
arXiv Detail & Related papers (2024-07-16T08:22:18Z)
- Object Goal Navigation with Recursive Implicit Maps [92.6347010295396]
We propose an implicit spatial map for object goal navigation.
Our method significantly outperforms the state of the art on the challenging MP3D dataset.
We deploy our model on a real robot and achieve encouraging object goal navigation results in real scenes.
arXiv Detail & Related papers (2023-08-10T14:21:33Z)
- Learning Navigational Visual Representations with Semantic Map Supervision [85.91625020847358]
We propose a navigational-specific visual representation learning method by contrasting the agent's egocentric views and semantic maps.
Ego$2$-Map learning transfers the compact and rich information from a map, such as objects, structure and transition, to the agent's egocentric representations for navigation.
arXiv Detail & Related papers (2023-07-23T14:01:05Z)
- Unsupervised Task Graph Generation from Instructional Video Transcripts [53.54435048879365]
We consider a setting where text transcripts of instructional videos performing a real-world activity are provided.
The goal is to identify the key steps relevant to the task as well as the dependency relationship between these key steps.
We propose a novel task graph generation approach that combines the reasoning capabilities of instruction-tuned language models along with clustering and ranking components.
arXiv Detail & Related papers (2023-02-17T22:50:08Z)
- Self-Supervised Road Layout Parsing with Graph Auto-Encoding [5.45914480139453]
We present a neural network approach that takes a road map in bird's-eye view as input and predicts a human-interpretable graph that represents the road's topological layout.
Our approach elevates the understanding of road layouts from pixel level to the level of graphs.
arXiv Detail & Related papers (2022-03-21T14:14:26Z)
- Lifelong Topological Visual Navigation [16.41858724205884]
We propose a learning-based visual navigation method with graph update strategies that improve lifelong navigation performance over time.
We take inspiration from sampling-based planning algorithms to build image-based topological graphs, resulting in sparser graphs yet with higher navigation performance compared to baseline methods.
Unlike controllers that learn from fixed training environments, we show that our model can be finetuned using a relatively small dataset from the real-world environment where the robot is deployed.
arXiv Detail & Related papers (2021-10-16T06:16:14Z)
- Rapid Exploration for Open-World Navigation with Latent Goal Models [78.45339342966196]
We describe a robotic learning system for autonomous exploration and navigation in diverse, open-world environments.
At the core of our method is a learned latent variable model of distances and actions, along with a non-parametric topological memory of images.
We use an information bottleneck to regularize the learned policy, giving us (i) a compact visual representation of goals, (ii) improved generalization capabilities, and (iii) a mechanism for sampling feasible goals for exploration.
arXiv Detail & Related papers (2021-04-12T23:14:41Z)
- SOON: Scenario Oriented Object Navigation with Graph-based Exploration [102.74649829684617]
The ability to navigate like a human towards a language-guided target from anywhere in a 3D embodied environment is one of the 'holy grail' goals of intelligent robots.
Most visual navigation benchmarks focus on navigating toward a target from a fixed starting point, guided by an elaborate set of instructions that describes the route step by step.
This approach deviates from real-world problems, in which a human only describes what the object and its surroundings look like and asks the robot to start navigating from anywhere.
arXiv Detail & Related papers (2021-03-31T15:01:04Z)
- Generating Landmark Navigation Instructions from Maps as a Graph-to-Text Problem [15.99072005190786]
We present a neural model that takes OpenStreetMap representations as input and learns to generate navigation instructions.
Our work is based on a novel dataset of 7,672 crowd-sourced instances that have been verified by human navigation in Street View.
arXiv Detail & Related papers (2020-12-30T21:22:04Z)
- Structured Landmark Detection via Topology-Adapting Deep Graph Learning [75.20602712947016]
We present a new topology-adapting deep graph learning approach for accurate anatomical facial and medical landmark detection.
The proposed method constructs graph signals leveraging both local image features and global shape features.
Experiments are conducted on three public facial image datasets (WFLW, 300W, and COFW-68) as well as three real-world X-ray medical datasets (Cephalometric (public), Hand, and Pelvis).
arXiv Detail & Related papers (2020-04-17T11:55:03Z)
- High-Level Plan for Behavioral Robot Navigation with Natural Language Directions and R-NET [6.47137925955334]
We develop an understanding of the behavioral navigational graph to enable the pointer network to produce a sequence of behaviors representing the path.
Tests on the navigation graph dataset show that our model outperforms the state-of-the-art approach for both known and unknown environments.
arXiv Detail & Related papers (2020-01-08T01:14:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.