Waypoint Models for Instruction-guided Navigation in Continuous
Environments
- URL: http://arxiv.org/abs/2110.02207v1
- Date: Tue, 5 Oct 2021 17:55:49 GMT
- Title: Waypoint Models for Instruction-guided Navigation in Continuous
Environments
- Authors: Jacob Krantz, Aaron Gokaslan, Dhruv Batra, Stefan Lee, Oleksandr
Maksymets
- Abstract summary: We develop a class of language-conditioned waypoint prediction networks to examine the role of action spaces in language-guided navigation.
We measure task performance and estimated execution time on a profiled LoCoBot robot.
Our models outperform prior work in VLN-CE and set a new state-of-the-art on the public leaderboard.
- Score: 68.2912740006109
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Little inquiry has explicitly addressed the role of action spaces in
language-guided visual navigation -- either in terms of their effect on
navigation success or the efficiency with which a robotic agent could execute
the resulting trajectory. Building on the recently released VLN-CE setting for
instruction following in continuous environments, we develop a class of
language-conditioned waypoint prediction networks to examine this question. We
vary the expressivity of these models to explore a spectrum between low-level
actions and continuous waypoint prediction. We measure task performance and
estimated execution time on a profiled LoCoBot robot. We find that more
expressive models result in simpler, faster-to-execute trajectories, but
lower-level actions can achieve better navigation metrics by more closely
approximating shortest paths. Further, our models outperform prior work in
VLN-CE and set a new
state-of-the-art on the public leaderboard -- increasing success rate by 4%
with our best model on this challenging task.
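To ground the spectrum the abstract describes, the sketch below shows its two endpoints: a head regressing a continuous relative waypoint versus a head emitting discrete low-level actions. It is a minimal illustration only; the module names, feature sizes, and (heading, distance) parameterization are assumptions, not the paper's architecture.

```python
# Illustrative sketch -- NOT the architecture from this paper. Assumes fused
# instruction+observation features of size d_model are already computed.
import torch
import torch.nn as nn

class WaypointHead(nn.Module):
    """Continuous endpoint: regress a relative (heading, distance) waypoint."""
    def __init__(self, d_model: int = 512, max_dist: float = 2.0):
        super().__init__()
        self.max_dist = max_dist
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 256), nn.ReLU(),
            nn.Linear(256, 2),  # raw (heading, distance)
        )

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        heading, dist = self.mlp(fused).unbind(-1)
        heading = torch.tanh(heading) * torch.pi    # radians in [-pi, pi]
        dist = torch.sigmoid(dist) * self.max_dist  # meters in [0, max_dist]
        return torch.stack([heading, dist], dim=-1)

class LowLevelHead(nn.Module):
    """Discrete endpoint: FORWARD / TURN-LEFT / TURN-RIGHT / STOP logits."""
    def __init__(self, d_model: int = 512, n_actions: int = 4):
        super().__init__()
        self.fc = nn.Linear(d_model, n_actions)

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        return self.fc(fused)

fused = torch.randn(1, 512)              # stand-in for instruction+RGBD features
print(WaypointHead()(fused))             # e.g. tensor([[0.41, 1.32]])
print(LowLevelHead()(fused).argmax(-1))  # e.g. tensor([2])
```

A waypoint head hands execution between waypoints to a local controller, which is where the execution-time savings on a profiled robot would come from; the low-level head instead approximates the shortest path step by step.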
Related papers
- PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation [68.17081518640934]
We propose a PrImitive-driVen waypOinT-aware world model for Robotic manipulation (PIVOT-R).
PIVOT-R consists of a Waypoint-aware World Model (WAWM) and a lightweight action prediction module.
Our PIVOT-R outperforms state-of-the-art open-source models on the SeaWave benchmark, achieving an average relative improvement of 19.45% across four levels of instruction tasks.
arXiv Detail & Related papers (2024-10-14T11:30:18Z)
- Narrowing the Gap between Vision and Action in Navigation [28.753809306008996]
We introduce a low-level action decoder jointly trained with high-level action prediction.
Our agent improves navigation performance metrics over strong baselines with both high-level and low-level actions.
arXiv Detail & Related papers (2024-08-19T20:09:56Z)
- Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation [64.84996994779443]
We propose a novel Affordances-Oriented Planner for continuous vision-language navigation (VLN) task.
Our AO-Planner integrates various foundation models to achieve affordances-oriented low-level motion planning and high-level decision-making.
Experiments on the challenging R2R-CE and RxR-CE datasets show that AO-Planner achieves state-of-the-art zero-shot performance.
arXiv Detail & Related papers (2024-07-08T12:52:46Z)
- Mind the Gap: Improving Success Rate of Vision-and-Language Navigation by Revisiting Oracle Success Routes [25.944819618283613]
Vision-and-Language Navigation (VLN) aims to navigate to the target location by following a given instruction.
We make the first attempt to tackle a long-ignored problem in VLN: narrowing the gap between Success Rate (SR) and Oracle Success Rate (OSR).
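To make the targeted gap concrete: SR credits an episode only if the agent stops near the goal, while OSR credits it if the agent ever got near the goal. A minimal sketch under the common 3 m VLN success threshold (names and data are illustrative):

```python
# Sketch contrasting SR and OSR on per-episode distance-to-goal traces.
def sr_and_osr(trajectories, threshold: float = 3.0):
    """trajectories: list of per-episode lists of distances to the goal (m)."""
    n = len(trajectories)
    sr = sum(d[-1] <= threshold for d in trajectories) / n    # stopped near goal
    osr = sum(min(d) <= threshold for d in trajectories) / n  # ever near goal
    return sr, osr

# Episode 2 passed within 2 m of the goal but stopped 6.5 m away:
# it counts toward OSR but not SR.
print(sr_and_osr([[9.0, 4.0, 1.2], [8.0, 2.0, 6.5]]))  # (0.5, 1.0)
```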
arXiv Detail & Related papers (2023-08-07T01:43:25Z)
- ENTL: Embodied Navigation Trajectory Learner [37.43079415330256]
We propose a method for extracting long sequence representations for embodied navigation.
We train our model using vector-quantized predictions of future states conditioned on current actions.
A key property of our approach is that the model is pre-trained without any explicit reward signal.
arXiv Detail & Related papers (2023-04-05T17:58:33Z)
- Can an Embodied Agent Find Your "Cat-shaped Mug"? LLM-Guided Exploration for Zero-Shot Object Navigation [58.3480730643517]
We present LGX, a novel algorithm for Language-Driven Zero-Shot Object Goal Navigation (L-ZSON).
Our approach makes use of Large Language Models (LLMs) for this task.
We achieve state-of-the-art zero-shot object navigation results on RoboTHOR with a success rate (SR) improvement of over 27% over the current baseline.
arXiv Detail & Related papers (2023-03-06T20:19:19Z)
- Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation [41.334731014665316]
Most existing works in vision-and-language navigation (VLN) focus on either discrete or continuous environments.
We propose a predictor to generate a set of candidate waypoints during navigation.
We show that agents navigating in continuous environments with predicted waypoints perform significantly better than agents using low-level actions.
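One way to picture such a predictor is as a scorer over a small discrete set of nearby (heading, distance) proposals, with the agent committing to the argmax instead of emitting low-level actions. The sketch below is a hedged illustration under that reading, not the paper's actual predictor:

```python
# Illustrative candidate-waypoint selection -- assumed interface, not the
# paper's model. Scores K (heading, distance) proposals given fused features.
import torch
import torch.nn as nn

class CandidateScorer(nn.Module):
    def __init__(self, d_model: int = 512):
        super().__init__()
        self.score = nn.Linear(d_model + 2, 1)  # features + (heading, dist)

    def forward(self, fused: torch.Tensor, candidates: torch.Tensor):
        # fused: (B, d_model); candidates: (B, K, 2)
        k = candidates.size(1)
        expanded = fused.unsqueeze(1).expand(-1, k, -1)         # (B, K, d_model)
        logits = self.score(torch.cat([expanded, candidates], dim=-1))
        return logits.squeeze(-1).argmax(dim=-1)                # chosen index

scorer = CandidateScorer()
fused = torch.randn(2, 512)   # stand-in for instruction+panorama features
cands = torch.rand(2, 12, 2)  # 12 nearby (heading, distance) proposals
print(scorer(fused, cands))   # e.g. tensor([7, 3])
```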
arXiv Detail & Related papers (2022-03-05T14:56:14Z)
- Model-Based Reinforcement Learning via Latent-Space Collocation [110.04005442935828]
We argue that it is easier to solve long-horizon tasks by planning sequences of states rather than just actions.
We adapt the idea of collocation, which has shown good results on long-horizon tasks in optimal control literature, to the image-based setting by utilizing learned latent state space models.
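As a toy picture of collocation, one can optimize the state sequence directly and softly penalize violations of a dynamics model; the violations then play the role of control inputs. The NumPy sketch below uses a known linear model for clarity, whereas the paper does this in a learned latent space from images:

```python
# Toy collocation sketch: optimize states s_1..s_T directly, penalizing
# violations of known dynamics s[t+1] = A @ s[t] (the violations act like
# control inputs). All numbers here are illustrative.
import numpy as np

T, lam, lr = 10, 1.0, 0.1
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # assumed position/velocity dynamics
s0, goal = np.zeros(2), np.array([1.0, 0.0])
states = np.tile(s0, (T, 1))             # decision variables s_1..s_T

for _ in range(3000):
    prev = np.vstack([s0, states[:-1]])
    residual = states - prev @ A.T             # per-step dynamics violation
    grad = 2.0 * lam * residual
    grad[:-1] -= 2.0 * lam * residual[1:] @ A  # coupling to the next step
    grad[-1] += 2.0 * (states[-1] - goal)      # terminal goal cost
    states -= lr * grad

print(states[-1])  # close to `goal`, reached via a near-consistent state path
```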
arXiv Detail & Related papers (2021-06-24T17:59:18Z)
- Success Weighted by Completion Time: A Dynamics-Aware Evaluation Criteria for Embodied Navigation [42.978177196888225]
We present Success weighted by Completion Time (SCT), a new metric for evaluating navigation performance for mobile robots.
We also present RRT*-Unicycle, an algorithm for unicycle dynamics that estimates the fastest collision-free path and completion time.
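Reading SCT as the SPL formula with times in place of path lengths, each episode contributes S_i · t_fastest / max(t_agent, t_fastest). A minimal sketch under that reading (variable names are mine, not the paper's code):

```python
# Minimal SCT sketch: success weighted by how close the agent's completion
# time comes to the fastest feasible time (e.g., the RRT*-Unicycle estimate).
from typing import Sequence

def sct(successes: Sequence[bool],
        agent_times: Sequence[float],
        fastest_times: Sequence[float]) -> float:
    terms = [
        float(s) * t_fast / max(t_agent, t_fast)
        for s, t_agent, t_fast in zip(successes, agent_times, fastest_times)
    ]
    return sum(terms) / len(terms)

print(sct([True, True, False], [12.0, 30.0, 9.0], [10.0, 15.0, 8.0]))
# (10/12 + 15/30 + 0) / 3 ≈ 0.444
```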
arXiv Detail & Related papers (2021-03-14T20:13:06Z)
- Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments [48.898567402373324]
We develop a language-guided navigation task set in a continuous 3D environment.
Situated in continuous environments, this setting lifts a number of assumptions implicit in prior work.
arXiv Detail & Related papers (2020-04-06T17:49:12Z)