Waypoint Models for Instruction-guided Navigation in Continuous
Environments
- URL: http://arxiv.org/abs/2110.02207v1
- Date: Tue, 5 Oct 2021 17:55:49 GMT
- Authors: Jacob Krantz, Aaron Gokaslan, Dhruv Batra, Stefan Lee, Oleksandr
Maksymets
- Abstract summary: We develop a class of language-conditioned waypoint prediction networks to examine this question.
We measure task performance and estimated execution time on a profiled LoCoBot robot.
Our models outperform prior work in VLN-CE and set a new state-of-the-art on the public leaderboard.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Little inquiry has explicitly addressed the role of action spaces in
language-guided visual navigation, either in terms of their effect on
navigation success or the efficiency with which a robotic agent could execute
the resulting trajectory. Building on the recently released VLN-CE setting for
instruction following in continuous environments, we develop a class of
language-conditioned waypoint prediction networks to examine this question. We
vary the expressivity of these models to explore a spectrum between low-level
actions and continuous waypoint prediction. We measure task performance and
estimated execution time on a profiled LoCoBot robot. We find that more
expressive models produce simpler, faster-to-execute trajectories, but
lower-level actions can achieve better navigation metrics by more closely
approximating shortest paths. Further, our models outperform prior work in
VLN-CE and set a new state-of-the-art on the public leaderboard, increasing
success rate by 4% with our best model on this challenging task.
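The spectrum the abstract describes, between continuous waypoint prediction and low-level actions, can be illustrated with a toy decomposition. The sketch below is not the authors' code; it assumes the VLN-CE default action granularity of 15-degree turns and 0.25 m forward steps, and converts a single predicted relative waypoint into the low-level action sequence that approximates it:

```python
import math

# Assumed VLN-CE low-level action granularity (not taken from this paper's text).
TURN_DEG = 15.0   # degrees per discrete turn action
STEP_M = 0.25     # meters per discrete forward action

def waypoint_to_actions(distance_m: float, heading_deg: float) -> list[str]:
    """Approximate a relative waypoint (distance, heading) with discrete actions.

    Positive heading is treated as counterclockwise (a left turn).
    """
    actions = []
    # Rotate toward the waypoint in discrete turn increments.
    n_turns = round(abs(heading_deg) / TURN_DEG)
    turn = "TURN_LEFT" if heading_deg > 0 else "TURN_RIGHT"
    actions += [turn] * n_turns
    # Approach the waypoint in discrete forward steps.
    actions += ["MOVE_FORWARD"] * round(distance_m / STEP_M)
    return actions
```

A more expressive waypoint model emits one prediction per segment and leaves this expansion to a low-level controller; a low-level policy emits each discrete action itself, which is slower to execute but can hug the shortest path more tightly.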
Related papers
- Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation [62.76017573929462]
LLM-based agents have demonstrated impressive zero-shot performance in the vision-language navigation (VLN) task.
We propose AO-Planner, a novel affordances-oriented planning framework for continuous VLN task.
Our method establishes an effective connection between LLM and 3D world to circumvent the difficulty of directly predicting world coordinates.
arXiv Detail & Related papers (2024-07-08T12:52:46Z)
- NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning [101.56342075720588]
Vision-and-Language Navigation (VLN), as a crucial research problem of Embodied AI, requires an embodied agent to navigate through complex 3D environments following natural language instructions.
Recent research has highlighted the promising capacity of large language models (LLMs) in VLN by improving navigational reasoning accuracy and interpretability.
This paper introduces a novel strategy called Navigational Chain-of-Thought (NavCoT), which performs parameter-efficient in-domain training to enable self-guided navigational decisions.
arXiv Detail & Related papers (2024-03-12T07:27:02Z)
- DICE: Diverse Diffusion Model with Scoring for Trajectory Prediction [7.346307332191997]
We present a novel framework that leverages diffusion models for predicting future trajectories in a computationally efficient manner.
We employ an efficient sampling mechanism that allows us to maximize the number of sampled trajectories for improved accuracy.
We show the effectiveness of our approach by conducting empirical evaluations on common pedestrian (UCY/ETH) and autonomous driving (nuScenes) benchmark datasets.
arXiv Detail & Related papers (2023-10-23T05:04:23Z)
- Mind the Gap: Improving Success Rate of Vision-and-Language Navigation by Revisiting Oracle Success Routes [25.944819618283613]
Vision-and-Language Navigation (VLN) aims to navigate to the target location by following a given instruction.
We make the first attempt to tackle a long-ignored problem in VLN: narrowing the gap between Success Rate (SR) and Oracle Success Rate (OSR).
arXiv Detail & Related papers (2023-08-07T01:43:25Z)
- ENTL: Embodied Navigation Trajectory Learner [37.43079415330256]
We propose a method for extracting long sequence representations for embodied navigation.
We train our model using vector-quantized predictions of future states conditioned on current actions.
A key property of our approach is that the model is pre-trained without any explicit reward signal.
arXiv Detail & Related papers (2023-04-05T17:58:33Z)
- Can an Embodied Agent Find Your "Cat-shaped Mug"? LLM-Guided Exploration for Zero-Shot Object Navigation [58.3480730643517]
We present LGX, a novel algorithm for Language-Driven Zero-Shot Object Goal Navigation (L-ZSON).
Our approach makes use of Large Language Models (LLMs) for this task.
We achieve state-of-the-art zero-shot object navigation results on RoboTHOR with a success rate (SR) improvement of over 27% over the current baseline.
arXiv Detail & Related papers (2023-03-06T20:19:19Z)
- Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation [41.334731014665316]
Most existing works in vision-and-language navigation (VLN) focus on either discrete or continuous environments.
We propose a predictor to generate a set of candidate waypoints during navigation.
We show that agents navigating in continuous environments with predicted waypoints perform significantly better than agents using low-level actions.
arXiv Detail & Related papers (2022-03-05T14:56:14Z)
- Model-Based Reinforcement Learning via Latent-Space Collocation [110.04005442935828]
We argue that it is easier to solve long-horizon tasks by planning sequences of states rather than just actions.
We adapt the idea of collocation, which has shown good results on long-horizon tasks in optimal control literature, to the image-based setting by utilizing learned latent state space models.
arXiv Detail & Related papers (2021-06-24T17:59:18Z)
- Success Weighted by Completion Time: A Dynamics-Aware Evaluation Criteria for Embodied Navigation [42.978177196888225]
We present Success weighted by Completion Time (SCT), a new metric for evaluating navigation performance for mobile robots.
We also present RRT*-Unicycle, an algorithm for unicycle dynamics that estimates the fastest collision-free path and completion time.
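The SCT metric is only named here, not defined. A plausible reconstruction, assuming SCT mirrors SPL with the fastest dynamically feasible completion time substituted for shortest-path length (an assumption, not a statement from this listing), is:

```python
def sct(success: bool, completion_time: float, fastest_time: float) -> float:
    """Success weighted by Completion Time (hedged reconstruction).

    Assumed form, by analogy with SPL:
        SCT = S * t_fastest / max(t_agent, t_fastest)
    where S is the binary success indicator and t_fastest is the fastest
    collision-free completion time (e.g. estimated by RRT*-Unicycle).
    """
    if not success:
        return 0.0
    return fastest_time / max(completion_time, fastest_time)
```

Under this form, a successful episode completed at the fastest feasible speed scores 1.0, slower successes score proportionally less, and failures score 0.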
arXiv Detail & Related papers (2021-03-14T20:13:06Z)
- Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments [48.898567402373324]
We develop a language-guided navigation task set in a continuous 3D environment.
By being situated in continuous environments, this setting lifts a number of assumptions implicit in prior work.
arXiv Detail & Related papers (2020-04-06T17:49:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.