Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments
- URL: http://arxiv.org/abs/2004.02857v2
- Date: Fri, 1 May 2020 18:06:55 GMT
- Title: Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments
- Authors: Jacob Krantz, Erik Wijmans, Arjun Majumdar, Dhruv Batra, Stefan Lee
- Abstract summary: We develop a language-guided navigation task set in a continuous 3D environment.
By being situated in continuous environments, this setting lifts a number of assumptions implicit in prior work.
- Score: 48.898567402373324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop a language-guided navigation task set in a continuous 3D
environment where agents must execute low-level actions to follow natural
language navigation directions. By being situated in continuous environments,
this setting lifts a number of assumptions implicit in prior work that
represents environments as a sparse graph of panoramas with edges corresponding
to navigability. Specifically, our setting drops the presumptions of known
environment topologies, short-range oracle navigation, and perfect agent
localization. To contextualize this new task, we develop models that mirror
many of the advances made in prior settings as well as single-modality
baselines. While some of these techniques transfer, we find significantly lower
absolute performance in the continuous setting -- suggesting that performance
in prior 'navigation-graph' settings may be inflated by the strong implicit
assumptions.
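To make the contrast concrete, here is a minimal sketch of the two action interfaces. VLN-CE is built on the Habitat simulator, and its low-level actions are short, fixed-magnitude controls (roughly a 0.25 m forward step and 15-degree turns); the class, function, and environment API names below are illustrative assumptions, not the benchmark's actual code.

```python
from enum import Enum

class LowLevelAction(Enum):
    # Continuous-environment interface (VLN-CE style): short,
    # fixed-magnitude controls; the agent must also decide when to stop.
    STOP = 0
    MOVE_FORWARD = 1   # translate ~0.25 m (may be blocked by geometry)
    TURN_LEFT = 2      # rotate ~15 degrees
    TURN_RIGHT = 3     # rotate ~15 degrees

def navgraph_step(current_node, chosen_neighbor, graph):
    """Nav-graph interface (e.g., R2R): one 'action' teleports the agent
    along a known edge to an adjacent panorama -- short-range oracle
    navigation with perfect localization baked in."""
    assert chosen_neighbor in graph[current_node]
    return chosen_neighbor

def continuous_episode(env, agent, instruction, max_steps=500):
    """Continuous interface: no known topology, no oracle hops; the agent
    emits low-level actions until it chooses STOP or exhausts its budget.
    `env` and `agent` are hypothetical stand-ins for illustration."""
    obs = env.reset()
    for _ in range(max_steps):
        action = agent.act(obs, instruction)
        if action == LowLevelAction.STOP:
            break
        obs = env.step(action)      # hypothetical environment API
    return env.agent_position()     # success judged by distance to goal
```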
Related papers
- UnitedVLN: Generalizable Gaussian Splatting for Continuous Vision-Language Navigation [71.97405667493477]
We introduce a novel, generalizable 3DGS-based pre-training paradigm, called UnitedVLN.
It enables agents to better explore future environments by jointly rendering high-fidelity 360° visual images and semantic features.
UnitedVLN outperforms state-of-the-art methods on existing VLN-CE benchmarks.
arXiv Detail & Related papers (2024-11-25T02:44:59Z)
- CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation [73.78984332354636]
CorNav is a novel zero-shot framework for vision-and-language navigation.
It incorporates environmental feedback for refining future plans and adjusting its actions.
It consistently outperforms all baselines in a zero-shot multi-task setting.
arXiv Detail & Related papers (2023-06-17T11:44:04Z)
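As a rough illustration of the self-corrected planning idea, the loop below re-plans whenever execution feedback suggests the current plan has drifted from the observed environment. Every name in it (generate_plan, execute, the result fields) is a hypothetical stand-in, not CorNav's actual interface.

```python
def self_corrected_navigation(planner, env, instruction, max_rounds=10):
    """Illustrative zero-shot loop: plan, execute, observe, revise.
    All objects and fields here are hypothetical stand-ins."""
    feedback = None
    for _ in range(max_rounds):
        plan = planner.generate_plan(instruction, feedback)  # re-plan
        result = env.execute(plan.next_subgoal)              # act in env
        if result.reached_goal:
            return True
        # Environmental feedback drives the next round of self-correction.
        feedback = result.observations_summary
    return False
```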
- Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation [41.334731014665316]
Most existing works in vision-and-language navigation (VLN) focus on either discrete or continuous environments.
We propose a predictor to generate a set of candidate waypoints during navigation.
We show that agents navigating in continuous environments with predicted waypoints perform significantly better than agents using low-level actions.
arXiv Detail & Related papers (2022-03-05T14:56:14Z)
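A minimal sketch of the candidate-waypoint idea from the entry above: rather than emitting raw low-level actions, a network scores a discrete set of nearby candidate waypoints, here parameterized by heading and distance bins. The architecture, inputs, and grid sizes are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class CandidateWaypointPredictor(nn.Module):
    """Scores candidate waypoints on a polar grid around the agent.
    Illustrative only: 12 heading bins x 3 distance bins scored from a
    fused visual feature; the paper's inputs and grid will differ."""
    def __init__(self, feat_dim=512, n_headings=12, n_distances=3):
        super().__init__()
        self.n_headings, self.n_distances = n_headings, n_distances
        self.scorer = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, n_headings * n_distances),
        )

    def forward(self, visual_feat):
        logits = self.scorer(visual_feat)
        return logits.view(-1, self.n_headings, self.n_distances)

# Choosing a waypoint: the selected (heading, distance) bin becomes a
# relative goal that a low-level controller reaches via forward/turn steps.
pred = CandidateWaypointPredictor()
scores = pred(torch.randn(1, 512))            # (1, 12, 3)
idx = scores.flatten(1).argmax(dim=1).item()  # best of 36 candidates
heading_bin, distance_bin = idx // 3, idx % 3
```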
- Waypoint Models for Instruction-guided Navigation in Continuous Environments [68.2912740006109]
We develop a class of language-conditioned waypoint prediction networks to examine the role of action spaces in this setting.
We measure task performance and estimated execution time on a profiled LoCoBot robot.
Our models outperform prior work in VLN-CE and set a new state-of-the-art on the public leaderboard.
arXiv Detail & Related papers (2021-10-05T17:55:49Z)
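The estimated-execution-time metric mentioned above can be approximated as a weighted sum of per-action counts using timings profiled on a robot. The constants below are made-up placeholders, not the paper's LoCoBot measurements.

```python
# Hypothetical per-action durations (seconds), standing in for values
# profiled on a real robot; the paper's LoCoBot numbers will differ.
PROFILED_SECONDS = {
    "move_forward": 1.2,   # ~0.25 m translation
    "turn_left": 0.9,      # ~15 degree rotation
    "turn_right": 0.9,
    "stop": 0.0,
}

def estimated_execution_time(action_sequence):
    """Estimate wall-clock time to run an episode's actions on the robot."""
    return sum(PROFILED_SECONDS[a] for a in action_sequence)

# Fewer, longer waypoint hops can beat many tiny low-level steps on this
# metric even when the traversed path length is similar.
print(estimated_execution_time(["move_forward"] * 20 + ["stop"]))  # 24.0
```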
- Occupancy Anticipation for Efficient Exploration and Navigation [97.17517060585875]
We propose occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions.
By exploiting context in both the egocentric views and top-down maps, our model successfully anticipates a broader map of the environment.
Our approach is the winning entry in the 2020 Habitat PointNav Challenge.
arXiv Detail & Related papers (2020-08-21T03:16:51Z)
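A toy rendition of occupancy anticipation: an encoder-decoder maps an egocentric occupancy projection (derived from RGB-D) to an anticipated map covering regions beyond what is currently visible. The channel semantics, shapes, and layers are illustrative assumptions, not the winning model's architecture.

```python
import torch
import torch.nn as nn

class OccupancyAnticipator(nn.Module):
    """Toy encoder-decoder: input is a 2-channel egocentric map
    (occupied / explored) projected from RGB-D; output anticipates the
    same channels over the full local region, visible or not."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 2, 4, stride=2, padding=1),
        )

    def forward(self, ego_map):
        # Sigmoid gives per-cell probabilities of occupied / explored.
        return torch.sigmoid(self.decoder(self.encoder(ego_map)))

model = OccupancyAnticipator()
out = model(torch.zeros(1, 2, 64, 64))  # -> (1, 2, 64, 64) probabilities
```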
- Environment-agnostic Multitask Learning for Natural Language Grounded Navigation [88.69873520186017]
We introduce a multitask navigation model that can be seamlessly trained on Vision-Language Navigation (VLN) and Navigation from Dialog History (NDH) tasks.
Experiments show that environment-agnostic multitask learning significantly reduces the performance gap between seen and unseen environments.
arXiv Detail & Related papers (2020-03-01T09:06:31Z)
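One standard recipe for environment-agnostic features, sketched below under stated assumptions, is adversarial: a shared encoder feeds task-specific heads (VLN, NDH) plus an environment classifier trained through a gradient-reversal layer, pushing the encoder away from environment-specific cues. This is a generic sketch of that recipe, not necessarily the paper's exact architecture; all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negated gradient on the backward
    pass, so the encoder is trained to fool the environment classifier."""
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

class MultitaskAgnosticModel(nn.Module):
    """Shared encoder, per-task heads, and an adversarial environment
    classifier encouraging environment-agnostic representations."""
    def __init__(self, in_dim=512, hid=256, n_envs=61):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hid), nn.ReLU())
        self.vln_head = nn.Linear(hid, 4)      # e.g., action logits
        self.ndh_head = nn.Linear(hid, 4)
        self.env_clf = nn.Linear(hid, n_envs)  # one class per train scene

    def forward(self, feats, task):
        h = self.encoder(feats)
        task_out = self.vln_head(h) if task == "vln" else self.ndh_head(h)
        env_out = self.env_clf(GradReverse.apply(h))
        return task_out, env_out
```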
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.