AdaVLN: Towards Visual Language Navigation in Continuous Indoor Environments with Moving Humans
- URL: http://arxiv.org/abs/2411.18539v1
- Date: Wed, 27 Nov 2024 17:36:08 GMT
- Title: AdaVLN: Towards Visual Language Navigation in Continuous Indoor Environments with Moving Humans
- Authors: Dillon Loh, Tomasz Bednarz, Xinxing Xia, Frank Guan
- Abstract summary: We propose an extension to the task, termed Adaptive Visual Language Navigation (AdaVLN). AdaVLN requires robots to navigate complex 3D indoor environments populated with dynamically moving human obstacles. We evaluate several baseline models on this task, analyze the unique challenges introduced by AdaVLN, and demonstrate its potential to bridge the sim-to-real gap in VLN research.
- Score: 2.940962519388297
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual Language Navigation is a task that challenges robots to navigate in realistic environments based on natural language instructions. While previous research has largely focused on static settings, real-world navigation must often contend with dynamic human obstacles. Hence, we propose an extension to the task, termed Adaptive Visual Language Navigation (AdaVLN), which seeks to narrow this gap. AdaVLN requires robots to navigate complex 3D indoor environments populated with dynamically moving human obstacles, adding a layer of complexity that better mimics real-world navigation. To support exploration of this task, we also present the AdaVLN simulator and AdaR2R datasets. The AdaVLN simulator enables easy inclusion of fully animated human models directly into common datasets like Matterport3D. We also introduce a "freeze-time" mechanism for both the navigation task and the simulator, which pauses world state updates during agent inference, enabling fair comparisons and experimental reproducibility across different hardware. We evaluate several baseline models on this task, analyze the unique challenges introduced by AdaVLN, and demonstrate its potential to bridge the sim-to-real gap in VLN research.
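The "freeze-time" mechanism described in the abstract can be illustrated with a minimal sketch: world state (including moving humans) advances only by a fixed timestep per agent action, and is held frozen while the agent runs inference, so results depend on the number of simulated steps rather than on inference latency. The class and method names below are illustrative assumptions, not the authors' actual AdaVLN API.

```python
# Hypothetical sketch of a freeze-time navigation loop.
# World time advances only on explicit steps, never during agent inference.
import time


class FreezeTimeSimulator:
    """Wraps a dynamic simulator so simulated time is decoupled from wall-clock time."""

    def __init__(self, physics_dt: float = 1.0 / 30.0):
        self.physics_dt = physics_dt
        self.sim_time = 0.0  # simulated clock

    def reset(self):
        self.sim_time = 0.0
        return {"rgb": None, "depth": None, "sim_time": self.sim_time}

    def step(self, action):
        # Advance humans/physics by exactly one fixed timestep per agent action.
        self.sim_time += self.physics_dt
        return {"rgb": None, "depth": None, "sim_time": self.sim_time}


def run_episode(sim, agent, instruction, max_steps=500):
    obs = sim.reset()
    for _ in range(max_steps):
        # World state is frozen here: however long inference takes on the
        # current hardware, no human moves until the action is applied.
        t0 = time.perf_counter()
        action = agent.act(instruction, obs)
        wall_time = time.perf_counter() - t0  # logged only, never simulated

        if action == "stop":
            break
        obs = sim.step(action)  # humans advance by exactly one physics_dt
    return obs["sim_time"]
```

Because the agent's thinking time never leaks into the simulated clock, two runs of the same policy on different hardware observe identical human trajectories, which is what makes cross-hardware comparisons fair and reproducible.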
Related papers
- HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard [63.54109142085327]
Vision-and-Language Navigation (VLN) systems often focus on either discrete (panoramic) or continuous (free-motion) paradigms alone.
We introduce a unified Human-Aware VLN benchmark that merges these paradigms under explicit social-awareness constraints.
arXiv Detail & Related papers (2025-03-18T13:05:55Z)
- Ground-level Viewpoint Vision-and-Language Navigation in Continuous Environments [10.953629652228024]
Vision-and-Language Navigation (VLN) agents associate time-sequenced visual observations with corresponding instructions to make decisions.
In this paper, we address the mismatch between human-centric instructions and quadruped robots with a low-height field of view.
We propose a Ground-level Viewpoint Navigation (GVNav) approach to mitigate this issue.
arXiv Detail & Related papers (2025-02-26T10:30:40Z)
- Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions [69.9980759344628]
Vision-and-Language Navigation (VLN) aims to develop embodied agents that navigate based on human instructions.
We introduce Human-Aware Vision-and-Language Navigation (HA-VLN), extending traditional VLN by incorporating dynamic human activities.
We present the Expert-Supervised Cross-Modal (VLN-CM) and Non-Expert-Supervised Decision Transformer (VLN-DT) agents, utilizing cross-modal fusion and diverse training strategies.
arXiv Detail & Related papers (2024-06-27T15:01:42Z)
- AerialVLN: Vision-and-Language Navigation for UAVs [23.40363176320464]
We propose a new task named AerialVLN, which is UAV-based and towards outdoor environments.
We develop a 3D simulator rendered by near-realistic pictures of 25 city-level scenarios.
We find that there is still a significant gap between the baseline model and human performance, which suggests AerialVLN is a new challenging task.
arXiv Detail & Related papers (2023-08-13T09:55:04Z)
- LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action [76.71101507291473]
We present a system, LM-Nav, for robotic navigation that enjoys the benefits of training on unannotated large datasets of trajectories.
We show that such a system can be constructed entirely out of pre-trained models for navigation (ViNG), image-language association (CLIP), and language modeling (GPT-3), without requiring any fine-tuning or language-annotated robot data.
arXiv Detail & Related papers (2022-07-10T10:41:50Z)
- Image-based Navigation in Real-World Environments via Multiple Mid-level Representations: Fusion Models, Benchmark and Efficient Evaluation [13.207579081178716]
In recent learning-based navigation approaches, the scene understanding and navigation abilities of the agent are achieved simultaneously.
Unfortunately, even though simulators are an efficient tool for training navigation policies, the resulting models often fail when transferred to the real world.
One possible solution is to provide the navigation model with mid-level visual representations containing important domain-invariant properties of the scene.
arXiv Detail & Related papers (2022-02-02T15:00:44Z)
- iGibson, a Simulation Environment for Interactive Tasks in Large Realistic Scenes [54.04456391489063]
iGibson is a novel simulation environment to develop robotic solutions for interactive tasks in large-scale realistic scenes.
Our environment contains fifteen fully interactive home-sized scenes populated with rigid and articulated objects.
iGibson's features enable the generalization of navigation agents, and the human-iGibson interface and integrated motion planners facilitate efficient imitation learning of simple human-demonstrated behaviors.
arXiv Detail & Related papers (2020-12-05T02:14:17Z)
- Visual Navigation Among Humans with Optimal Control as a Supervisor [72.5188978268463]
We propose an approach that combines learning-based perception with model-based optimal control to navigate among humans.
Our approach is enabled by our novel data-generation tool, HumANav.
We demonstrate that the learned navigation policies can anticipate and react to humans without explicitly predicting future human motion.
arXiv Detail & Related papers (2020-03-20T16:13:47Z)
- Counterfactual Vision-and-Language Navigation via Adversarial Path Sampling [65.99956848461915]
Vision-and-Language Navigation (VLN) is a task where agents must decide how to move through a 3D environment to reach a goal.
One of the problems of the VLN task is data scarcity since it is difficult to collect enough navigation paths with human-annotated instructions for interactive environments.
We propose an adversarial-driven counterfactual reasoning model that can consider effective conditions instead of low-quality augmented data.
arXiv Detail & Related papers (2019-11-17T18:02:51Z)