MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation
- URL: http://arxiv.org/abs/2401.07314v3
- Date: Thu, 20 Jun 2024 07:23:45 GMT
- Title: MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation
- Authors: Jiaqi Chen, Bingqian Lin, Ran Xu, Zhenhua Chai, Xiaodan Liang, Kwan-Yee K. Wong
- Abstract summary: Embodied agents equipped with GPT have exhibited extraordinary decision-making and generalization abilities across various tasks.
We present a novel map-guided GPT-based agent, dubbed MapGPT, which introduces an online linguistic-formed map to encourage global exploration.
Benefiting from this design, we propose an adaptive planning mechanism to assist the agent in performing multi-step path planning based on a map.
- Score: 73.81268591484198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embodied agents equipped with GPT as their brains have exhibited extraordinary decision-making and generalization abilities across various tasks. However, existing zero-shot agents for vision-and-language navigation (VLN) only prompt GPT-4 to select potential locations within localized environments, without constructing an effective "global-view" for the agent to understand the overall environment. In this work, we present a novel map-guided GPT-based agent, dubbed MapGPT, which introduces an online linguistic-formed map to encourage global exploration. Specifically, we build an online map and incorporate it into the prompts that include node information and topological relationships, to help GPT understand the spatial environment. Benefiting from this design, we further propose an adaptive planning mechanism to assist the agent in performing multi-step path planning based on a map, systematically exploring multiple candidate nodes or sub-goals step by step. Extensive experiments demonstrate that our MapGPT is applicable to both GPT-4 and GPT-4V, achieving state-of-the-art zero-shot performance on R2R and REVERIE simultaneously (~10% and ~12% improvements in SR), and showcasing the newly emergent global thinking and path planning abilities of the GPT.
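The abstract describes the linguistic-formed map only in prose; as a concrete illustration, a minimal sketch of how an online topological map might be serialized into a text prompt could look like the following. The `MapNode` class, the room descriptions, and the prompt wording are all assumptions for illustration, not MapGPT's actual implementation.

```python
# Minimal sketch (not MapGPT's actual code): serialize an online
# topological map into a text prompt for a GPT-based navigation agent.
# Node descriptions and the graph structure are illustrative placeholders.

from dataclasses import dataclass, field


@dataclass
class MapNode:
    node_id: int
    description: str          # short caption of what the agent saw there
    visited: bool = False
    neighbors: list[int] = field(default_factory=list)


def map_to_prompt(nodes: dict[int, MapNode], current_id: int) -> str:
    """Render the map as plain text so the LLM gets a 'global view'."""
    lines = ["Map (node: description -> connected nodes):"]
    for nid, node in sorted(nodes.items()):
        mark = "visited" if node.visited else "unexplored"
        edges = ", ".join(str(n) for n in node.neighbors)
        lines.append(f"  Node {nid} ({mark}): {node.description} -> [{edges}]")
    lines.append(f"You are currently at Node {current_id}.")
    lines.append("Plan a multi-step path and choose the next node to move to.")
    return "\n".join(lines)


# Example usage
nodes = {
    0: MapNode(0, "hallway with a staircase", True, [1, 2]),
    1: MapNode(1, "open kitchen", False, [0]),
    2: MapNode(2, "living room with a sofa", False, [0]),
}
print(map_to_prompt(nodes, current_id=0))
```

Rendering visited and unexplored nodes together is what gives the LLM the "global view" the abstract refers to, and it is also what makes multi-step planning over candidate sub-goals possible.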
Related papers
- Guide-LLM: An Embodied LLM Agent and Text-Based Topological Map for Robotic Guidance of People with Visual Impairments [1.18749525824656]
Guide-LLM is a text-based agent designed to assist persons with visual impairments (PVI) in navigating large indoor environments.
Our approach features a novel text-based topological map that enables the LLM to plan global paths.
Simulated experiments demonstrate the system's efficacy in guiding PVI, underscoring its potential as a significant advancement in assistive technology.
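As an illustration of the text-based topological map idea, a small sketch of global path planning over a named-node graph, followed by verbalizing the route as guidance, might look like this; the graph, room names, and functions are invented for the example, not Guide-LLM's released code.

```python
# Illustrative sketch (not Guide-LLM's code): global path planning over
# a text-based topological map, then verbalizing the route as guidance.

from collections import deque


def shortest_path(graph: dict[str, list[str]], start: str, goal: str) -> list[str]:
    """Breadth-first search over the topological map."""
    queue, parents = deque([start]), {start: None}
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return []


graph = {
    "entrance": ["lobby"],
    "lobby": ["entrance", "elevator", "cafeteria"],
    "elevator": ["lobby"],
    "cafeteria": ["lobby"],
}
route = shortest_path(graph, "entrance", "cafeteria")
print("Guidance: " + " -> ".join(route))  # entrance -> lobby -> cafeteria
```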
arXiv Detail & Related papers (2024-10-28T01:58:21Z)
- Look Further Ahead: Testing the Limits of GPT-4 in Path Planning [9.461626534488117]
Large Language Models (LLMs) have shown impressive capabilities across a wide variety of tasks.
Our proposed benchmark systematically tests path-planning skills in complex settings.
We found that framing prompts as Python code and decomposing long trajectory tasks improve GPT-4's path planning effectiveness.
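The benchmark's exact prompts are not reproduced here, but a toy sketch of the "frame the task as Python code" idea might look like the following; the grid encoding and prompt wording are assumptions for illustration.

```python
# Sketch of the prompting idea only: the environment is framed as Python
# code rather than prose, which the paper reports helps GPT-4 plan paths.

grid = [
    "S..#",
    ".##.",
    "...G",
]

prompt = f"""You are a path planner. The environment is given as Python:

grid = {grid!r}
# 'S' = start, 'G' = goal, '#' = obstacle, '.' = free cell.
# Moves: up, down, left, right.

Return the move sequence from S to G as a Python list of strings."""
print(prompt)
```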
arXiv Detail & Related papers (2024-06-17T18:12:56Z)
- Core Building Blocks: Next Gen Geo Spatial GPT Application [0.0]
This paper introduces MapGPT, which aims to bridge the gap between natural language understanding and spatial data analysis.
MapGPT enables more accurate and contextually aware responses to location-based queries.
arXiv Detail & Related papers (2023-10-17T06:59:31Z)
- GPT4GEO: How a Language Model Sees the World's Geography [31.215906518290883]
We investigate the degree to which GPT-4 has acquired factual geographic knowledge.
This knowledge is especially important for applications that involve geographic data.
We provide a broad characterisation of what GPT-4 knows about the world, highlighting both potentially surprising capabilities and notable limitations.
arXiv Detail & Related papers (2023-05-30T18:28:04Z)
- BEVBert: Multimodal Map Pre-training for Language-guided Navigation [75.23388288113817]
We propose a new map-based pre-training paradigm that is spatial-aware for use in vision-and-language navigation (VLN).
We build a local metric map to explicitly aggregate incomplete observations and remove duplicates, while modeling navigation dependency in a global topological map.
Based on the hybrid map, we devise a pre-training framework to learn a multimodal map representation, which enhances spatial-aware cross-modal reasoning and thereby facilitates the language-guided navigation goal.
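A schematic sketch of such a hybrid map, assuming the split described above (a local egocentric metric grid plus a global topological graph of viewpoints), might look like this; the class name, shapes, and fields are illustrative, not BEVBert's implementation.

```python
# Schematic sketch of a hybrid map: a local metric grid for nearby
# geometry plus a global topological graph of visited viewpoints.

import numpy as np


class HybridMap:
    def __init__(self, grid_size: int = 64, cell_m: float = 0.25):
        # Local metric map: egocentric grid of occupancy cells.
        self.local_grid = np.zeros((grid_size, grid_size), dtype=np.float32)
        self.cell_m = cell_m
        # Global topological map: viewpoint id -> neighbor viewpoint ids.
        self.topo_edges: dict[str, set[str]] = {}

    def update_local(self, points_xy: np.ndarray) -> None:
        """Aggregate observed 2-D points into the egocentric grid."""
        idx = (points_xy / self.cell_m).astype(int) + self.local_grid.shape[0] // 2
        valid = (idx >= 0).all(axis=1) & (idx < self.local_grid.shape[0]).all(axis=1)
        self.local_grid[idx[valid, 0], idx[valid, 1]] = 1.0  # mark occupied

    def add_viewpoint(self, vp: str, neighbors: list[str]) -> None:
        """Grow the global topological map as new viewpoints are reached."""
        self.topo_edges.setdefault(vp, set()).update(neighbors)
        for n in neighbors:
            self.topo_edges.setdefault(n, set()).add(vp)
```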
arXiv Detail & Related papers (2022-12-08T16:27:54Z)
- Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation [87.52136927091712]
We address a practical yet challenging problem of training robot agents to navigate in an environment following a path described by some language instructions.
To achieve accurate and efficient navigation, it is critical to build a map that accurately represents both spatial location and the semantic information of the environment objects.
We propose a multi-granularity map, which contains both object fine-grained details (e.g., color, texture) and semantic classes, to represent objects more comprehensively.
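As a toy illustration of a multi-granularity map entry, one might pair a coarse semantic class with fine-grained attributes per mapped object; the `MapObject` structure below is invented for the example, not the authors' data format.

```python
# Toy sketch of a multi-granularity map entry: each mapped object keeps
# both a coarse semantic class and fine-grained attributes.

from dataclasses import dataclass


@dataclass
class MapObject:
    semantic_class: str            # coarse granularity, e.g. "chair"
    attributes: dict[str, str]     # fine granularity, e.g. color/texture
    cell: tuple[int, int]          # grid location in the semantic map


semantic_map: dict[tuple[int, int], MapObject] = {}
obj = MapObject("sofa", {"color": "red", "texture": "leather"}, (4, 7))
semantic_map[obj.cell] = obj
print(semantic_map[(4, 7)].attributes["color"])  # -> red
```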
arXiv Detail & Related papers (2022-10-14T04:23:27Z)
- Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation [87.03299519917019]
We propose a dual-scale graph transformer (DUET) for joint long-term action planning and fine-grained cross-modal understanding.
We build a topological map on-the-fly to enable efficient exploration in global action space.
The proposed approach, DUET, significantly outperforms state-of-the-art methods on goal-oriented vision-and-language navigation benchmarks.
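A toy sketch of the dual-scale idea, merging a local action space (candidates visible from the current node) with a global one (unvisited nodes on the map built on-the-fly), might look like this; in DUET the scores come from the graph transformer, whereas here they are stub values.

```python
# Toy sketch (not DUET's model): merge the local action space with the
# global action space and pick the highest-scoring navigation action.

def select_action(local_candidates: dict[str, float],
                  global_frontier: dict[str, float]) -> str:
    """Return the best node over both local and global action spaces."""
    combined = {**global_frontier, **local_candidates}
    return max(combined, key=combined.get)


local_candidates = {"node_3": 0.7, "node_4": 0.2}   # fine-grained, visible now
global_frontier = {"node_1": 0.5}                   # unvisited node on the map
print(select_action(local_candidates, global_frontier))  # -> node_3
```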
arXiv Detail & Related papers (2022-02-23T19:06:53Z)
- ViKiNG: Vision-Based Kilometer-Scale Navigation with Geographic Hints [94.60414567852536]
Long-range navigation requires both planning and reasoning about local traversability.
We propose an approach that integrates learning and planning.
ViKiNG can leverage its image-based learned controller and goal-directed heuristic to navigate to goals up to 3 kilometers away.
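As a rough illustration of using geographic hints as a planning heuristic, candidate subgoals could be ranked by combining a learned traversability score with a straight-line distance hint; all names and values below are invented, not ViKiNG's code.

```python
# Hypothetical sketch: rank candidate subgoals by a learned
# traversability score plus a goal-directed geographic heuristic.

import math


def heuristic(candidate_gps, goal_gps):
    """Straight-line distance hint from geographic coordinates."""
    return math.dist(candidate_gps, goal_gps)


def pick_subgoal(candidates, goal_gps, weight=0.5):
    # Each candidate: (name, learned traversability in [0, 1], gps).
    # Lower key is better: closer to goal and easier to traverse.
    return min(candidates,
               key=lambda c: weight * heuristic(c[2], goal_gps) - (1 - weight) * c[1])


candidates = [("trail_left", 0.9, (0.0, 1.0)), ("road_right", 0.4, (0.5, 0.2))]
print(pick_subgoal(candidates, goal_gps=(1.0, 0.0))[0])  # -> road_right
```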
arXiv Detail & Related papers (2022-02-23T02:14:23Z)
- Differentiable Spatial Planning using Transformers [87.90709874369192]
We propose Spatial Planning Transformers (SPT), which, given an obstacle map, learn to generate actions by planning over long-range spatial dependencies.
In the setting where the ground truth map is not known to the agent, we leverage pre-trained SPTs in an end-to-end framework.
SPTs outperform prior state-of-the-art differentiable planners across all the setups for both manipulation and navigation tasks.
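A minimal sketch of the input/output framing, not the authors' architecture, might flatten the obstacle and goal maps into tokens, run a transformer encoder so attention can span long-range spatial dependencies, and emit per-cell action logits; all hyperparameters here are arbitrary.

```python
# Minimal sketch of planning over a map with a transformer encoder.

import torch
import torch.nn as nn


class TinySpatialPlanner(nn.Module):
    def __init__(self, map_size: int = 15, d_model: int = 64, n_actions: int = 4):
        super().__init__()
        self.embed = nn.Linear(2, d_model)          # (obstacle, goal) per cell
        self.pos = nn.Parameter(torch.randn(map_size * map_size, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_actions)   # action logits per cell

    def forward(self, obstacle: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
        # obstacle, goal: (batch, H, W) binary maps.
        x = torch.stack([obstacle, goal], dim=-1).flatten(1, 2)  # (B, H*W, 2)
        x = self.embed(x) + self.pos
        return self.head(self.encoder(x))           # (B, H*W, n_actions)


planner = TinySpatialPlanner()
obs = torch.zeros(1, 15, 15)
goal = torch.zeros(1, 15, 15)
goal[0, 7, 7] = 1.0
print(planner(obs, goal).shape)  # torch.Size([1, 225, 4])
```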
arXiv Detail & Related papers (2021-12-02T06:48:16Z)
- UAV Path Planning using Global and Local Map Information with Deep Reinforcement Learning [16.720630804675213]
This work presents a method for autonomous UAV path planning based on deep reinforcement learning (DRL).
We compare coverage path planning (CPP), where the UAV's goal is to survey an area of interest, with data harvesting (DH), where the UAV collects data from distributed Internet of Things (IoT) sensor devices.
By exploiting structured map information of the environment, we train double deep Q-networks (DDQNs) with identical architectures on both distinctly different mission scenarios.
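An illustrative sketch of a Q-network that consumes both a compressed global map and a cropped local map, in the spirit described above, might look like this; channel counts, map sizes, and the action count are invented for the example.

```python
# Illustrative sketch: a Q-network with separate branches for global
# and local map inputs, fused before the Q-value head.

import torch
import torch.nn as nn


class DualMapQNet(nn.Module):
    def __init__(self, n_actions: int = 5):
        super().__init__()
        def conv(c: int) -> nn.Sequential:           # shared CNN recipe
            return nn.Sequential(
                nn.Conv2d(c, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            )
        self.global_branch = conv(3)   # e.g. obstacles / coverage / position
        self.local_branch = conv(3)
        self.q_head = nn.Sequential(
            nn.Linear(2 * 32 * 4 * 4, 128), nn.ReLU(), nn.Linear(128, n_actions)
        )

    def forward(self, global_map: torch.Tensor, local_map: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([self.global_branch(global_map),
                           self.local_branch(local_map)], dim=1)
        return self.q_head(feats)      # Q-values, one per action


net = DualMapQNet()
q = net(torch.zeros(1, 3, 32, 32), torch.zeros(1, 3, 16, 16))
print(q.shape)  # torch.Size([1, 5])
```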
arXiv Detail & Related papers (2020-10-14T09:59:10Z)