QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds
- URL: http://arxiv.org/abs/2406.16578v1
- Date: Mon, 24 Jun 2024 12:14:24 GMT
- Title: QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds
- Authors: Ye Wang, Yuting Mei, Sipeng Zheng, Qin Jin
- Abstract summary: QuadrupedGPT is a versatile agent designed to master a broad range of complex tasks with agility comparable to that of a pet.
Our agent processes human commands and environmental contexts using a large multimodal model (LMM).
It is equipped with problem-solving capabilities that enable it to decompose long-term goals into a sequence of executable subgoals.
- Score: 51.05639500325598
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While pets offer companionship, their limited intelligence restricts advanced reasoning and autonomous interaction with humans. Considering this, we propose QuadrupedGPT, a versatile agent designed to master a broad range of complex tasks with agility comparable to that of a pet. To achieve this goal, the primary challenges include: i) effectively leveraging multimodal observations for decision-making; ii) mastering agile control of locomotion and path planning; iii) developing advanced cognition to execute long-term objectives. QuadrupedGPT processes human commands and environmental contexts using a large multimodal model (LMM). Empowered by its extensive knowledge base, our agent autonomously assigns appropriate parameters for adaptive locomotion policies and plans a safe yet efficient path towards the goal, utilizing semantic-aware terrain analysis. Moreover, QuadrupedGPT is equipped with problem-solving capabilities that enable it to decompose long-term goals into a sequence of executable subgoals through high-level reasoning. Extensive experiments across various benchmarks confirm that QuadrupedGPT can adeptly handle multiple tasks with intricate instructions, demonstrating a significant step towards versatile quadruped agents in open-ended worlds. Our website and code can be found at https://quadruped-hub.github.io/Quadruped-GPT/.
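The pipeline described in the abstract can be sketched as a small loop: a language model decomposes a long-term command into executable subgoals, and terrain semantics select locomotion parameters for each subgoal. This is a minimal illustration, not the authors' actual system; every name here (`query_lmm`, `LocomotionParams`, the gait table) is a hypothetical stand-in.

```python
# Hedged sketch of the described agent loop. The gait table stands in
# for "semantic-aware terrain analysis"; query_lmm stands in for the
# large multimodal model. All names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class LocomotionParams:
    gait: str         # e.g. "walk", "trot"
    speed_mps: float  # forward speed in metres per second

# Hypothetical terrain-to-parameters table.
TERRAIN_GAITS = {
    "flat":   LocomotionParams(gait="trot", speed_mps=1.2),
    "stairs": LocomotionParams(gait="walk", speed_mps=0.4),
    "grass":  LocomotionParams(gait="walk", speed_mps=0.8),
}

def query_lmm(command: str) -> list[str]:
    """Stand-in for the LMM: split a long-term command into ordered
    subgoals. A real system would query a multimodal model here."""
    return [part.strip() for part in command.split(", then ") if part.strip()]

def plan(command: str, terrain: str) -> list[tuple[str, LocomotionParams]]:
    """Pair each decomposed subgoal with terrain-appropriate parameters."""
    params = TERRAIN_GAITS.get(terrain, TERRAIN_GAITS["flat"])
    return [(subgoal, params) for subgoal in query_lmm(command)]

if __name__ == "__main__":
    for subgoal, p in plan("go to the door, then pick up the ball", terrain="grass"):
        print(f"{subgoal} -> gait={p.gait}, speed={p.speed_mps} m/s")
```

The point of the sketch is the separation of concerns: high-level decomposition is delegated to the language model, while low-level parameter assignment is conditioned on terrain semantics.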
Related papers
- Look Further Ahead: Testing the Limits of GPT-4 in Path Planning [9.461626534488117]
Large Language Models (LLMs) have shown impressive capabilities across a wide variety of tasks.
Our proposed benchmark systematically tests path-planning skills in complex settings.
We found that framing prompts as Python code and decomposing long trajectory tasks improve GPT-4's path planning effectiveness.
arXiv Detail & Related papers (2024-06-17T18:12:56Z) - MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception [53.20509532671891]
MP5 is an open-ended multimodal embodied system built upon the challenging Minecraft simulator.
It can decompose feasible sub-objectives, design sophisticated situation-aware plans, and perform embodied action control.
arXiv Detail & Related papers (2023-12-12T17:55:45Z) - MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration [102.41118020705876]
Large Language Models (LLMs) have marked a significant advancement in the field of natural language processing.
As their applications extend into multi-agent environments, a need has arisen for a comprehensive evaluation framework.
This work introduces a novel benchmarking framework specifically tailored to assess LLMs within multi-agent settings.
arXiv Detail & Related papers (2023-11-14T21:46:27Z) - Octopus: Embodied Vision-Language Programmer from Environmental Feedback [59.772904419928054]
Large vision-language models (VLMs) have achieved substantial progress in multimodal perception and reasoning.
In this paper, we introduce Octopus, a novel VLM designed to proficiently decipher an agent's vision and textual task objectives.
Our design allows the agent to adeptly handle a wide spectrum of tasks, ranging from mundane daily chores in simulators to sophisticated interactions in complex video games.
arXiv Detail & Related papers (2023-10-12T17:59:58Z) - Multi-Level Compositional Reasoning for Interactive Instruction Following [24.581542880280203]
The Multi-level Compositional Reasoning Agent (MCR-Agent) operates at three levels of abstraction.
At the highest level, we infer a sequence of human-interpretable subgoals to be executed based on language instructions by a high-level policy composition controller.
At the middle level, a master policy controls the agent's navigation by alternating between a navigation policy and various independent interaction policies.
At the lowest level, we infer manipulation actions with the corresponding object masks using the appropriate interaction policy.
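The three-level scheme above can be illustrated with a toy dispatch loop: a high-level controller emits subgoals, a master policy routes each subgoal to either a navigation policy or an interaction policy, and the chosen policy emits a primitive action. All function names here are hypothetical simplifications, not MCR-Agent's actual interfaces.

```python
# Toy sketch of hierarchical policy dispatch. Every name is an
# illustrative assumption standing in for MCR-Agent's components.

def high_level_controller(instruction: str) -> list[str]:
    """Stand-in subgoal inference: one subgoal per clause."""
    return [s.strip() for s in instruction.split(" and ")]

def navigation_policy(subgoal: str) -> str:
    return f"NAVIGATE({subgoal})"

def interaction_policy(subgoal: str) -> str:
    return f"MANIPULATE({subgoal})"

def master_policy(subgoal: str) -> str:
    """Route each subgoal: movement verbs go to the navigation policy,
    everything else to an interaction policy (mirroring the alternation
    described above)."""
    nav_verbs = ("go", "walk", "move")
    policy = navigation_policy if subgoal.split()[0] in nav_verbs else interaction_policy
    return policy(subgoal)

def run(instruction: str) -> list[str]:
    return [master_policy(sg) for sg in high_level_controller(instruction)]
```

For example, `run("go to the sink and pick up the mug")` routes the first clause to navigation and the second to interaction.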
arXiv Detail & Related papers (2023-08-18T08:38:28Z) - AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation [50.737355245505334]
We propose a novel framework for learning high-level cognitive capabilities in robot manipulation tasks.
The resulting dataset, AlphaBlock, consists of 35 comprehensive high-level tasks with multi-step text plans and paired observations.
arXiv Detail & Related papers (2023-05-30T09:54:20Z) - Planning Goals for Exploration [22.047797646698527]
"Planning Exploratory Goals" (PEG) is a method that sets goals for each training episode to directly optimize an intrinsic exploration reward.
PEG learns world models and adapts sampling-based planning algorithms to "plan goal commands".
arXiv Detail & Related papers (2023-03-23T02:51:50Z) - Planning-oriented Autonomous Driving [60.93767791255728]
We argue that a favorable framework should be devised and optimized in pursuit of the ultimate goal, i.e., planning of the self-driving car.
We introduce Unified Autonomous Driving (UniAD), a comprehensive framework that incorporates full-stack driving tasks in one network.
arXiv Detail & Related papers (2022-12-20T10:47:53Z) - Planning Immediate Landmarks of Targets for Model-Free Skill Transfer across Agents [34.56191646231944]
We propose PILoT, i.e., Planning Immediate Landmarks of Targets.
PILoT learns a goal-conditioned state planner and distills a goal-planner to plan immediate landmarks in a model-free style.
We show the power of PILoT on various transfer challenges, including few-shot transfer across action spaces and dynamics.
arXiv Detail & Related papers (2022-12-18T08:03:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.