QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds
- URL: http://arxiv.org/abs/2406.16578v1
- Date: Mon, 24 Jun 2024 12:14:24 GMT
- Title: QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds
- Authors: Ye Wang, Yuting Mei, Sipeng Zheng, Qin Jin
- Abstract summary: QuadrupedGPT is a versatile agent designed to master a broad range of complex tasks with agility comparable to that of a pet.
Our agent processes human commands and environmental contexts using a large multimodal model (LMM).
It is equipped with problem-solving capabilities that enable it to decompose long-term goals into a sequence of executable subgoals.
- Score: 51.05639500325598
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While pets offer companionship, their limited intelligence restricts advanced reasoning and autonomous interaction with humans. Considering this, we propose QuadrupedGPT, a versatile agent designed to master a broad range of complex tasks with agility comparable to that of a pet. To achieve this goal, the primary challenges include: i) effectively leveraging multimodal observations for decision-making; ii) mastering agile control of locomotion and path planning; iii) developing advanced cognition to execute long-term objectives. QuadrupedGPT processes human commands and environmental contexts using a large multimodal model (LMM). Empowered by its extensive knowledge base, our agent autonomously assigns appropriate parameters for adaptive locomotion policies and guides the agent in planning a safe yet efficient path towards the goal, utilizing semantic-aware terrain analysis. Moreover, QuadrupedGPT is equipped with problem-solving capabilities that enable it to decompose long-term goals into a sequence of executable subgoals through high-level reasoning. Extensive experiments across various benchmarks confirm that QuadrupedGPT can adeptly handle multiple tasks with intricate instructions, demonstrating a significant step towards versatile quadruped agents in open-ended worlds. Our website and codes can be found at https://quadruped-hub.github.io/Quadruped-GPT/.
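The abstract describes a pipeline in which an LMM decomposes a long-term goal into subgoals and assigns terrain-aware locomotion parameters to each. The sketch below is a minimal, hypothetical rendering of that flow; every function name, parameter value, and data structure is invented for illustration and does not come from the QuadrupedGPT codebase.

```python
# Hypothetical sketch of the pipeline the abstract describes: an LMM parses a
# human command plus environmental context into subgoals, and each subgoal is
# paired with adaptive locomotion parameters chosen from semantic terrain labels.
# All names and values here are illustrative assumptions.

def lmm_decompose(command, context):
    """Stand-in for the LMM: split a long-term goal into executable subgoals."""
    # A real system would query a large multimodal model here.
    return [f"{command}: step {i}" for i in range(1, 3)]

def select_locomotion_params(terrain):
    """Map a semantic terrain label to adaptive gait parameters (invented presets)."""
    presets = {"grass": {"gait": "trot", "speed": 0.8},
               "gravel": {"gait": "walk", "speed": 0.4}}
    return presets.get(terrain, {"gait": "walk", "speed": 0.5})

def run_agent(command, context):
    """Assemble a plan: one (subgoal, locomotion-parameters) pair per subgoal."""
    plan = []
    for subgoal in lmm_decompose(command, context):
        params = select_locomotion_params(context["terrain"])
        plan.append({"subgoal": subgoal, "params": params})
    return plan

plan = run_agent("fetch the ball", {"terrain": "grass"})
```

The design point the abstract emphasizes is that decomposition and parameter selection are both driven by the model's knowledge base rather than hand-coded rules; the dictionary lookup above merely stands in for that step.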
Related papers
- Spatial Reasoning and Planning for Deep Embodied Agents [2.7195102129095003]
This thesis explores the development of data-driven techniques for spatial reasoning and planning tasks.
It focuses on enhancing learning efficiency, interpretability, and transferability across novel scenarios.
arXiv Detail & Related papers (2024-09-28T23:05:56Z) - Look Further Ahead: Testing the Limits of GPT-4 in Path Planning [9.461626534488117]
Large Language Models (LLMs) have shown impressive capabilities across a wide variety of tasks.
Our proposed benchmark systematically tests path-planning skills in complex settings.
We found that framing prompts as Python code and decomposing long trajectory tasks improve GPT-4's path planning effectiveness.
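The finding above has two parts: prompts framed as Python code and long trajectories split into short segments both help GPT-4. A minimal sketch of what such a setup might look like follows; the prompt format and helper names are assumptions, not the paper's actual benchmark interface.

```python
# Illustrative only: one way to frame a grid path-planning query as Python code
# and to decompose a long trajectory into short segments queried one at a time.
# The prompt wording and chunking scheme are invented for this sketch.

def make_code_prompt(grid, start, goal):
    """Present the planning task as a Python snippet for the model to complete."""
    return (
        "grid = " + repr(grid) + "\n"
        f"start, goal = {start}, {goal}\n"
        "# 0 = free cell, 1 = obstacle. Return the shortest obstacle-free path\n"
        "# as a Python list of (row, col) tuples.\n"
        "path = "
    )

def decompose_trajectory(waypoints, chunk=2):
    """Split a long waypoint list into overlapping segments of `chunk` moves,
    so each model query only has to plan a short hop."""
    return [waypoints[i:i + chunk + 1] for i in range(0, len(waypoints) - 1, chunk)]

prompt = make_code_prompt([[0, 0], [1, 0]], (0, 0), (1, 1))
segments = decompose_trajectory([(0, 0), (0, 1), (1, 1), (1, 2), (2, 2)])
```

Each segment ends on the waypoint the next one starts from, so the per-segment answers can be stitched back into a full trajectory.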
arXiv Detail & Related papers (2024-06-17T18:12:56Z) - MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception [53.20509532671891]
MP5 is an open-ended multimodal embodied system built upon the challenging Minecraft simulator.
It can decompose feasible sub-objectives, design sophisticated situation-aware plans, and perform embodied action control.
arXiv Detail & Related papers (2023-12-12T17:55:45Z) - MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration [102.41118020705876]
Large Language Models (LLMs) have marked a significant advancement in the field of natural language processing.
As their applications extend into multi-agent environments, a need has arisen for a comprehensive evaluation framework.
This work introduces a novel benchmarking framework specifically tailored to assess LLMs within multi-agent settings.
arXiv Detail & Related papers (2023-11-14T21:46:27Z) - Multi-Level Compositional Reasoning for Interactive Instruction Following [24.581542880280203]
Multi-level Compositional Reasoning Agent (MCR-Agent)
At the highest level, we infer a sequence of human-interpretable subgoals to be executed based on language instructions by a high-level policy composition controller.
At the middle level, a master policy controls the agent's navigation by alternating between a navigation policy and various independent interaction policies.
At the lowest level, we infer manipulation actions with the corresponding object masks using the appropriate interaction policy.
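The three levels described above form a dispatch hierarchy: subgoals come from the top, a master policy routes each subgoal, and the lowest level emits concrete actions. The sketch below is a schematic, hypothetical rendering of that structure; none of the function names or the mask handling come from the MCR-Agent implementation.

```python
# Hypothetical three-level control loop in the spirit of MCR-Agent:
# a high-level controller emits human-interpretable subgoals, a master policy
# alternates between navigation and interaction, and the lowest level produces
# concrete actions. All names and outputs here are invented.

def high_level_controller(instruction):
    """Top level: infer a sequence of subgoals from a language instruction."""
    # A real controller would condition on the instruction text; this is canned.
    return [("navigate", "kitchen"), ("interact", "pick up mug")]

def navigation_policy(target):
    """Lowest level (navigation branch): move toward a target location."""
    return [f"move_toward({target})"]

def interaction_policy(task):
    """Lowest level (interaction branch): manipulation action with an
    (invented) object-mask id standing in for the predicted mask."""
    return [f"manipulate({task}, mask=0)"]

def master_policy(subgoals):
    """Middle level: route each subgoal to the appropriate low-level policy."""
    actions = []
    for kind, arg in subgoals:
        policy = navigation_policy if kind == "navigate" else interaction_policy
        actions.extend(policy(arg))
    return actions

actions = master_policy(high_level_controller("bring me a mug"))
```

The value of the split is modularity: navigation and each interaction skill can be trained independently and swapped without retraining the high-level controller.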
arXiv Detail & Related papers (2023-08-18T08:38:28Z) - AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation [50.737355245505334]
We propose a novel framework for learning high-level cognitive capabilities in robot manipulation tasks.
The resulting dataset, AlphaBlock, consists of 35 comprehensive high-level tasks with multi-step text plans and paired observations.
arXiv Detail & Related papers (2023-05-30T09:54:20Z) - Planning Goals for Exploration [22.047797646698527]
"Planning Exploratory Goals" (PEG) is a method that sets goals for each training episode to directly optimize an intrinsic exploration reward.
PEG learns world models and adapts sampling-based planning algorithms to "plan goal commands"
arXiv Detail & Related papers (2023-03-23T02:51:50Z) - Planning-oriented Autonomous Driving [60.93767791255728]
We argue that a favorable framework should be devised and optimized in pursuit of the ultimate goal, i.e., planning of the self-driving car.
We introduce Unified Autonomous Driving (UniAD), a comprehensive framework that incorporates full-stack driving tasks in one network.
arXiv Detail & Related papers (2022-12-20T10:47:53Z) - Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.