QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds
- URL: http://arxiv.org/abs/2406.16578v1
- Date: Mon, 24 Jun 2024 12:14:24 GMT
- Title: QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds
- Authors: Ye Wang, Yuting Mei, Sipeng Zheng, Qin Jin
- Abstract summary: QuadrupedGPT is a versatile agent designed to master a broad range of complex tasks with agility comparable to that of a pet.
Our agent processes human commands and environmental contexts using a large multimodal model (LMM).
It is equipped with problem-solving capabilities that enable it to decompose long-term goals into a sequence of executable subgoals.
- Score: 51.05639500325598
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While pets offer companionship, their limited intelligence restricts advanced reasoning and autonomous interaction with humans. Considering this, we propose QuadrupedGPT, a versatile agent designed to master a broad range of complex tasks with agility comparable to that of a pet. To achieve this goal, the primary challenges include: i) effectively leveraging multimodal observations for decision-making; ii) mastering agile control of locomotion and path planning; iii) developing advanced cognition to execute long-term objectives. QuadrupedGPT processes human commands and environmental contexts using a large multimodal model (LMM). Empowered by its extensive knowledge base, our agent autonomously assigns appropriate parameters for adaptive locomotion policies and guides the agent in planning a safe but efficient path towards the goal, utilizing semantic-aware terrain analysis. Moreover, QuadrupedGPT is equipped with problem-solving capabilities that enable it to decompose long-term goals into a sequence of executable subgoals through high-level reasoning. Extensive experiments across various benchmarks confirm that QuadrupedGPT can adeptly handle multiple tasks with intricate instructions, demonstrating a significant step towards versatile quadruped agents in open-ended worlds. Our website and code can be found at https://quadruped-hub.github.io/Quadruped-GPT/.
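The abstract mentions planning "a safe but efficient path" using semantic-aware terrain analysis. As a rough illustration of that general idea only, and not the authors' method, the sketch below runs Dijkstra's algorithm over a small grid of terrain labels. The labels, the cost table `TERRAIN_COST`, the function `plan_path`, and the example `GRID` are all hypothetical names and values chosen for this example.

```python
import heapq

# Hypothetical cost of stepping onto each terrain label (not from the paper).
# Labels missing from this table, such as "water", are treated as impassable.
TERRAIN_COST = {"floor": 1.0, "grass": 2.0, "gravel": 8.0}


def plan_path(grid, start, goal):
    """Dijkstra over a 4-connected grid of terrain labels.

    Returns (total_cost, path), or (inf, []) if the goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    best = {start: 0.0}   # cheapest known cost to reach each cell
    parent = {}           # back-pointers for path reconstruction
    frontier = [(0.0, start)]
    while frontier:
        cost, cell = heapq.heappop(frontier)
        if cell == goal:
            path = [cell]
            while cell in parent:
                cell = parent[cell]
                path.append(cell)
            return cost, path[::-1]
        if cost > best.get(cell, float("inf")):
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            step = TERRAIN_COST.get(grid[nr][nc])
            if step is None:
                continue  # impassable terrain
            new_cost = cost + step
            if new_cost < best.get((nr, nc), float("inf")):
                best[(nr, nc)] = new_cost
                parent[(nr, nc)] = cell
                heapq.heappush(frontier, (new_cost, (nr, nc)))
    return float("inf"), []


# Example map: the short route crosses gravel, the long route stays on floor.
GRID = [
    ["floor", "gravel", "floor"],
    ["floor", "water", "floor"],
    ["floor", "floor", "floor"],
]
cost, path = plan_path(GRID, (0, 0), (0, 2))
```

With these assumed costs the planner prefers the six-step detour over floor to the two-step route across gravel, which is one simple way to encode a trade-off between safety and path length. In a real system the labels would come from a perception model rather than a hand-written grid.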
Related papers
- REMAC: Self-Reflective and Self-Evolving Multi-Agent Collaboration for Long-Horizon Robot Manipulation [57.628771707989166]
We propose an adaptive multi-agent planning framework, termed REMAC, that enables efficient, scene-agnostic multi-robot long-horizon task planning and execution.
REMAC incorporates two key modules: a self-reflection module that performs pre-condition and post-condition checks in the loop to evaluate progress and refine plans, and a self-evolvement module that dynamically adapts plans based on scene-specific reasoning.
arXiv Detail & Related papers (2025-03-28T03:51:40Z)
- Learning Multi-Agent Loco-Manipulation for Long-Horizon Quadrupedal Pushing [33.689150109924526]
This paper tackles the task of obstacle-aware, long-horizon pushing by multiple quadrupedal robots.
We propose a hierarchical multi-agent reinforcement learning framework with three levels of control.
arXiv Detail & Related papers (2024-11-11T16:27:25Z)
- Spatial Reasoning and Planning for Deep Embodied Agents [2.7195102129095003]
This thesis explores the development of data-driven techniques for spatial reasoning and planning tasks.
It focuses on enhancing learning efficiency, interpretability, and transferability across novel scenarios.
arXiv Detail & Related papers (2024-09-28T23:05:56Z)
- Grounding Language Models in Autonomous Loco-manipulation Tasks [3.8363685417355557]
We propose a novel framework that learns, selects, and plans behaviors based on tasks in different scenarios.
We leverage the planning and reasoning features of the large language model (LLM), constructing a hierarchical task graph.
Experiments in simulation and real-world using the CENTAURO robot show that the language model based planner can efficiently adapt to new loco-manipulation tasks.
arXiv Detail & Related papers (2024-09-02T15:27:48Z)
- Look Further Ahead: Testing the Limits of GPT-4 in Path Planning [9.461626534488117]
Large Language Models (LLMs) have shown impressive capabilities across a wide variety of tasks.
Our proposed benchmark systematically tests path-planning skills in complex settings.
We found that framing prompts as Python code and decomposing long trajectory tasks improve GPT-4's path planning effectiveness.
arXiv Detail & Related papers (2024-06-17T18:12:56Z)
- MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception [53.20509532671891]
MP5 is an open-ended multimodal embodied system built upon the challenging Minecraft simulator.
It can decompose feasible sub-objectives, design sophisticated situation-aware plans, and perform embodied action control.
arXiv Detail & Related papers (2023-12-12T17:55:45Z)
- MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration [102.41118020705876]
Large Language Models (LLMs) have marked a significant advancement in the field of natural language processing.
As their applications extend into multi-agent environments, a need has arisen for a comprehensive evaluation framework.
This work introduces a novel benchmarking framework specifically tailored to assess LLMs within multi-agent settings.
arXiv Detail & Related papers (2023-11-14T21:46:27Z)
- Learning Diverse Skills for Local Navigation under Multi-constraint Optimality [27.310655303502305]
In this work, we take a constrained optimization viewpoint on the quality-diversity trade-off.
We show that we can obtain diverse policies while imposing constraints on their value functions which are defined through distinct rewards.
Our trained policies transfer well to the real 12-DoF quadruped robot, Solo12.
arXiv Detail & Related papers (2023-10-03T21:21:21Z)
- RoboAgent: Generalization and Efficiency in Robot Manipulation via Semantic Augmentations and Action Chunking [54.776890150458385]
We develop an efficient system for training universal agents capable of multi-task manipulation skills.
We are able to train a single agent capable of 12 unique skills, and demonstrate its generalization over 38 tasks.
On average, RoboAgent outperforms prior methods by over 40% in unseen situations.
arXiv Detail & Related papers (2023-09-05T03:14:39Z)
- Multi-Level Compositional Reasoning for Interactive Instruction Following [24.581542880280203]
We propose the Multi-level Compositional Reasoning Agent (MCR-Agent).
At the highest level, we infer a sequence of human-interpretable subgoals to be executed based on language instructions by a high-level policy composition controller.
At the middle level, we discriminatively control the agent's navigation by a master policy by alternating between a navigation policy and various independent interaction policies.
At the lowest level, we infer manipulation actions with the corresponding object masks using the appropriate interaction policy.
arXiv Detail & Related papers (2023-08-18T08:38:28Z)
- AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation [50.737355245505334]
We propose a novel framework for learning high-level cognitive capabilities in robot manipulation tasks.
The resulting dataset AlphaBlock consists of 35 comprehensive high-level tasks of multi-step text plans and paired observation.
arXiv Detail & Related papers (2023-05-30T09:54:20Z)
- Robust and Versatile Bipedal Jumping Control through Reinforcement Learning [141.56016556936865]
This work aims to push the limits of agility for bipedal robots by enabling a torque-controlled bipedal robot to perform robust and versatile dynamic jumps in the real world.
We present a reinforcement learning framework for training a robot to accomplish a large variety of jumping tasks, such as jumping to different locations and directions.
We develop a new policy structure that encodes the robot's long-term input/output (I/O) history while also providing direct access to a short-term I/O history.
arXiv Detail & Related papers (2023-02-19T01:06:09Z)
- Planning-oriented Autonomous Driving [60.93767791255728]
We argue that a favorable framework should be devised and optimized in pursuit of the ultimate goal, i.e., planning of the self-driving car.
We introduce Unified Autonomous Driving (UniAD), a comprehensive framework that incorporates full-stack driving tasks in one network.
arXiv Detail & Related papers (2022-12-20T10:47:53Z)
- Planning Immediate Landmarks of Targets for Model-Free Skill Transfer across Agents [34.56191646231944]
We propose PILoT, i.e., Planning Immediate Landmarks of Targets.
PILoT learns a goal-conditioned state planner and distills a goal-planner to plan immediate landmarks in a model-free style.
We show the power of PILoT on various transferring challenges, including few-shot transferring across action spaces and dynamics.
arXiv Detail & Related papers (2022-12-18T08:03:21Z)
- Evolving Hierarchical Memory-Prediction Machines in Multi-Task Reinforcement Learning [4.030910640265943]
Behavioural agents must generalize across a variety of environments and objectives over time.
We use genetic programming to evolve highly-generalized agents capable of operating in six unique environments from the control literature.
We show that emergent hierarchical structure in the evolving programs leads to multi-task agents that succeed by performing a temporal decomposition and encoding of the problem environments in memory.
arXiv Detail & Related papers (2021-06-23T21:34:32Z)
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)