QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds
- URL: http://arxiv.org/abs/2406.16578v2
- Date: Tue, 03 Dec 2024 03:49:24 GMT
- Title: QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds
- Authors: Yuting Mei, Ye Wang, Sipeng Zheng, Qin Jin
- Abstract summary: We introduce QuadrupedGPT, designed to follow diverse commands with agility comparable to that of a pet. Our agent shows proficiency in handling diverse tasks and intricate instructions, representing a significant step toward the development of versatile quadruped agents.
- Score: 51.05639500325598
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As robotic agents increasingly assist humans in real-world settings, quadruped robots offer unique opportunities for interaction in complex scenarios due to their agile movement. However, building agents that can autonomously navigate, adapt, and respond to versatile goals remains a significant challenge. In this work, we introduce QuadrupedGPT, an agent designed to follow diverse commands with agility comparable to that of a pet. The primary challenges addressed include: i) effectively utilizing multimodal observations for informed decision-making; ii) achieving agile control by integrating locomotion and navigation; iii) developing advanced cognition to execute long-term objectives. Our QuadrupedGPT interprets human commands and environmental contexts using a large multimodal model. Leveraging its extensive knowledge base, the agent autonomously assigns parameters for adaptive locomotion policies and devises safe yet efficient paths toward its goals. Additionally, it employs high-level reasoning to decompose long-term goals into a sequence of executable subgoals. Through comprehensive experiments, our agent shows proficiency in handling diverse tasks and intricate instructions, representing a significant step toward the development of versatile quadruped agents for open-ended environments.
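Read as an architecture, the abstract describes a perceive-reason-act loop. The sketch below is a minimal illustration of that loop; the `query_lmm` helper, the `LocomotionParams` fields, and the comma-separated reply format are our assumptions, not the authors' interface.
```python
# Minimal sketch of the perceive-reason-act loop described in the abstract.
# All names (query_lmm, LocomotionParams, execute) are hypothetical.
from dataclasses import dataclass

@dataclass
class LocomotionParams:
    gait: str            # e.g. "trot" or "crawl"
    body_height: float   # metres
    max_speed: float     # m/s

def query_lmm(prompt: str, image=None) -> str:
    """Placeholder for a call to the large multimodal model."""
    raise NotImplementedError

def run_agent(command: str, get_observation, execute):
    # 1) High-level reasoning: decompose the long-term goal into subgoals.
    plan = query_lmm(f"Decompose into subgoals: {command}", image=get_observation())
    for subgoal in plan.splitlines():
        # 2) Adaptive locomotion: let the model pick gait parameters for the terrain.
        reply = query_lmm(f"Choose gait,height,speed for: {subgoal}",
                          image=get_observation())
        gait, height, speed = (s.strip() for s in reply.split(","))
        params = LocomotionParams(gait, float(height), float(speed))
        # 3) Navigation + locomotion execute the subgoal with those parameters.
        execute(subgoal, params)
```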
Related papers
- REMAC: Self-Reflective and Self-Evolving Multi-Agent Collaboration for Long-Horizon Robot Manipulation [57.628771707989166]
We propose an adaptive multi-agent planning framework, termed REMAC, that enables efficient, scene-agnostic multi-robot long-horizon task planning and execution.
REMAC incorporates two key modules: a self-reflection module performing pre-condition and post-condition checks in the loop to evaluate progress and refine plans, and a self-evolvement module dynamically adapting plans based on scene-specific reasoning.
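A minimal sketch of the self-reflection idea, assuming REMAC's checks can be reduced to per-step pre-/post-condition predicates (the function names are ours):
```python
# Illustrative self-reflective plan-execution loop (names are hypothetical).
def reflective_execute(plan, check_pre, check_post, refine, act):
    step = 0
    while step < len(plan):
        action = plan[step]
        if not check_pre(action):       # pre-condition check before acting
            plan = refine(plan, step)   # self-reflection: repair the plan
            continue
        act(action)
        if not check_post(action):      # post-condition check after acting
            plan = refine(plan, step)   # self-evolvement: adapt and retry
            continue
        step += 1
    return plan
```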
arXiv Detail & Related papers (2025-03-28T03:51:40Z)
- Learning Multi-Agent Loco-Manipulation for Long-Horizon Quadrupedal Pushing [33.689150109924526]
This paper tackles the task of obstacle-aware, long-horizon pushing by multiple quadrupedal robots.
We propose a hierarchical multi-agent reinforcement learning framework with three levels of control.
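The summary does not spell out the three levels; one generic reading of such a hierarchy is sketched below (the division of labour is our assumption, not the paper's exact design).
```python
# One plausible three-level division of labour (illustrative only).
def control_step(high, mid, low, robots, obstacle_map):
    assignments = high.assign(robots, obstacle_map)       # level 1: who pushes where
    for robot, task in zip(robots, assignments):
        waypoint = mid.next_waypoint(task, obstacle_map)  # level 2: obstacle-aware path
        robot.command = low.act(robot.state, waypoint)    # level 3: locomotion tracking
```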
arXiv Detail & Related papers (2024-11-11T16:27:25Z)
- Spatial Reasoning and Planning for Deep Embodied Agents [2.7195102129095003]
This thesis explores the development of data-driven techniques for spatial reasoning and planning tasks.
It focuses on enhancing learning efficiency, interpretability, and transferability across novel scenarios.
arXiv Detail & Related papers (2024-09-28T23:05:56Z)
- Grounding Language Models in Autonomous Loco-manipulation Tasks [3.8363685417355557]
We propose a novel framework that learns, selects, and plans behaviors based on tasks in different scenarios.
We leverage the planning and reasoning capabilities of a large language model (LLM) to construct a hierarchical task graph.
Experiments in simulation and real-world using the CENTAURO robot show that the language model based planner can efficiently adapt to new loco-manipulation tasks.
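A hierarchical task graph of this kind might be represented as below; the node fields and the depth-first execution are illustrative assumptions.
```python
# Sketch of a hierarchical task graph an LLM planner could emit
# (node fields and traversal order are assumptions).
from dataclasses import dataclass, field

@dataclass
class TaskNode:
    name: str
    skill: str | None = None            # leaf nodes bind to a library skill
    children: list["TaskNode"] = field(default_factory=list)

def execute(node: TaskNode, run_skill):
    if node.skill is not None:          # leaf: dispatch to the skill library
        run_skill(node.skill)
    for child in node.children:         # internal node: recurse in order
        execute(child, run_skill)

# Hypothetical decomposition of "open the door":
plan = TaskNode("open door", children=[
    TaskNode("walk to door", skill="navigate"),
    TaskNode("grasp handle", skill="grasp"),
    TaskNode("pull handle", skill="manipulate"),
])
```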
arXiv Detail & Related papers (2024-09-02T15:27:48Z)
- Look Further Ahead: Testing the Limits of GPT-4 in Path Planning [9.461626534488117]
Large Language Models (LLMs) have shown impressive capabilities across a wide variety of tasks.
Our proposed benchmark systematically tests path-planning skills in complex settings.
We found that framing prompts as Python code and decomposing long trajectory tasks improve GPT-4's path planning effectiveness.
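That finding can be illustrated with a code-style prompt; the wording below is our own, not the benchmark's actual prompt.
```python
# Illustration of framing a path-planning query as Python code
# (prompt wording is ours; the paper's prompts may differ).
def code_style_prompt(grid, start, goal):
    return (
        "# Grid world: 0 = free cell, 1 = obstacle\n"
        f"grid = {grid}\n"
        f"start, goal = {start}, {goal}\n"
        "# Complete: shortest obstacle-free path as a list of (row, col)\n"
        "path = "
    )

prompt = code_style_prompt([[0, 0, 1], [0, 1, 0], [0, 0, 0]], (0, 0), (2, 2))
# Long trajectories are decomposed into waypoint-to-waypoint sub-queries.
```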
arXiv Detail & Related papers (2024-06-17T18:12:56Z)
- MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception [53.20509532671891]
MP5 is an open-ended multimodal embodied system built upon the challenging Minecraft simulator.
It can decompose feasible sub-objectives, design sophisticated situation-aware plans, and perform embodied action control.
arXiv Detail & Related papers (2023-12-12T17:55:45Z)
- MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration [102.41118020705876]
Large Language Models (LLMs) have marked a significant advancement in the field of natural language processing.
As their applications extend into multi-agent environments, a need has arisen for a comprehensive evaluation framework.
This work introduces a novel benchmarking framework specifically tailored to assess LLMs within multi-agent settings.
arXiv Detail & Related papers (2023-11-14T21:46:27Z)
- Learning Diverse Skills for Local Navigation under Multi-constraint Optimality [27.310655303502305]
In this work, we take a constrained optimization viewpoint on the quality-diversity trade-off.
We show that we can obtain diverse policies while imposing constraints on their value functions, which are defined through distinct rewards.
Our trained policies transfer well to the real 12-DoF quadruped robot, Solo12.
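The constrained viewpoint can be stated roughly as the following program (our paraphrase in generic notation, not the paper's):
```latex
% Diversity objective D over K policies, each constrained to stay
% near-optimal under its own reward r_i (alpha in (0, 1]):
\max_{\pi_1,\dots,\pi_K} \; D(\pi_1,\dots,\pi_K)
\quad \text{s.t.} \quad V_{r_i}(\pi_i) \ \ge\ \alpha\, V^{*}_{r_i}
\quad \forall i \in \{1,\dots,K\}
```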
arXiv Detail & Related papers (2023-10-03T21:21:21Z)
- RoboAgent: Generalization and Efficiency in Robot Manipulation via Semantic Augmentations and Action Chunking [54.776890150458385]
We develop an efficient system for training universal agents capable of multi-task manipulation skills.
We are able to train a single agent capable of 12 unique skills, and demonstrate its generalization over 38 tasks.
On average, RoboAgent outperforms prior methods by over 40% in unseen situations.
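Action chunking, i.e. one policy query emitting a short sequence of actions rather than a single step, can be sketched as follows (the chunk length and interfaces are assumptions):
```python
# Sketch of action chunking during rollout (H and interfaces assumed).
def rollout_with_chunks(policy, env, horizon, H=8):
    obs = env.reset()
    t = 0
    while t < horizon:
        chunk = policy(obs)                # one query -> H actions, shape (H, act_dim)
        for action in chunk[:H]:
            obs, done = env.step(action)   # hypothetical env interface
            t += 1
            if done or t >= horizon:
                return
```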
arXiv Detail & Related papers (2023-09-05T03:14:39Z)
- Multi-Level Compositional Reasoning for Interactive Instruction Following [24.581542880280203]
The Multi-level Compositional Reasoning Agent (MCR-Agent) operates at three levels of abstraction.
At the highest level, a high-level policy composition controller infers a sequence of human-interpretable subgoals from the language instructions.
At the middle level, a master policy discriminatively controls the agent's navigation by alternating between a navigation policy and various independent interaction policies.
At the lowest level, the appropriate interaction policy infers manipulation actions along with the corresponding object masks.
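Read as pseudocode, the three levels compose roughly as below (the function names and the `mode` convention are ours):
```python
# Illustrative composition of the three MCR-Agent levels (names are ours).
def mcr_episode(instruction, obs, high_level, master, nav_policy,
                interaction_policies, predict_mask):
    for subgoal in high_level(instruction):        # top: subgoal sequence
        while not subgoal.completed(obs):
            mode = master(obs, subgoal)            # middle: choose active policy
            if mode == "navigate":
                obs = nav_policy(obs, subgoal)
            else:                                  # bottom: interaction + object mask
                mask = predict_mask(obs, subgoal)
                obs = interaction_policies[mode](obs, subgoal, mask)
```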
arXiv Detail & Related papers (2023-08-18T08:38:28Z)
- AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation [50.737355245505334]
We propose a novel framework for learning high-level cognitive capabilities in robot manipulation tasks.
The resulting dataset, AlphaBlock, consists of 35 comprehensive high-level tasks with multi-step text plans and paired observations.
arXiv Detail & Related papers (2023-05-30T09:54:20Z)
- Robust and Versatile Bipedal Jumping Control through Reinforcement Learning [141.56016556936865]
This work aims to push the limits of agility for bipedal robots by enabling a torque-controlled bipedal robot to perform robust and versatile dynamic jumps in the real world.
We present a reinforcement learning framework for training a robot to accomplish a large variety of jumping tasks, such as jumping to different locations and directions.
We develop a new policy structure that encodes the robot's long-term input/output (I/O) history while also providing direct access to a short-term I/O history.
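The dual-history policy structure might look like the following PyTorch sketch; the layer sizes and the choice of a flat MLP encoder are assumptions, not the paper's architecture.
```python
# Sketch of a policy with a compressed long-term I/O history plus direct
# access to recent I/O steps (sizes and encoder choice are assumptions).
import torch
import torch.nn as nn

class DualHistoryPolicy(nn.Module):
    def __init__(self, io_dim=40, long_len=100, short_len=4, act_dim=10):
        super().__init__()
        # Long history is compressed into a compact latent.
        self.long_enc = nn.Sequential(
            nn.Flatten(), nn.Linear(long_len * io_dim, 128), nn.ELU())
        # Short history bypasses the encoder and feeds the head directly.
        self.head = nn.Sequential(
            nn.Linear(128 + short_len * io_dim, 128), nn.ELU(),
            nn.Linear(128, act_dim))

    def forward(self, long_hist, short_hist):
        # long_hist: (B, long_len, io_dim); short_hist: (B, short_len, io_dim)
        z = self.long_enc(long_hist)
        return self.head(torch.cat([z, short_hist.flatten(1)], dim=-1))
```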
arXiv Detail & Related papers (2023-02-19T01:06:09Z)
- Planning-oriented Autonomous Driving [60.93767791255728]
We argue that a favorable framework should be devised and optimized in pursuit of the ultimate goal, i.e., planning of the self-driving car.
We introduce Unified Autonomous Driving (UniAD), a comprehensive framework that incorporates full-stack driving tasks in one network.
arXiv Detail & Related papers (2022-12-20T10:47:53Z)
- Planning Immediate Landmarks of Targets for Model-Free Skill Transfer across Agents [34.56191646231944]
We propose PILoT, i.e., Planning Immediate Landmarks of Targets.
PILoT learns a goal-conditioned state planner and distills it into a goal-planner that proposes immediate landmarks in a model-free style.
We show the power of PILoT on various transferring challenges, including few-shot transferring across action spaces and dynamics.
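At decision time the landmark idea reduces to two calls, sketched below with interfaces we are assuming:
```python
# Sketch of acting through immediate landmarks (interfaces assumed).
def act_with_landmarks(state, goal, state_planner, low_level_policy):
    # The distilled planner proposes the next landmark state directly
    # from (state, goal), with no learned dynamics model in the loop.
    landmark = state_planner(state, goal)
    # A low-level controller, possibly in a different action space,
    # outputs the action that moves toward that landmark.
    return low_level_policy(state, landmark)
```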
arXiv Detail & Related papers (2022-12-18T08:03:21Z)
- Evolving Hierarchical Memory-Prediction Machines in Multi-Task Reinforcement Learning [4.030910640265943]
Behavioural agents must generalize across a variety of environments and objectives over time.
We use genetic programming to evolve highly-generalized agents capable of operating in six unique environments from the control literature.
We show that emergent hierarchical structure in the evolving programs leads to multi-task agents that succeed by performing a temporal decomposition and encoding of the problem environments in memory.
arXiv Detail & Related papers (2021-06-23T21:34:32Z)
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
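The selection rule can be sketched in a few lines: score candidate goals by the disagreement of a value-function ensemble and train on the most contested ones (the ensemble interface and `k` are assumptions).
```python
# Sketch of curriculum goal selection by value disagreement
# (ensemble interface and k are illustrative assumptions).
import numpy as np

def select_goals(candidate_goals, value_ensemble, state, k=16):
    """Pick the k goals where the ensemble disagrees most; these lie on
    the frontier between already-mastered and still-unsolved goals."""
    values = np.stack([v(state, candidate_goals) for v in value_ensemble])
    disagreement = values.std(axis=0)           # per-goal std across members
    frontier = np.argsort(-disagreement)[:k]    # most contested goals first
    return [candidate_goals[i] for i in frontier]
```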
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.