RoboGPT: an intelligent agent of making embodied long-term decisions for daily instruction tasks
- URL: http://arxiv.org/abs/2311.15649v3
- Date: Fri, 13 Sep 2024 09:36:18 GMT
- Title: RoboGPT: an intelligent agent of making embodied long-term decisions for daily instruction tasks
- Authors: Yaran Chen, Wenbo Cui, Yuanwen Chen, Mining Tan, Xinyao Zhang, Dongbin Zhao, He Wang,
- Abstract summary: Large Language Models (LLMs) in natural language processing have inspired efforts to use LLMs in complex robot planning.
We propose a RoboGPT agent for making embodied long-term decisions for daily tasks.
The proposed RoboGPT agent outperforms SOTA methods on the ALFRED daily tasks.
- Score: 13.29302304547683
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robotic agents must master common sense and long-term sequential decisions to solve daily tasks through natural language instruction. The developments in Large Language Models (LLMs) in natural language processing have inspired efforts to use LLMs in complex robot planning. Despite LLMs' great generalization and comprehension of instruction tasks, LLMs-generated task plans sometimes lack feasibility and correctness. To address the problem, we propose a RoboGPT agent\footnote{our code and dataset will be released soon} for making embodied long-term decisions for daily tasks, with two modules: 1) LLMs-based planning with re-plan to break the task into multiple sub-goals; 2) RoboSkill individually designed for sub-goals to learn better navigation and manipulation skills. The LLMs-based planning is enhanced with a new robotic dataset and re-plan, called RoboGPT. The new robotic dataset of 67k daily instruction tasks is gathered for fine-tuning the Llama model and obtaining RoboGPT. RoboGPT planner with strong generalization can plan hundreds of daily instruction tasks. Additionally, a low-computational Re-Plan module is designed to allow plans to flexibly adapt to the environment, thereby addressing the nomenclature diversity challenge. The proposed RoboGPT agent outperforms SOTA methods on the ALFRED daily tasks. Moreover, RoboGPT planner exceeds SOTA LLM-based planners like ChatGPT in task-planning rationality for hundreds of unseen daily tasks, and even other domain tasks, while keeping the large model's original broad application and generality.
Related papers
- ReLEP: A Novel Framework for Real-world Long-horizon Embodied Planning [7.668848364013772]
We present ReLEP, a framework for Real world Long-horizon Embodied Planning.
At its core lies a fine-tuned large vision language model that formulates plans as sequences of skill functions.
ReLEP can accomplish a wide range of daily tasks and outperforms other state-of-the-art baseline methods.
arXiv Detail & Related papers (2024-09-24T01:47:23Z) - Unlocking Reasoning Potential in Large Langauge Models by Scaling Code-form Planning [94.76546523689113]
We introduce CodePlan, a framework that generates and follows textcode-form plans -- pseudocode that outlines high-level, structured reasoning processes.
CodePlan effectively captures the rich semantics and control flows inherent to sophisticated reasoning tasks.
It achieves a 25.1% relative improvement compared with directly generating responses.
arXiv Detail & Related papers (2024-09-19T04:13:58Z) - Scaling Up Natural Language Understanding for Multi-Robots Through the Lens of Hierarchy [8.180994118420053]
Long-horizon planning is hindered by challenges such as uncertainty accumulation, computational complexity, delayed rewards and incomplete information.
This work proposes an approach to exploit the task hierarchy from human instructions to facilitate multi-robot planning.
arXiv Detail & Related papers (2024-08-15T14:46:13Z) - Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks [50.27313829438866]
Plan-Seq-Learn (PSL) is a modular approach that uses motion planning to bridge the gap between abstract language and learned low-level control.
PSL achieves success rates of over 85%, out-performing language-based, classical, and end-to-end approaches.
arXiv Detail & Related papers (2024-05-02T17:59:31Z) - DELTA: Decomposed Efficient Long-Term Robot Task Planning using Large Language Models [5.385540718118656]
We introduce DELTA, a novel task planning approach based on Large Language Models (LLMs)
By using scene graphs as environment representations within LLMs, DELTA achieves rapid generation of precise planning problem descriptions.
We show that DELTA enables an efficient and fully automatic task planning pipeline, achieving higher planning success rates and significantly shorter planning times compared to the state of the art.
arXiv Detail & Related papers (2024-04-04T07:59:24Z) - Consolidating Trees of Robotic Plans Generated Using Large Language
Models to Improve Reliability [6.4111574364474215]
The inherent probabilistic nature of Large Language Models (LLMs) introduces an element of unpredictability.
This paper introduces an innovative approach aims to generate correct and optimal robotic task plans for diverse real-world demands and scenarios.
arXiv Detail & Related papers (2024-01-15T18:01:59Z) - Learning adaptive planning representations with natural language
guidance [90.24449752926866]
This paper describes Ada, a framework for automatically constructing task-specific planning representations.
Ada interactively learns a library of planner-compatible high-level action abstractions and low-level controllers adapted to a particular domain of planning tasks.
arXiv Detail & Related papers (2023-12-13T23:35:31Z) - Embodied Task Planning with Large Language Models [86.63533340293361]
We propose a TAsk Planing Agent (TaPA) in embodied tasks for grounded planning with physical scene constraint.
During inference, we discover the objects in the scene by extending open-vocabulary object detectors to multi-view RGB images collected in different achievable locations.
Experimental results show that the generated plan from our TaPA framework can achieve higher success rate than LLaVA and GPT-3.5 by a sizable margin.
arXiv Detail & Related papers (2023-07-04T17:58:25Z) - Learning to Reason over Scene Graphs: A Case Study of Finetuning GPT-2
into a Robot Language Model for Grounded Task Planning [45.51792981370957]
We investigate the applicability of a smaller class of large language models (LLMs) in robotic task planning by learning to decompose tasks into subgoal specifications for a planner to execute sequentially.
Our method grounds the input of the LLM on the domain that is represented as a scene graph, enabling it to translate human requests into executable robot plans.
Our findings suggest that the knowledge stored in an LLM can be effectively grounded to perform long-horizon task planning, demonstrating the promising potential for the future application of neuro-symbolic planning methods in robotics.
arXiv Detail & Related papers (2023-05-12T18:14:32Z) - Plan, Eliminate, and Track -- Language Models are Good Teachers for
Embodied Agents [99.17668730578586]
Pre-trained large language models (LLMs) capture procedural knowledge about the world.
Plan, Eliminate, and Track (PET) framework translates a task description into a list of high-level sub-tasks.
PET framework leads to a significant 15% improvement over SOTA for generalization to human goal specifications.
arXiv Detail & Related papers (2023-05-03T20:11:22Z) - Learning to Plan with Natural Language [111.76828049344839]
Large Language Models (LLMs) have shown remarkable performance in various basic natural language tasks.
For completing the complex task, we still need a plan for the task to guide LLMs to generate the specific solutions step by step.
We propose the Learning to Plan method, which involves two phases: (1) In the first learning task plan phase, it iteratively updates the task plan with new step-by-step solutions and behavioral instructions, which are obtained by prompting LLMs to derive from training error feedback.
arXiv Detail & Related papers (2023-04-20T17:09:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.