Unleashing Embodied Task Planning Ability in LLMs via Reinforcement Learning
- URL: http://arxiv.org/abs/2506.23127v1
- Date: Sun, 29 Jun 2025 07:31:24 GMT
- Title: Unleashing Embodied Task Planning Ability in LLMs via Reinforcement Learning
- Authors: Zhaoye Fei, Li Ji, Siyin Wang, Junhao Shi, Jingjing Gong, Xipeng Qiu,
- Abstract summary: Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks.<n>Existing approaches generate open-loop action scripts based on static knowledge.<n>We introduce Embodied Planner-R1, a novel outcome-driven reinforcement learning framework.
- Score: 41.67411509781136
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, yet they face significant challenges in embodied task planning scenarios that require continuous environmental understanding and action generation. Existing approaches generate open-loop action scripts based on static knowledge, making it difficult to learn causal relationships between actions and environmental feedback, particularly in partially observable environments. We introduce Embodied Planner-R1, a novel outcome-driven reinforcement learning framework that enables LLMs to develop interactive capabilities through autonomous exploration with minimal supervision. Our framework incorporates three key innovations: (1) Without human annotations, we employ pure reinforcement learning with group rollout, incorporating in-environment interaction through parallel exploration; (2) completion-driven sparse reward; and (3) Interactive Policy Optimization (IPO) for efficient learning from grouped trajectories. Across two challenging text-based Embodied planning benchmarks, Embodied Planner-R1 achieves impressive completion rates of 97.78% on ALFWorld and 79.92% on ScienceWorld, surpassing prior methods by a large margin, and suffers only a -3.66% drop in previously unseen environments, evidencing strong generalization.
Related papers
- EmbRACE-3K: Embodied Reasoning and Action in Complex Environments [48.32142591866083]
EmRACE-3K is a dataset of over 3,000 language-guided tasks constructed using Unreal Engine and the UnrealCV-Zoo framework.<n>We establish a benchmark to evaluate the embodied reasoning capabilities of VLMs across three key dimensions: Exploration, Dynamic Spatial-Semantic Reasoning, and Multi-stage Goal Execution.<n>In zero-shot settings, all models achieve success rates below 20%, underscoring the challenge posed by our benchmark and the current limitations of VLMs in interactive environments.
arXiv Detail & Related papers (2025-07-14T17:59:46Z) - Improving LLM Agent Planning with In-Context Learning via Atomic Fact Augmentation and Lookahead Search [48.348209577994865]
Large Language Models (LLMs) are increasingly capable but often require significant guidance or extensive interaction history to perform effectively in complex, interactive environments.<n>We introduce a novel LLM agent framework that enhances planning capabilities through in-context learning.<n>Our agent learns to extract task-critical atomic facts'' from its interaction trajectories.
arXiv Detail & Related papers (2025-06-10T18:36:31Z) - Don't Just Follow MLLM Plans: Robust and Efficient Planning for Open-world Agents [7.186226937530119]
We introduce Robust and Efficient Planning for Open-world Agents (REPOA), a novel framework designed to tackle these issues.<n> REPOA features three key components: adaptive dependency learning and fine-grained failure-aware operation memory to enhance robustness to knowledge inaccuracies.<n>Our evaluation in two established open-world testbeds demonstrates REPOA's robust and efficient planning, showcasing its capability to successfully obtain challenging late-game items.
arXiv Detail & Related papers (2025-05-30T03:01:44Z) - Words as Beacons: Guiding RL Agents with High-Level Language Prompts [6.7236795813629]
Large Language Models (LLMs) as "teachers" guide the agent's learning process by decomposing complex tasks into subgoals.
LLMs can provide subgoals to accomplish the task defined for the environment in a similar fashion to how a human would do.
It is possible to query the LLM only during the training phase, enabling agents to operate within the environment without any LLM intervention.
arXiv Detail & Related papers (2024-10-11T08:54:45Z) - AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation [81.32722475387364]
Large Language Model-based agents have garnered significant attention and are becoming increasingly popular.<n>Planning ability is a crucial component of an LLM-based agent, which generally entails achieving a desired goal from an initial state.<n>Recent studies have demonstrated that utilizing expert-level trajectory for instruction-tuning LLMs effectively enhances their planning capabilities.
arXiv Detail & Related papers (2024-08-01T17:59:46Z) - Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts [10.929547354171723]
This paper introduces Knowledgeable Agents from Language Model Rollouts (KALM)
It extracts knowledge from large language models (LLMs) in the form of imaginary rollouts that can be easily learned by the agent through offline reinforcement learning methods.
It achieves a success rate of 46% in executing tasks with unseen goals, substantially surpassing the 26% success rate achieved by baseline methods.
arXiv Detail & Related papers (2024-04-14T13:19:40Z) - From Summary to Action: Enhancing Large Language Models for Complex
Tasks with Open World APIs [62.496139001509114]
We introduce a novel tool invocation pipeline designed to control massive real-world APIs.
This pipeline mirrors the human task-solving process, addressing complicated real-life user queries.
Empirical evaluations of our Sum2Act pipeline on the ToolBench benchmark show significant performance improvements.
arXiv Detail & Related papers (2024-02-28T08:42:23Z) - Improving Open Information Extraction with Large Language Models: A
Study on Demonstration Uncertainty [52.72790059506241]
Open Information Extraction (OIE) task aims at extracting structured facts from unstructured text.
Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks.
arXiv Detail & Related papers (2023-09-07T01:35:24Z) - Efficient Learning of High Level Plans from Play [57.29562823883257]
We present Efficient Learning of High-Level Plans from Play (ELF-P), a framework for robotic learning that bridges motion planning and deep RL.
We demonstrate that ELF-P has significantly better sample efficiency than relevant baselines over multiple realistic manipulation tasks.
arXiv Detail & Related papers (2023-03-16T20:09:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.