LLM3:Large Language Model-based Task and Motion Planning with Motion Failure Reasoning
- URL: http://arxiv.org/abs/2403.11552v3
- Date: Wed, 21 Aug 2024 09:46:35 GMT
- Title: LLM3:Large Language Model-based Task and Motion Planning with Motion Failure Reasoning
- Authors: Shu Wang, Muzhi Han, Ziyuan Jiao, Zeyu Zhang, Ying Nian Wu, Song-Chun Zhu, Hangxin Liu,
- Abstract summary: Conventional Task and Motion Planning (TAMP) approaches rely on manually crafted interfaces connecting symbolic task planning with continuous motion generation.
Here, we present LLM3, a novel Large Language Model (LLM)-based TAMP framework featuring a domain-independent interface.
Specifically, we leverage the powerful reasoning and planning capabilities of pre-trained LLMs to propose symbolic action sequences and select continuous action parameters for motion planning.
- Score: 78.2390460278551
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conventional Task and Motion Planning (TAMP) approaches rely on manually crafted interfaces connecting symbolic task planning with continuous motion generation. These domain-specific and labor-intensive modules are limited in addressing emerging tasks in real-world settings. Here, we present LLM^3, a novel Large Language Model (LLM)-based TAMP framework featuring a domain-independent interface. Specifically, we leverage the powerful reasoning and planning capabilities of pre-trained LLMs to propose symbolic action sequences and select continuous action parameters for motion planning. Crucially, LLM^3 incorporates motion planning feedback through prompting, allowing the LLM to iteratively refine its proposals by reasoning about motion failure. Consequently, LLM^3 interfaces between task planning and motion planning, alleviating the intricate design process of handling domain-specific messages between them. Through a series of simulations in a box-packing domain, we quantitatively demonstrate the effectiveness of LLM^3 in solving TAMP problems and the efficiency in selecting action parameters. Ablation studies underscore the significant contribution of motion failure reasoning to the success of LLM^3. Furthermore, we conduct qualitative experiments on a physical manipulator, demonstrating the practical applicability of our approach in real-world settings.
Related papers
- Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.
However, they still struggle with problems requiring multi-step decision-making and environmental feedback.
We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z) - Zero-shot Robotic Manipulation with Language-guided Instruction and Formal Task Planning [16.89900521727246]
We propose an innovative language-guided symbolic task planning (LM-SymOpt) framework with optimization.
It is the first expert-free planning framework since we combine the world knowledge from Large Language Models with formal reasoning.
Our experimental results show that LM-SymOpt outperforms existing LLM-based planning approaches.
arXiv Detail & Related papers (2025-01-25T13:33:22Z) - PlanLLM: Video Procedure Planning with Refinable Large Language Models [5.371855090716962]
Video procedure planning, i.e., planning a sequence of action steps given the video frames of start and goal states, is an essential ability for embodied AI.
Recent works utilize Large Language Models (LLMs) to generate enriched action step description texts to guide action step decoding.
We propose PlanLLM, a cross-modal joint learning framework with LLMs for video procedure planning.
arXiv Detail & Related papers (2024-12-26T09:51:05Z) - MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation [52.739500459903724]
Large Language Models (LLMs) have demonstrated remarkable planning abilities across various domains, including robotics manipulation and navigation.
We propose a novel multi-agent LLM framework that distributes high-level planning and low-level control code generation across specialized LLM agents.
We evaluate our approach on nine RLBench tasks, including long-horizon tasks, and demonstrate its ability to solve robotics manipulation in a zero-shot setting.
arXiv Detail & Related papers (2024-11-26T17:53:44Z) - Interactive and Expressive Code-Augmented Planning with Large Language Models [62.799579304821826]
Large Language Models (LLMs) demonstrate strong abilities in common-sense reasoning and interactive decision-making.
Recent techniques have sought to structure LLM outputs using control flow and other code-adjacent techniques to improve planning performance.
We propose REPL-Plan, an LLM planning approach that is fully code-expressive and dynamic.
arXiv Detail & Related papers (2024-11-21T04:23:17Z) - Learning Task Planning from Multi-Modal Demonstration for Multi-Stage Contact-Rich Manipulation [26.540648608911308]
In this paper, we introduce an in-context learning framework that incorporates tactile and force-torque information from human demonstrations.
We propose a bootstrapped reasoning pipeline that sequentially integrates each modality into a comprehensive task plan.
This task plan is then used as a reference for planning in new task configurations.
arXiv Detail & Related papers (2024-09-18T10:36:47Z) - Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving planning capabilities of large language models (LLMs)
We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios.
We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z) - From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems [59.40480894948944]
Large language model (LLM) empowered agents are able to solve decision-making problems in the physical world.
Under this model, the LLM Planner navigates a partially observable Markov decision process (POMDP) by iteratively generating language-based subgoals via prompting.
We prove that the pretrained LLM Planner effectively performs Bayesian aggregated imitation learning (BAIL) through in-context learning.
arXiv Detail & Related papers (2024-05-30T09:42:54Z) - Improving Planning with Large Language Models: A Modular Agentic Architecture [7.63815864256878]
Large language models (LLMs) often struggle with tasks that require multi-step reasoning or goal-directed planning.
We propose an agentic architecture, the Modular Agentic Planner (MAP), in which planning is accomplished via the recurrent interaction of specialized modules.
We find that MAP yields significant improvements over both standard LLM methods.
arXiv Detail & Related papers (2023-09-30T00:10:14Z) - Dynamic Planning with a LLM [15.430182858130884]
Large Language Models (LLMs) can solve many NLP tasks in zero-shot settings, but applications involving embodied agents remain problematic.
Our work presents LLM Dynamic Planner (LLM-DP), a neuro-symbolic framework where an LLM works hand-in-hand with a traditional planner to solve an embodied task.
arXiv Detail & Related papers (2023-08-11T21:17:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.