Integrating Intent Understanding and Optimal Behavior Planning for Behavior Tree Generation from Human Instructions
- URL: http://arxiv.org/abs/2405.07474v2
- Date: Thu, 27 Jun 2024 13:17:58 GMT
- Title: Integrating Intent Understanding and Optimal Behavior Planning for Behavior Tree Generation from Human Instructions
- Authors: Xinglin Chen, Yishuai Cai, Yunxin Mao, Minglong Li, Wenjing Yang, Weixia Xu, Ji Wang
- Abstract summary: Behavior Tree (BT) is an appropriate control architecture for robots executing tasks following human instructions.
This paper proposes a two-stage framework for BT generation, which first employs large language models to interpret goals from high-level instructions.
We represent goals as well-formed formulas in first-order logic, effectively bridging intent understanding and optimal behavior planning.
- Score: 5.31484618181979
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robots executing tasks following human instructions in domestic or industrial environments essentially require both adaptability and reliability. Behavior Tree (BT) emerges as an appropriate control architecture for these scenarios due to its modularity and reactivity. Existing BT generation methods, however, either do not involve interpreting natural language or cannot theoretically guarantee the BTs' success. This paper proposes a two-stage framework for BT generation, which first employs large language models (LLMs) to interpret goals from high-level instructions, then constructs an efficient goal-specific BT through the Optimal Behavior Tree Expansion Algorithm (OBTEA). We represent goals as well-formed formulas in first-order logic, effectively bridging intent understanding and optimal behavior planning. Experiments on a service robot validate the proficiency of LLMs in producing grammatically correct and accurately interpreted goals, demonstrate OBTEA's superiority over the baseline BT Expansion algorithm in various metrics, and finally confirm the practical deployability of our framework. The project website is https://dids-ei.github.io/Project/LLM-OBTEA/.
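To make the two-stage pipeline concrete, here is a minimal Python sketch under simplifying assumptions: goals are sets of ground first-order literals, actions follow a STRIPS-like model, and a naive backward chaining stands in for OBTEA's optimal expansion. All names (`Literal`, `Action`, `backward_expand`) are illustrative, not the paper's API.

```python
# Minimal sketch, assuming a STRIPS-like action model; the naive backward
# chaining below only stands in for OBTEA's optimal BT expansion.
from dataclasses import dataclass

@dataclass(frozen=True)
class Literal:
    """A ground first-order predicate, e.g. Holding(Robot, Coffee)."""
    pred: str
    args: tuple

@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset   # literals that must hold before execution
    add: frozenset   # literals made true by execution
    cost: float = 1.0

def backward_expand(goal, actions, state):
    """Chain backward from the goal: pick the cheapest action achieving each
    unmet literal, then recurse on that action's own preconditions."""
    plan, unmet = [], set(goal) - set(state)
    while unmet:
        lit = unmet.pop()
        achievers = [a for a in actions if lit in a.add]
        if not achievers:
            raise ValueError(f"goal literal {lit} is unreachable")
        best = min(achievers, key=lambda a: a.cost)
        plan.append(best)
        unmet |= set(best.pre) - set(state)
    return list(reversed(plan))  # satisfy preconditions first

# Tiny example: an LLM would produce the goal formula; here it is hard-coded.
state = {Literal("At", ("Robot", "Bar"))}
actions = [Action("MakeCoffee",
                  pre=frozenset({Literal("At", ("Robot", "Bar"))}),
                  add=frozenset({Literal("Holding", ("Robot", "Coffee"))}))]
goal = {Literal("Holding", ("Robot", "Coffee"))}
print([a.name for a in backward_expand(goal, actions, state)])  # ['MakeCoffee']
```

In the full framework, the goal set would come from the LLM's interpretation of the instruction, and the resulting plan would be compiled into a reactive BT rather than a linear action sequence.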
Related papers
- Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning [94.76546523689113]
We introduce CodePlan, a framework that generates and follows code-form plans -- pseudocode that outlines high-level, structured reasoning processes.
CodePlan effectively captures the rich semantics and control flows inherent to sophisticated reasoning tasks.
It achieves a 25.1% relative improvement compared with directly generating responses.
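As a loose illustration of this plan-then-follow pattern (not CodePlan's actual interface), the two model calls can be sketched as below; `llm` is a placeholder for any text-completion client.

```python
# Hedged sketch of the plan-then-answer pattern: elicit a pseudocode plan first,
# then condition the final response on it. `llm` is a stub, not CodePlan's API.
def llm(prompt: str) -> str:
    # Stand-in for a real model call; replace with your client of choice.
    return f"<model output for: {prompt.splitlines()[0]}>"

def code_form_answer(question: str) -> str:
    plan = llm(f"Write short pseudocode outlining how to solve:\n{question}")
    return llm(f"Question: {question}\nPlan:\n{plan}\nFollow the plan and answer:")

print(code_form_answer("A train covers 120 km in 1.5 h. What is its speed?"))
```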
arXiv Detail & Related papers (2024-09-19T04:13:58Z)
- HBTP: Heuristic Behavior Tree Planning with Large Language Model Reasoning [6.2560501421348]
Heuristic Behavior Tree Planning (HBTP) is a reliable and efficient framework for BT generation.
This paper introduces the heuristic BT expansion process, along with two variants designed for optimal planning and satisficing planning.
Experiments verify the theoretical bounds of HBTP, and results from four datasets confirm its practical effectiveness in everyday service robot applications.
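One way to picture heuristic-guided expansion is best-first search over sets of unmet conditions. The sketch below is an illustration under that framing, not HBTP's published algorithm; the action tuples and heuristic `h` are assumptions.

```python
# Best-first expansion over unmet condition sets, guided by a heuristic h.
# Actions are (name, preconditions, effects, cost) tuples; all names illustrative.
import heapq, itertools

def heuristic_expand(goal, actions, state, h=len):
    tie = itertools.count()                      # tie-breaker for the heap
    start = frozenset(goal) - frozenset(state)
    frontier, seen = [(h(start), 0.0, next(tie), start, [])], set()
    while frontier:
        _, g, _, unmet, plan = heapq.heappop(frontier)
        if not unmet:
            return list(reversed(plan))          # satisfy preconditions first
        if unmet in seen:
            continue
        seen.add(unmet)
        for name, pre, add, cost in actions:
            if not set(add) & unmet:
                continue                         # action achieves nothing unmet
            nxt = frozenset((unmet - set(add)) | (set(pre) - set(state)))
            heapq.heappush(frontier, (g + cost + h(nxt), g + cost,
                                      next(tie), nxt, plan + [name]))
    return None                                  # goal unreachable

acts = [("MoveTo(Table)", [], ["At(Table)"], 1.0),
        ("Grasp(Cup)", ["At(Table)"], ["Holding(Cup)"], 1.0)]
print(heuristic_expand({"Holding(Cup)"}, acts, {"At(Start)"}))
# ['MoveTo(Table)', 'Grasp(Cup)']
```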
arXiv Detail & Related papers (2024-06-03T03:38:56Z)
- Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models [31.509994889286183]
We introduce Language Agent Tree Search (LATS) -- the first general framework that synergizes the capabilities of language models (LMs) in reasoning, acting, and planning.
A key feature of our approach is the incorporation of an environment for external feedback, which offers a more deliberate and adaptive problem-solving mechanism.
LATS achieves state-of-the-art pass@1 accuracy (92.7%) for programming on HumanEval with GPT-4 and demonstrates gradient-free performance (average score of 75.9) comparable to gradient-based fine-tuning for web navigation on WebShop with GPT-3.5.
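The search loop can be pictured as Monte Carlo tree search with language-model stubs for proposing and scoring actions. The toy skeleton below is a sketch of that idea, not the authors' implementation; `propose` and `evaluate` are placeholders for the LM and environment calls.

```python
# Toy MCTS skeleton in the spirit of LATS: select by UCB, expand with
# LM-proposed actions, score leaves with feedback, and backpropagate.
import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def ucb(n, c=1.4):
    if n.visits == 0:
        return float("inf")
    return n.value / n.visits + c * math.sqrt(math.log(n.parent.visits) / n.visits)

def search(root, propose, evaluate, iters=50):
    for _ in range(iters):
        node = root
        while node.children:                       # 1. selection
            node = max(node.children, key=ucb)
        for action in propose(node.state):          # 2. expansion
            node.children.append(Node(node.state + [action], parent=node))
        leaf = random.choice(node.children) if node.children else node
        reward = evaluate(leaf.state)               # 3. evaluation (LM / env feedback)
        while leaf:                                 # 4. backpropagation
            leaf.visits += 1
            leaf.value += reward
            leaf = leaf.parent
    return max(root.children, key=lambda n: n.visits).state

print(search(Node([]), propose=lambda s: ["left", "right"],
             evaluate=lambda s: random.random()))
```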
arXiv Detail & Related papers (2023-10-06T17:55:11Z)
- EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought [95.37585041654535]
Embodied AI is capable of planning and executing action sequences for robots to accomplish long-horizon tasks in physical environments.
In this work, we introduce EmbodiedGPT, an end-to-end multi-modal foundation model for embodied AI.
Experiments show the effectiveness of EmbodiedGPT on embodied tasks, including embodied planning, embodied control, visual captioning, and visual question answering.
arXiv Detail & Related papers (2023-05-24T11:04:30Z)
- Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model [63.66204449776262]
Instruct2Act is a framework that maps multi-modal instructions to sequential actions for robotic manipulation tasks.
Our approach is adjustable and flexible in accommodating various instruction modalities and input types.
Our zero-shot method outperformed many state-of-the-art learning-based policies in several tasks.
arXiv Detail & Related papers (2023-05-18T17:59:49Z)
- Learning to Solve Voxel Building Embodied Tasks from Pixels and Natural Language Instructions [53.21504989297547]
We propose a new method that combines a language model and reinforcement learning for the task of building objects in a Minecraft-like environment.
Our method first generates a set of consistently achievable sub-goals from the instructions and then completes associated sub-tasks with a pre-trained RL policy.
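Roughly, the decomposition described above looks like the following sketch, where the trivial comma split and `rl_policy` are placeholders for the paper's language model and pre-trained RL policy.

```python
# Rough sketch of the two-stage decomposition: a language model would split the
# instruction into achievable sub-goals; here a trivial comma split stands in.
def split_subgoals(instruction: str) -> list:
    return [step.strip() for step in instruction.split(",") if step.strip()]

def execute(instruction: str, rl_policy) -> None:
    for subgoal in split_subgoals(instruction):
        rl_policy(subgoal)          # pre-trained policy completes one sub-task

execute("build the base, add a pillar, place the roof", rl_policy=print)
```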
arXiv Detail & Related papers (2022-11-01T18:30:42Z)
- ProgPrompt: Generating Situated Robot Task Plans using Large Language Models [68.57918965060787]
Large language models (LLMs) can be used to score potential next actions during task planning.
We present a programmatic LLM prompt structure that enables plan generation functional across situated environments.
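The programmatic-prompt idea can be illustrated by rendering the environment as importable action functions plus an object list, then asking the model to complete a plan function. The scaffold below is a paraphrase of that idea, not the paper's released prompt format.

```python
# Paraphrased ProgPrompt-style scaffold: the prompt looks like a Python program,
# so the LM's completion is directly a plan over the listed actions and objects.
ACTIONS = ["grab", "putin", "open", "close"]
OBJECTS = ["apple", "fridge", "plate"]

def build_prompt(task: str) -> str:
    imports = "\n".join(f"from actions import {a}" for a in ACTIONS)
    return (f"{imports}\n"
            f"objects = {OBJECTS}\n\n"
            f"def {task.replace(' ', '_')}():\n"
            f"    # complete the plan using only the imported actions\n")

print(build_prompt("put apple in fridge"))
```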
arXiv Detail & Related papers (2022-09-22T20:29:49Z)
- Active Inference and Behavior Trees for Reactive Action Planning and Execution in Robotics [2.040132783511305]
We propose a hybrid combination of active inference and behavior trees (BTs) for reactive action planning and execution in dynamic environments.
The proposed approach handles partially observable initial states and improves the robustness of classical BTs against unexpected contingencies.
arXiv Detail & Related papers (2020-11-19T10:24:41Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm which directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
- Interpretable MTL from Heterogeneous Domains using Boosted Tree [8.095372074268685]
Multi-task learning (MTL) aims at improving the generalization performance of several related tasks.
In this paper, following the philosophy of boosted trees, we propose a two-stage method.
Experiments on both benchmark and real-world datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2020-03-16T08:58:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.