Integrating Intent Understanding and Optimal Behavior Planning for Behavior Tree Generation from Human Instructions
- URL: http://arxiv.org/abs/2405.07474v2
- Date: Thu, 27 Jun 2024 13:17:58 GMT
- Title: Integrating Intent Understanding and Optimal Behavior Planning for Behavior Tree Generation from Human Instructions
- Authors: Xinglin Chen, Yishuai Cai, Yunxin Mao, Minglong Li, Wenjing Yang, Weixia Xu, Ji Wang
- Abstract summary: Behavior Tree (BT) is an appropriate control architecture for robots executing tasks following human instructions.
This paper proposes a two-stage framework for BT generation, which first employs large language models to interpret goals from high-level instructions.
We represent goals as well-formed formulas in first-order logic, effectively bridging intent understanding and optimal behavior planning.
- Score: 5.31484618181979
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robots executing tasks following human instructions in domestic or industrial environments essentially require both adaptability and reliability. Behavior Tree (BT) emerges as an appropriate control architecture for these scenarios due to its modularity and reactivity. Existing BT generation methods, however, either do not involve interpreting natural language or cannot theoretically guarantee the BTs' success. This paper proposes a two-stage framework for BT generation, which first employs large language models (LLMs) to interpret goals from high-level instructions, then constructs an efficient goal-specific BT through the Optimal Behavior Tree Expansion Algorithm (OBTEA). We represent goals as well-formed formulas in first-order logic, effectively bridging intent understanding and optimal behavior planning. Experiments on a service robot validate the proficiency of LLMs in producing grammatically correct and accurately interpreted goals, demonstrate OBTEA's superiority over the baseline BT Expansion algorithm in various metrics, and finally confirm the practical deployability of our framework. The project website is https://dids-ei.github.io/Project/LLM-OBTEA/.
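To make the two-stage pipeline concrete, here is a minimal Python sketch under simplifying assumptions: goals are sets of ground first-order literals, actions follow a STRIPS-like model, and a naive backward chaining stands in for OBTEA's optimal expansion. All names (`Literal`, `Action`, `backward_expand`) are illustrative, not the paper's API.

```python
# Minimal sketch, assuming a STRIPS-like action model; the naive backward
# chaining below only stands in for OBTEA's optimal BT expansion.
from dataclasses import dataclass

@dataclass(frozen=True)
class Literal:
    """A ground first-order predicate, e.g. Holding(Robot, Coffee)."""
    pred: str
    args: tuple

@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset   # literals that must hold before execution
    add: frozenset   # literals made true by execution
    cost: float = 1.0

def backward_expand(goal, actions, state):
    """Chain backward from the goal: pick the cheapest action achieving each
    unmet literal, then recurse on that action's own preconditions."""
    plan, unmet = [], set(goal) - set(state)
    while unmet:
        lit = unmet.pop()
        achievers = [a for a in actions if lit in a.add]
        if not achievers:
            raise ValueError(f"goal literal {lit} is unreachable")
        best = min(achievers, key=lambda a: a.cost)
        plan.append(best)
        unmet |= set(best.pre) - set(state)
    return list(reversed(plan))  # satisfy preconditions first

# Tiny example: an LLM would produce the goal formula; here it is hard-coded.
state = {Literal("At", ("Robot", "Bar"))}
actions = [Action("MakeCoffee",
                  pre=frozenset({Literal("At", ("Robot", "Bar"))}),
                  add=frozenset({Literal("Holding", ("Robot", "Coffee"))}))]
goal = {Literal("Holding", ("Robot", "Coffee"))}
print([a.name for a in backward_expand(goal, actions, state)])  # ['MakeCoffee']
```

In the full framework, the goal set would come from the LLM's interpretation of the instruction, and the resulting plan would be compiled into a reactive BT rather than a linear action sequence.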
Related papers
- Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning [94.76546523689113]
We introduce CodePlan, a framework that generates and follows code-form plans -- pseudocode that outlines high-level, structured reasoning processes.
CodePlan effectively captures the rich semantics and control flows inherent to sophisticated reasoning tasks.
It achieves a 25.1% relative improvement compared with directly generating responses.
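As a loose illustration of this plan-then-follow pattern (not CodePlan's actual interface), the two model calls can be sketched as below; `llm` is a placeholder for any text-completion client.

```python
# Hedged sketch of the plan-then-answer pattern: elicit a pseudocode plan first,
# then condition the final response on it. `llm` is a stub, not CodePlan's API.
def llm(prompt: str) -> str:
    # Stand-in for a real model call; replace with your client of choice.
    return f"<model output for: {prompt.splitlines()[0]}>"

def code_form_answer(question: str) -> str:
    plan = llm(f"Write short pseudocode outlining how to solve:\n{question}")
    return llm(f"Question: {question}\nPlan:\n{plan}\nFollow the plan and answer:")

print(code_form_answer("A train covers 120 km in 1.5 h. What is its speed?"))
```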
arXiv Detail & Related papers (2024-09-19T04:13:58Z)
- HBTP: Heuristic Behavior Tree Planning with Large Language Model Reasoning [6.2560501421348]
Heuristic Behavior Tree Planning (HBTP) is a reliable and efficient framework for BT generation.
This paper introduces the heuristic BT expansion process, along with two variants designed for optimal planning and satisficing planning.
Experiments verify the theoretical bounds of HBTP, and results from four datasets confirm its practical effectiveness in everyday service robot applications.
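One way to picture heuristic-guided expansion is best-first search over sets of unmet conditions. The sketch below is an illustration under that framing, not HBTP's published algorithm; the action tuples and heuristic `h` are assumptions.

```python
# Best-first expansion over unmet condition sets, guided by a heuristic h.
# Actions are (name, preconditions, effects, cost) tuples; all names illustrative.
import heapq, itertools

def heuristic_expand(goal, actions, state, h=len):
    tie = itertools.count()                      # tie-breaker for the heap
    start = frozenset(goal) - frozenset(state)
    frontier, seen = [(h(start), 0.0, next(tie), start, [])], set()
    while frontier:
        _, g, _, unmet, plan = heapq.heappop(frontier)
        if not unmet:
            return list(reversed(plan))          # satisfy preconditions first
        if unmet in seen:
            continue
        seen.add(unmet)
        for name, pre, add, cost in actions:
            if not set(add) & unmet:
                continue                         # action achieves nothing unmet
            nxt = frozenset((unmet - set(add)) | (set(pre) - set(state)))
            heapq.heappush(frontier, (g + cost + h(nxt), g + cost,
                                      next(tie), nxt, plan + [name]))
    return None                                  # goal unreachable

acts = [("MoveTo(Table)", [], ["At(Table)"], 1.0),
        ("Grasp(Cup)", ["At(Table)"], ["Holding(Cup)"], 1.0)]
print(heuristic_expand({"Holding(Cup)"}, acts, {"At(Start)"}))
# ['MoveTo(Table)', 'Grasp(Cup)']
```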
arXiv Detail & Related papers (2024-06-03T03:38:56Z)
- Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models [31.509994889286183]
We introduce Language Agent Tree Search (LATS) -- the first general framework that synergizes the capabilities of language models (LMs) in reasoning, acting, and planning.
A key feature of our approach is the incorporation of an environment for external feedback, which offers a more deliberate and adaptive problem-solving mechanism.
LATS achieves state-of-the-art pass@1 accuracy (92.7%) for programming on HumanEval with GPT-4 and demonstrates gradient-free performance (average score of 75.9) comparable to gradient-based fine-tuning for web navigation on WebShop with GPT-3.5.
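The search loop can be pictured as Monte Carlo tree search with language-model stubs for proposing and scoring actions. The toy skeleton below is a sketch of that idea, not the authors' implementation; `propose` and `evaluate` are placeholders for the LM and environment calls.

```python
# Toy MCTS skeleton in the spirit of LATS: select by UCB, expand with
# LM-proposed actions, score leaves with feedback, and backpropagate.
import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def ucb(n, c=1.4):
    if n.visits == 0:
        return float("inf")
    return n.value / n.visits + c * math.sqrt(math.log(n.parent.visits) / n.visits)

def search(root, propose, evaluate, iters=50):
    for _ in range(iters):
        node = root
        while node.children:                       # 1. selection
            node = max(node.children, key=ucb)
        for action in propose(node.state):          # 2. expansion
            node.children.append(Node(node.state + [action], parent=node))
        leaf = random.choice(node.children) if node.children else node
        reward = evaluate(leaf.state)               # 3. evaluation (LM / env feedback)
        while leaf:                                 # 4. backpropagation
            leaf.visits += 1
            leaf.value += reward
            leaf = leaf.parent
    return max(root.children, key=lambda n: n.visits).state

print(search(Node([]), propose=lambda s: ["left", "right"],
             evaluate=lambda s: random.random()))
```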
arXiv Detail & Related papers (2023-10-06T17:55:11Z)
- EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought [95.37585041654535]
Embodied AI is capable of planning and executing action sequences for robots to accomplish long-horizon tasks in physical environments.
In this work, we introduce EmbodiedGPT, an end-to-end multi-modal foundation model for embodied AI.
Experiments show the effectiveness of EmbodiedGPT on embodied tasks, including embodied planning, embodied control, visual captioning, and visual question answering.
arXiv Detail & Related papers (2023-05-24T11:04:30Z)
- Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model [63.66204449776262]
Instruct2Act is a framework that maps multi-modal instructions to sequential actions for robotic manipulation tasks.
Our approach is adjustable and flexible in accommodating various instruction modalities and input types.
Our zero-shot method outperformed many state-of-the-art learning-based policies in several tasks.
arXiv Detail & Related papers (2023-05-18T17:59:49Z)
- Learning to Solve Voxel Building Embodied Tasks from Pixels and Natural Language Instructions [53.21504989297547]
We propose a new method that combines a language model and reinforcement learning for the task of building objects in a Minecraft-like environment.
Our method first generates a set of consistently achievable sub-goals from the instructions and then completes associated sub-tasks with a pre-trained RL policy.
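Roughly, the decomposition described above looks like the following sketch, where the trivial comma split and `rl_policy` are placeholders for the paper's language model and pre-trained RL policy.

```python
# Rough sketch of the two-stage decomposition: a language model would split the
# instruction into achievable sub-goals; here a trivial comma split stands in.
def split_subgoals(instruction: str) -> list:
    return [step.strip() for step in instruction.split(",") if step.strip()]

def execute(instruction: str, rl_policy) -> None:
    for subgoal in split_subgoals(instruction):
        rl_policy(subgoal)          # pre-trained policy completes one sub-task

execute("build the base, add a pillar, place the roof", rl_policy=print)
```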
arXiv Detail & Related papers (2022-11-01T18:30:42Z)
- ProgPrompt: Generating Situated Robot Task Plans using Large Language Models [68.57918965060787]
Large language models (LLMs) can be used to score potential next actions during task planning.
We present a programmatic LLM prompt structure that enables plan generation functional across situated environments.
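The programmatic-prompt idea can be illustrated by rendering the environment as importable action functions plus an object list, then asking the model to complete a plan function. The scaffold below is a paraphrase of that idea, not the paper's released prompt format.

```python
# Paraphrased ProgPrompt-style scaffold: the prompt looks like a Python program,
# so the LM's completion is directly a plan over the listed actions and objects.
ACTIONS = ["grab", "putin", "open", "close"]
OBJECTS = ["apple", "fridge", "plate"]

def build_prompt(task: str) -> str:
    imports = "\n".join(f"from actions import {a}" for a in ACTIONS)
    return (f"{imports}\n"
            f"objects = {OBJECTS}\n\n"
            f"def {task.replace(' ', '_')}():\n"
            f"    # complete the plan using only the imported actions\n")

print(build_prompt("put apple in fridge"))
```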
arXiv Detail & Related papers (2022-09-22T20:29:49Z)
- Active Inference and Behavior Trees for Reactive Action Planning and Execution in Robotics [2.040132783511305]
We propose a hybrid combination of active inference and behavior trees (BTs) for reactive action planning and execution in dynamic environments.
The proposed approach handles partially observable initial states and improves the robustness of classical BTs against unexpected contingencies.
arXiv Detail & Related papers (2020-11-19T10:24:41Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm which directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
- Interpretable MTL from Heterogeneous Domains using Boosted Tree [8.095372074268685]
Multi-task learning (MTL) aims at improving the generalization performance of several related tasks.
In this paper, following the philosophy of boosted trees, we propose a two-stage method.
Experiments on both benchmark and real-world datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2020-03-16T08:58:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.