Interpretable Robot Control via Structured Behavior Trees and Large Language Models
- URL: http://arxiv.org/abs/2508.09621v2
- Date: Wed, 08 Oct 2025 13:28:52 GMT
- Title: Interpretable Robot Control via Structured Behavior Trees and Large Language Models
- Authors: Ingrid Maéva Chekam, Ines Pastor-Martinez, Ali Tourani, Jose Andres Millan-Romera, Laura Ribeiro, Pedro Miguel Bastos Soares, Holger Voos, Jose Luis Sanchez-Lopez
- Abstract summary: This paper presents a novel framework that bridges natural language understanding and robotic execution. The proposed approach is practical in real-world scenarios, with an average cognition-to-execution accuracy of approximately 94%.
- Score: 0.14990005092937678
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As intelligent robots become more integrated into human environments, there is a growing need for intuitive and reliable Human-Robot Interaction (HRI) interfaces that are adaptable and more natural to interact with. Traditional robot control methods often require users to adapt to interfaces or memorize predefined commands, limiting usability in dynamic, unstructured environments. This paper presents a novel framework that bridges natural language understanding and robotic execution by combining Large Language Models (LLMs) with Behavior Trees. This integration enables robots to interpret natural language instructions given by users and translate them into executable actions by activating domain-specific plugins. The system supports scalable and modular integration, with a primary focus on perception-based functionalities, such as person tracking and hand gesture recognition. To evaluate the system, a series of real-world experiments was conducted across diverse environments. Experimental results demonstrate that the proposed approach is practical in real-world scenarios, with an average cognition-to-execution accuracy of approximately 94%, making a significant contribution to HRI systems and robots. The complete source code of the framework is publicly available at https://github.com/snt-arg/robot_suite.
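The abstract describes routing a user's natural-language instruction through an LLM to activate a domain-specific behavior-tree plugin. The following is a minimal, self-contained sketch of that pattern; the plugin names, `BTNode` type, and the keyword-matching stand-in for the LLM call are illustrative assumptions, not the robot_suite API.

```python
# Hypothetical sketch of LLM-to-behavior-tree dispatch.
# All names (PLUGINS, BTNode, parse_command) are invented for illustration.

from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class BTNode:
    """A leaf node of a behavior tree: a named, executable action."""
    name: str
    action: Callable[[], str]

# Domain-specific plugins exposed to the language model, mirroring the
# perception-based functionalities the paper mentions.
PLUGINS: Dict[str, BTNode] = {
    "track_person": BTNode("track_person", lambda: "tracking person"),
    "recognize_gesture": BTNode("recognize_gesture", lambda: "watching hands"),
}

def parse_command(utterance: str) -> str:
    """Stand-in for the LLM call: map free-form text to a plugin key.

    A real system would prompt an LLM with the plugin catalogue and parse
    its structured reply; a keyword match keeps the sketch self-contained.
    """
    text = utterance.lower()
    if "follow" in text or "track" in text:
        return "track_person"
    if "gesture" in text or "hand" in text:
        return "recognize_gesture"
    raise ValueError(f"no plugin matches: {utterance!r}")

def execute(utterance: str) -> str:
    """Interpret the instruction and run the activated plugin."""
    return PLUGINS[parse_command(utterance)].action()

print(execute("Please follow me"))        # -> tracking person
print(execute("Watch my hand gestures"))  # -> watching hands
```

In the real framework the returned plugin would be ticked inside a behavior tree rather than called directly, which is what makes the control flow interpretable and modular.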
Related papers
- An Approach to Combining Video and Speech with Large Language Models in Human-Robot Interaction [0.0]
This work presents a novel HRI framework that combines advanced vision-language models, speech processing, and fuzzy logic. The proposed system integrates Florence-2 for object detection, Llama 3.1 for natural language understanding, and Whisper for speech recognition. Experimental evaluations conducted on consumer-grade hardware demonstrate a command execution accuracy of 75%.
arXiv Detail & Related papers (2026-02-23T09:05:15Z)
- Commanding Humanoid by Free-form Language: A Large Language Action Model with Unified Motion Vocabulary [59.98573566227095]
We introduce Humanoid-LLA, a Large Language Action Model that maps expressive language commands to physically executable whole-body actions for humanoid robots. Our approach integrates three core components: a unified motion vocabulary that aligns human and humanoid motion primitives into a shared discrete space; a vocabulary-directed controller distilled from a privileged policy to ensure physical feasibility; and a physics-informed fine-tuning stage using reinforcement learning with dynamics-aware rewards to enhance robustness and stability.
arXiv Detail & Related papers (2025-11-28T08:11:24Z)
- $π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge.
We evaluate our model in terms of its ability to perform tasks in zero shot after pre-training, follow language instructions from people, and its ability to acquire new skills via fine-tuning.
arXiv Detail & Related papers (2024-10-31T17:22:30Z)
- RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis [102.1876259853457]
We propose a tree-structured multimodal code generation framework for generalized robotic behavior synthesis, termed RoboCodeX.
RoboCodeX decomposes high-level human instructions into multiple object-centric manipulation units consisting of physical preferences such as affordance and safety constraints.
To further enhance the capability to map conceptual and perceptual understanding into control commands, a specialized multimodal reasoning dataset is collected for pre-training and an iterative self-updating methodology is introduced for supervised fine-tuning.
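The decomposition into object-centric manipulation units with physical preferences can be pictured with a small data structure; the sketch below is an invented illustration of the idea, not RoboCodeX's actual representation, and all names and values are assumptions.

```python
# Illustrative sketch of decomposing a high-level instruction into
# object-centric manipulation units carrying affordance and safety
# preferences. Not RoboCodeX itself; names and values are invented.

from dataclasses import dataclass, field
from typing import List

@dataclass
class ManipulationUnit:
    """One object-centric step with its physical preferences."""
    target: str
    action: str
    affordance: str                        # how the object affords being handled
    safety: List[str] = field(default_factory=list)

def decompose(instruction: str) -> List[ManipulationUnit]:
    """Stand-in for the tree-structured generator: a fixed decomposition
    of one example instruction, for illustration only."""
    if "cup" in instruction and "shelf" in instruction:
        return [
            ManipulationUnit("cup", "grasp", "handle", ["low force"]),
            ManipulationUnit("cup", "place_on", "flat base", ["keep upright"]),
        ]
    return []

plan = decompose("put the cup on the shelf")
for unit in plan:
    print(unit.action, unit.target)
```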
arXiv Detail & Related papers (2024-02-25T15:31:43Z)
- RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation [77.41969287400977]
This paper presents RoboScript, a platform for a deployable robot manipulation pipeline powered by code generation.
We also present a benchmark for code generation for robot manipulation tasks specified in free-form natural language.
We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
arXiv Detail & Related papers (2024-02-22T15:12:00Z)
- Exploring Large Language Models to Facilitate Variable Autonomy for Human-Robot Teaming [4.779196219827508]
We introduce a novel framework for a GPT-powered multi-robot testbed environment, based on a Unity Virtual Reality (VR) setting.
This system allows users to interact with robot agents through natural language, each powered by individual GPT cores.
A user study with 12 participants explores the effectiveness of GPT-4 and, more importantly, the strategies users adopt when given the opportunity to converse in natural language within a multi-robot environment.
arXiv Detail & Related papers (2023-12-12T12:26:48Z)
- Incremental Learning of Humanoid Robot Behavior from Natural Interaction and Large Language Models [23.945922720555146]
We propose a system to achieve incremental learning of complex behavior from natural interaction.
We integrate the system in the robot cognitive architecture of the humanoid robot ARMAR-6.
arXiv Detail & Related papers (2023-09-08T13:29:05Z)
- WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model [92.90127398282209]
This paper investigates the potential of integrating recent Large Language Models (LLMs) with an existing visual grounding and robotic grasping system.
We introduce WALL-E (Embodied Robotic WAiter load lifting with Large Language model) as an example of this integration.
We deploy this LLM-empowered system on the physical robot to provide a more user-friendly interface for the instruction-guided grasping task.
arXiv Detail & Related papers (2023-08-30T11:35:21Z)
- "No, to the Right" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy [70.45420918526926]
We present LILAC, a framework for incorporating and adapting to natural language corrections online during execution.
Instead of discrete turn-taking between a human and robot, LILAC splits agency between the human and robot.
We show that our corrections-aware approach obtains higher task completion rates, and is subjectively preferred by users.
arXiv Detail & Related papers (2023-01-06T15:03:27Z)
- Reshaping Robot Trajectories Using Natural Language Commands: A Study of Multi-Modal Data Alignment Using Transformers [33.7939079214046]
We provide a flexible language-based interface for human-robot collaboration.
We take advantage of recent advancements in the field of large language models to encode the user command.
We train the model using imitation learning over a dataset containing robot trajectories modified by language commands.
arXiv Detail & Related papers (2022-03-25T01:36:56Z)
- Language Understanding for Field and Service Robots in a Priori Unknown Environments [29.16936249846063]
This paper provides a novel learning framework that allows field and service robots to interpret and execute natural language instructions.
We use language as a "sensor" -- inferring spatial, topological, and semantic information implicit in natural language utterances.
We incorporate this distribution in a probabilistic language grounding model and infer a distribution over a symbolic representation of the robot's action space.
arXiv Detail & Related papers (2021-05-21T15:13:05Z)
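The idea of using language as a "sensor" that induces a distribution over a symbolic action space can be illustrated with a toy posterior update; the actions, vocabulary, and likelihood values below are invented for the example and are not the paper's learned model.

```python
# Toy illustration of probabilistic language grounding: words update a
# distribution over symbolic actions. All actions and likelihoods are
# hand-set assumptions, standing in for a learned grounding model.

ACTIONS = ["go_to_kitchen", "go_to_door", "pick_up_box"]

# P(word | action): invented likelihoods for two informative words.
LIKELIHOOD = {
    "kitchen": {"go_to_kitchen": 0.8, "go_to_door": 0.1, "pick_up_box": 0.1},
    "door":    {"go_to_kitchen": 0.1, "go_to_door": 0.8, "pick_up_box": 0.1},
}

def ground(utterance: str) -> dict:
    """Combine per-word likelihoods into a normalized posterior over
    actions, starting from a uniform prior."""
    posterior = {a: 1.0 for a in ACTIONS}
    for word in utterance.lower().split():
        if word in LIKELIHOOD:
            for a in ACTIONS:
                posterior[a] *= LIKELIHOOD[word][a]
    total = sum(posterior.values())
    return {a: p / total for a, p in posterior.items()}

dist = ground("go to the kitchen")
print(max(dist, key=dist.get))  # -> go_to_kitchen
```

In the paper's setting the symbols would come from spatial and topological structure inferred from the utterance rather than a fixed action list, but the shape of the inference, language evidence narrowing a distribution over symbols, is the same.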
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.