Verifiably Following Complex Robot Instructions with Foundation Models
- URL: http://arxiv.org/abs/2402.11498v2
- Date: Mon, 8 Jul 2024 16:38:57 GMT
- Title: Verifiably Following Complex Robot Instructions with Foundation Models
- Authors: Benedict Quartey, Eric Rosen, Stefanie Tellex, George Konidaris
- Abstract summary: People want to flexibly express constraints, refer to arbitrary landmarks and verify behavior when instructing robots.
We propose Language Instruction grounding for Motion Planning (LIMP), an approach that enables robots to verifiably follow expressive and complex open-ended instructions.
LIMP constructs a symbolic instruction representation that reveals the robot's alignment with an instructor's intended motives and affords the synthesis of robot behaviors that are correct-by-construction.
- Score: 16.564788361518197
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Enabling mobile robots to follow complex natural language instructions is an important yet challenging problem. People want to flexibly express constraints, refer to arbitrary landmarks and verify behavior when instructing robots. Conversely, robots must disambiguate human instructions into specifications and ground instruction referents in the real world. We propose Language Instruction grounding for Motion Planning (LIMP), an approach that enables robots to verifiably follow expressive and complex open-ended instructions in real-world environments without prebuilt semantic maps. LIMP constructs a symbolic instruction representation that reveals the robot's alignment with an instructor's intended motives and affords the synthesis of robot behaviors that are correct-by-construction. We perform a large-scale evaluation and demonstrate our approach on 150 instructions in five real-world environments, showing the generality of our approach and the ease of deployment in novel unstructured domains. In our experiments, LIMP performs comparably with state-of-the-art LLM task planners and LLM code-writing planners on standard open vocabulary tasks and additionally achieves 79% success rate on complex spatiotemporal instructions while LLM and Code-writing planners both achieve 38%. See supplementary materials and demo videos at https://robotlimp.github.io
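The abstract describes a pipeline that disambiguates an instruction into a symbolic specification, grounds its referents in the real world, and produces behavior that can be verified against the specification. The following minimal sketch illustrates that loop under toy assumptions; the function names, the simple reach/avoid specification format, and the grid-cell grounding are illustrative stand-ins, not LIMP's actual implementation.

```python
# Minimal sketch (not LIMP's actual code): translate an instruction into a symbolic
# specification, ground its referents in the robot's map, and check a candidate
# trajectory against the specification before executing it.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Spec:
    reach: str            # proposition that must eventually hold, e.g. "green_chair"
    avoid: Optional[str]  # proposition that must never hold, e.g. "hallway"

def translate(instruction: str) -> Spec:
    """Stand-in for an LLM call mapping language to a temporal specification,
    e.g. "go to the green chair, avoid the hallway" -> F(green_chair) & G(!hallway)."""
    return Spec(reach="green_chair", avoid="hallway")

def ground(referent: str) -> set:
    """Stand-in for open-vocabulary perception that maps a referent phrase to the
    grid cells it occupies in the robot's map."""
    toy_map = {"green_chair": {(4, 4)}, "hallway": {(2, 2), (2, 3)}}
    return toy_map.get(referent, set())

def satisfies(trace: list, spec: Spec) -> bool:
    """Check a trajectory against the specification: eventually enter the 'reach'
    region and never enter the 'avoid' region."""
    reach_cells = ground(spec.reach)
    avoid_cells = ground(spec.avoid) if spec.avoid else set()
    reached = any(p in reach_cells for p in trace)
    avoided = all(p not in avoid_cells for p in trace)
    return reached and avoided

if __name__ == "__main__":
    spec = translate("Go to the green chair, but avoid the hallway")
    candidate = [(0, 0), (1, 1), (3, 3), (4, 4)]   # waypoints from a motion planner
    print("plan satisfies instruction:", satisfies(candidate, spec))  # True
```

Rejecting trajectories that violate the specification is only a simple stand-in for the correct-by-construction synthesis the abstract refers to, but it shows where verification fits in the loop.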
Related papers
- $π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge.
We evaluate our model in terms of its ability to perform tasks in zero shot after pre-training, to follow language instructions from people, and to acquire new skills via fine-tuning.
arXiv Detail & Related papers (2024-10-31T17:22:30Z) - RoboScript: Code Generation for Free-Form Manipulation Tasks across Real
and Simulation [77.41969287400977]
This paper presents RobotScript, a platform for a deployable robot manipulation pipeline powered by code generation.
We also present a benchmark for code generation for robot manipulation tasks specified in free-form natural language.
We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
arXiv Detail & Related papers (2024-02-22T15:12:00Z) - Interactive Planning Using Large Language Models for Partially
Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z) - WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model [92.90127398282209]
This paper investigates the potential of integrating the most recent Large Language Models (LLMs) with an existing visual grounding and robotic grasping system.
We introduce the WALL-E (Embodied Robotic WAiter load lifting with Large Language model) as an example of this integration.
We deploy this LLM-empowered system on the physical robot to provide a more user-friendly interface for the instruction-guided grasping task.
arXiv Detail & Related papers (2023-08-30T11:35:21Z) - CARTIER: Cartographic lAnguage Reasoning Targeted at Instruction
Execution for Robots [9.393951367344894]
This work explores the capacity of large language models to address problems at the intersection of spatial planning and natural language interfaces for navigation.
We focus on following complex instructions that are more akin to natural conversation than traditional explicit procedural directives typically seen in robotics.
We leverage the 3D simulator AI2Thor to create household query scenarios at scale, and augment it by adding complex language queries for 40 object types.
arXiv Detail & Related papers (2023-07-21T19:09:37Z) - Language to Rewards for Robotic Skill Synthesis [37.21434094015743]
We introduce a new paradigm that harnesses large language models (LLMs) to define reward parameters that can be optimized to accomplish a variety of robotic tasks.
Using reward as the intermediate interface generated by LLMs, we can effectively bridge the gap between high-level language instructions or corrections and low-level robot actions.
arXiv Detail & Related papers (2023-06-14T17:27:10Z) - Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions
with Large Language Model [63.66204449776262]
Instruct2Act is a framework that maps multi-modal instructions to sequential actions for robotic manipulation tasks.
Our approach is adjustable and flexible in accommodating various instruction modalities and input types.
Our zero-shot method outperformed many state-of-the-art learning-based policies in several tasks.
arXiv Detail & Related papers (2023-05-18T17:59:49Z) - SEAL: Semantic Frame Execution And Localization for Perceiving Afforded
Robot Actions [5.522839151632667]
We extend the semantic frame representation for robot manipulation actions and introduce the problem of Semantic Frame Execution And Localization for Perceiving Afforded Robot Actions (SEAL) as a graphical model.
For the SEAL problem, we describe our nonparametric Semantic Frame Mapping (SeFM) algorithm for maintaining belief over a finite set of semantic frames as the locations of actions afforded to the robot.
arXiv Detail & Related papers (2023-03-24T15:25:41Z) - ProgPrompt: Generating Situated Robot Task Plans using Large Language
Models [68.57918965060787]
Large language models (LLMs) can be used to score potential next actions during task planning.
We present a programmatic LLM prompt structure that enables plan generation that is functional across situated environments; a toy sketch of such a code-style prompt follows below.
arXiv Detail & Related papers (2022-09-22T20:29:49Z)
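As referenced in the ProgPrompt entry above, the key idea is to phrase the planning prompt itself as code. The sketch below assembles such a prompt from an action list, an object list, and one worked example; the specific action names, objects, and example plan are made up for illustration and are not the paper's released prompt.

```python
# Toy sketch of a programmatic, code-style planning prompt (action names, objects,
# and the example plan are illustrative assumptions).
ACTIONS = ["grab", "put_in", "open_door", "close_door"]
OBJECTS = ["apple", "banana", "fridge", "plate"]

EXAMPLE_PLAN = '''\
def put_apple_in_fridge():
    grab("apple")              # find and pick up the apple
    open_door("fridge")        # open the fridge
    put_in("apple", "fridge")  # place the apple inside
    close_door("fridge")
'''

def build_prompt(task: str) -> str:
    """Assemble the prompt: available actions as an import line, the visible
    objects, one worked example plan, and the header the LLM should complete."""
    imports = "from actions import " + ", ".join(ACTIONS)
    objects = f"objects = {OBJECTS}"
    header = f"def {task.lower().replace(' ', '_')}():"
    return "\n\n".join([imports, objects, EXAMPLE_PLAN, header])

if __name__ == "__main__":
    # The resulting string is sent to an LLM; the completed function body is the plan.
    print(build_prompt("put banana on plate"))
```

Expressing the prompt as code lets the model's completion be executed directly as a sequence of action calls, which is the design choice this entry highlights.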