CARTIER: Cartographic lAnguage Reasoning Targeted at Instruction
Execution for Robots
- URL: http://arxiv.org/abs/2307.11865v3
- Date: Thu, 1 Feb 2024 16:32:38 GMT
- Title: CARTIER: Cartographic lAnguage Reasoning Targeted at Instruction
Execution for Robots
- Authors: Dmitriy Rivkin, Nikhil Kakodkar, Francois Hogan, Bobak H. Baghi,
Gregory Dudek
- Abstract summary: This work explores the capacity of large language models to address problems at the intersection of spatial planning and natural language interfaces for navigation.
We focus on following complex instructions that are more akin to natural conversation than traditional explicit procedural directives typically seen in robotics.
We leverage the 3D simulator AI2Thor to create household query scenarios at scale, and augment it by adding complex language queries for 40 object types.
- Score: 9.393951367344894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work explores the capacity of large language models (LLMs) to address
problems at the intersection of spatial planning and natural language
interfaces for navigation. We focus on following complex instructions that are
more akin to natural conversation than traditional explicit procedural
directives typically seen in robotics. Unlike most prior work where navigation
directives are provided as simple imperative commands (e.g., "go to the
fridge"), we examine implicit directives obtained through conversational
interactions. We leverage the 3D simulator AI2Thor to create household query
scenarios at scale, and augment it by adding complex language queries for 40
object types. We demonstrate that a robot using our method CARTIER
(Cartographic lAnguage Reasoning Targeted at Instruction Execution for Robots)
can parse descriptive language queries up to 42% more reliably than existing
LLM-enabled methods by exploiting the ability of LLMs to interpret the user
interaction in the context of the objects in the scenario.
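The mechanism described in the abstract is to prompt an LLM with the user's conversational request alongside the objects present in the AI2Thor scene, so the model can infer which object the person is implicitly referring to. The paper's actual prompts and code are not reproduced here, so the following is only a minimal, hypothetical Python sketch of that idea; the callable llm, the helper names build_prompt and choose_target, and the fallback behavior are assumptions for illustration, not the authors' implementation.

# Hypothetical sketch (not the authors' code): ground a conversational query
# against the objects present in a scene, in the spirit of using an LLM to
# pick a navigation target from scene context.

from typing import Callable, List

def build_prompt(scene_objects: List[str], user_utterance: str) -> str:
    """Pair the scene's object list with the user's conversational request
    and ask the LLM to name a single target object."""
    objects = ", ".join(scene_objects)
    return (
        "You are assisting a household robot.\n"
        f"Objects visible in the scene: {objects}.\n"
        f"User says: \"{user_utterance}\"\n"
        "Reply with the single object from the list the robot should "
        "navigate to, and nothing else."
    )

def choose_target(llm: Callable[[str], str],
                  scene_objects: List[str],
                  user_utterance: str) -> str:
    """Query the LLM and fall back naively if its answer is not in the scene."""
    answer = llm(build_prompt(scene_objects, user_utterance)).strip().lower()
    matches = [o for o in scene_objects if o.lower() == answer]
    return matches[0] if matches else scene_objects[0]  # naive fallback

if __name__ == "__main__":
    # Stub LLM standing in for a real model; an implicit, conversational
    # request rather than an explicit "go to the fridge" command.
    stub_llm = lambda prompt: "Fridge"
    target = choose_target(stub_llm,
                           ["Fridge", "Sofa", "Microwave", "Bed"],
                           "I'm parched, could you grab me something cold?")
    print(target)  # -> Fridge

With a stub LLM, an implicit request such as "I'm parched, could you grab me something cold?" resolves to "Fridge", illustrating the conversational (rather than imperative) style of query the paper targets.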
Related papers
- In-Context Learning Enables Robot Action Prediction in LLMs [52.285739178561705]
We introduce RoboPrompt, a framework that enables off-the-shelf text-only Large Language Models to directly predict robot actions.
Our approach first heuristically identifies keyframes that capture important moments from an episode.
We extract end-effector actions as well as the estimated initial object poses, and both are converted into textual descriptions.
This enables an LLM to directly predict robot actions at test time.
arXiv Detail & Related papers (2024-10-16T17:56:49Z) - Integrating Disambiguation and User Preferences into Large Language Models for Robot Motion Planning [1.9912315834033756]
We present a framework that can interpret humans' navigation commands containing temporal elements and translate their natural language instructions into robot motion planning.
We propose methods to resolve the ambiguity in natural language instructions and capture user preferences.
arXiv Detail & Related papers (2024-04-22T19:38:37Z) - Verifiably Following Complex Robot Instructions with Foundation Models [16.564788361518197]
When instructing robots, people want to flexibly express constraints, refer to arbitrary landmarks, and verify the robot's behavior.
We propose Language Instruction grounding for Motion Planning (LIMP), an approach that enables robots to verifiably follow expressive and complex open-ended instructions.
LIMP constructs a symbolic instruction representation that reveals the robot's alignment with an instructor's intended outcomes.
arXiv Detail & Related papers (2024-02-18T08:05:54Z) - Object-Centric Instruction Augmentation for Robotic Manipulation [29.491990994901666]
We introduce the Object-Centric Instruction Augmentation (OCI) framework to augment highly semantic and information-dense language instruction with position cues.
We utilize a Multi-modal Large Language Model (MLLM) to weave knowledge of object locations into natural language instruction.
We demonstrate that robotic manipulator imitation policies trained with our enhanced instructions outperform those relying solely on traditional language instructions.
arXiv Detail & Related papers (2024-01-05T13:54:45Z) - Interactive Planning Using Large Language Models for Partially
Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z) - Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions
with Large Language Model [63.66204449776262]
Instruct2Act is a framework that maps multi-modal instructions to sequential actions for robotic manipulation tasks.
Our approach is adjustable and flexible in accommodating various instruction modalities and input types.
Our zero-shot method outperformed many state-of-the-art learning-based policies in several tasks.
arXiv Detail & Related papers (2023-05-18T17:59:49Z) - ProgPrompt: Generating Situated Robot Task Plans using Large Language
Models [68.57918965060787]
Large language models (LLMs) can be used to score potential next actions during task planning.
We present a programmatic LLM prompt structure that enables plan generation functional across situated environments, robot capabilities, and tasks.
arXiv Detail & Related papers (2022-09-22T20:29:49Z) - LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language,
Vision, and Action [76.71101507291473]
We present a system, LM-Nav, for robotic navigation that enjoys the benefits of training on unannotated large datasets of trajectories.
We show that such a system can be constructed entirely out of pre-trained models for navigation (ViNG), image-language association (CLIP), and language modeling (GPT-3), without requiring any fine-tuning or language-annotated robot data.
arXiv Detail & Related papers (2022-07-10T10:41:50Z) - A Persistent Spatial Semantic Representation for High-level Natural
Language Instruction Execution [54.385344986265714]
We propose a persistent spatial semantic representation method to bridge the gap between language and robot actions.
We evaluate our approach on the ALFRED benchmark and achieve state-of-the-art results, despite completely avoiding the commonly used step-by-step instructions.
arXiv Detail & Related papers (2021-07-12T17:47:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.