CARTIER: Cartographic lAnguage Reasoning Targeted at Instruction
Execution for Robots
- URL: http://arxiv.org/abs/2307.11865v3
- Date: Thu, 1 Feb 2024 16:32:38 GMT
- Title: CARTIER: Cartographic lAnguage Reasoning Targeted at Instruction
Execution for Robots
- Authors: Dmitriy Rivkin, Nikhil Kakodkar, Francois Hogan, Bobak H. Baghi,
Gregory Dudek
- Abstract summary: This work explores the capacity of large language models to address problems at the intersection of spatial planning and natural language interfaces for navigation.
We focus on following complex instructions that are more akin to natural conversation than traditional explicit procedural directives typically seen in robotics.
We leverage the 3D simulator AI2Thor to create household query scenarios at scale, and augment it by adding complex language queries for 40 object types.
- Score: 9.393951367344894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work explores the capacity of large language models (LLMs) to address
problems at the intersection of spatial planning and natural language
interfaces for navigation. We focus on following complex instructions that are
more akin to natural conversation than traditional explicit procedural
directives typically seen in robotics. Unlike most prior work where navigation
directives are provided as simple imperative commands (e.g., "go to the
fridge"), we examine implicit directives obtained through conversational
interactions. We leverage the 3D simulator AI2Thor to create household query
scenarios at scale, and augment it by adding complex language queries for 40
object types. We demonstrate that a robot using our method CARTIER
(Cartographic lAnguage Reasoning Targeted at Instruction Execution for Robots)
can parse descriptive language queries up to 42% more reliably than existing
LLM-enabled methods by exploiting the ability of LLMs to interpret the user
interaction in the context of the objects in the scenario.
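The mechanism described in the abstract is to prompt an LLM with the user's conversational request alongside the objects present in the AI2Thor scene, so the model can infer which object the person is implicitly referring to. The paper's actual prompts and code are not reproduced here, so the following is only a minimal, hypothetical Python sketch of that idea; the callable llm, the helper names build_prompt and choose_target, and the fallback behavior are assumptions for illustration, not the authors' implementation.

# Hypothetical sketch (not the authors' code): ground a conversational query
# against the objects present in a scene, in the spirit of using an LLM to
# pick a navigation target from scene context.

from typing import Callable, List

def build_prompt(scene_objects: List[str], user_utterance: str) -> str:
    """Pair the scene's object list with the user's conversational request
    and ask the LLM to name a single target object."""
    objects = ", ".join(scene_objects)
    return (
        "You are assisting a household robot.\n"
        f"Objects visible in the scene: {objects}.\n"
        f"User says: \"{user_utterance}\"\n"
        "Reply with the single object from the list the robot should "
        "navigate to, and nothing else."
    )

def choose_target(llm: Callable[[str], str],
                  scene_objects: List[str],
                  user_utterance: str) -> str:
    """Query the LLM and fall back naively if its answer is not in the scene."""
    answer = llm(build_prompt(scene_objects, user_utterance)).strip().lower()
    matches = [o for o in scene_objects if o.lower() == answer]
    return matches[0] if matches else scene_objects[0]  # naive fallback

if __name__ == "__main__":
    # Stub LLM standing in for a real model; an implicit, conversational
    # request rather than an explicit "go to the fridge" command.
    stub_llm = lambda prompt: "Fridge"
    target = choose_target(stub_llm,
                           ["Fridge", "Sofa", "Microwave", "Bed"],
                           "I'm parched, could you grab me something cold?")
    print(target)  # -> Fridge

With a stub LLM, an implicit request such as "I'm parched, could you grab me something cold?" resolves to "Fridge", illustrating the conversational (rather than imperative) style of query the paper targets.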
Related papers
- In-Context Learning Enables Robot Action Prediction in LLMs [52.285739178561705]
We introduce RoboPrompt, a framework that enables off-the-shelf text-only Large Language Models to directly predict robot actions.
Our approach first heuristically identifies keyframes that capture important moments from an episode.
We extract end-effector actions as well as the estimated initial object poses, and both are converted into textual descriptions.
This enables an LLM to directly predict robot actions at test time.
arXiv Detail & Related papers (2024-10-16T17:56:49Z) - Integrating Disambiguation and User Preferences into Large Language Models for Robot Motion Planning [1.9912315834033756]
We present a framework that can interpret humans' navigation commands containing temporal elements and translate their natural language instructions into robot motion planning.
We propose methods to resolve the ambiguity in natural language instructions and capture user preferences.
arXiv Detail & Related papers (2024-04-22T19:38:37Z) - Verifiably Following Complex Robot Instructions with Foundation Models [16.564788361518197]
When instructing robots, people want to flexibly express constraints, refer to arbitrary landmarks, and verify the robot's behavior.
We propose Language Instruction grounding for Motion Planning (LIMP), an approach that enables robots to verifiably follow expressive and complex open-ended instructions.
LIMP constructs a symbolic instruction representation that reveals the robot's alignment with an instructor's intended outcomes.
arXiv Detail & Related papers (2024-02-18T08:05:54Z) - Object-Centric Instruction Augmentation for Robotic Manipulation [29.491990994901666]
We introduce the Object-Centric Instruction Augmentation (OCI) framework to augment highly semantic and information-dense language instruction with position cues.
We utilize a Multi-modal Large Language Model (MLLM) to weave knowledge of object locations into natural language instruction.
We demonstrate that robotic manipulator imitation policies trained with our enhanced instructions outperform those relying solely on traditional language instructions.
arXiv Detail & Related papers (2024-01-05T13:54:45Z) - Interactive Planning Using Large Language Models for Partially
Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z) - Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions
with Large Language Model [63.66204449776262]
Instruct2Act is a framework that maps multi-modal instructions to sequential actions for robotic manipulation tasks.
Our approach is adjustable and flexible in accommodating various instruction modalities and input types.
Our zero-shot method outperformed many state-of-the-art learning-based policies in several tasks.
arXiv Detail & Related papers (2023-05-18T17:59:49Z) - ProgPrompt: Generating Situated Robot Task Plans using Large Language
Models [68.57918965060787]
Large language models (LLMs) can be used to score potential next actions during task planning.
We present a programmatic LLM prompt structure that enables plan generation functional across situated environments, robot capabilities, and tasks.
arXiv Detail & Related papers (2022-09-22T20:29:49Z) - LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language,
Vision, and Action [76.71101507291473]
We present a system, LM-Nav, for robotic navigation that enjoys the benefits of training on unannotated large datasets of trajectories.
We show that such a system can be constructed entirely out of pre-trained models for navigation (ViNG), image-language association (CLIP), and language modeling (GPT-3), without requiring any fine-tuning or language-annotated robot data.
arXiv Detail & Related papers (2022-07-10T10:41:50Z) - A Persistent Spatial Semantic Representation for High-level Natural
Language Instruction Execution [54.385344986265714]
We propose a persistent spatial semantic representation method to bridge the gap between language and robot actions.
We evaluate our approach on the ALFRED benchmark and achieve state-of-the-art results, despite completely avoiding the commonly used step-by-step instructions.
arXiv Detail & Related papers (2021-07-12T17:47:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.