A Persistent Spatial Semantic Representation for High-level Natural
Language Instruction Execution
- URL: http://arxiv.org/abs/2107.05612v1
- Date: Mon, 12 Jul 2021 17:47:19 GMT
- Title: A Persistent Spatial Semantic Representation for High-level Natural
Language Instruction Execution
- Authors: Valts Blukis, Chris Paxton, Dieter Fox, Animesh Garg, Yoav Artzi
- Abstract summary: We propose a persistent spatial semantic representation method to bridge the gap between language and robot actions.
We evaluate our approach on the ALFRED benchmark and achieve state-of-the-art results, despite completely avoiding the commonly used step-by-step instructions.
- Score: 54.385344986265714
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural language provides an accessible and expressive interface to specify
long-term tasks for robotic agents. However, non-experts are likely to specify
such tasks with high-level instructions, which abstract over specific robot
actions through several layers of abstraction. We propose that key to bridging
this gap between language and robot actions over long execution horizons are
persistent representations. We propose a persistent spatial semantic
representation method, and show how it enables building an agent that performs
hierarchical reasoning to effectively execute long-term tasks. We evaluate our
approach on the ALFRED benchmark and achieve state-of-the-art results, despite
completely avoiding the commonly used step-by-step instructions.
Related papers
- Embodied Instruction Following in Unknown Environments [66.60163202450954]
We propose an embodied instruction following (EIF) method for complex tasks in unknown environments.
We build a hierarchical embodied instruction following framework comprising a high-level task planner and a low-level exploration controller.
For the task planner, we generate feasible step-by-step plans for accomplishing human goals according to the task completion process and the known visual clues.
arXiv Detail & Related papers (2024-06-17T17:55:40Z)
- Interpretable Robotic Manipulation from Language [11.207620790833271]
We introduce an explainable behavior cloning agent, named Ex-PERACT, specifically designed for manipulation tasks.
At the top level, the model is tasked with learning a discrete skill code, while at the bottom level, the policy network translates the problem into a voxelized grid and maps the discretized actions to voxel grids.
We evaluate our method across eight challenging manipulation tasks utilizing the RLBench benchmark, demonstrating that Ex-PERACT not only achieves competitive policy performance but also effectively bridges the gap between human instructions and machine execution in complex environments.
arXiv Detail & Related papers (2024-05-27T11:02:21Z)
- Learning with Language-Guided State Abstractions [58.199148890064826]
Generalizable policy learning in high-dimensional observation spaces is facilitated by well-designed state representations.
Our method, LGA, uses a combination of natural language supervision and background knowledge from language models to automatically build state representations tailored to unseen tasks.
Experiments on simulated robotic tasks show that LGA yields state abstractions similar to those designed by humans, but in a fraction of the time.
arXiv Detail & Related papers (2024-02-28T23:57:04Z)
- ThinkBot: Embodied Instruction Following with Thought Chain Reasoning [66.09880459084901]
Embodied Instruction Following (EIF) requires agents to complete human instructions by interacting with objects in complicated surrounding environments.
We propose ThinkBot, which reasons over the thought chain in human instructions to recover the missing action descriptions.
Our ThinkBot outperforms the state-of-the-art EIF methods by a sizable margin in both success rate and execution efficiency.
arXiv Detail & Related papers (2023-12-12T08:30:09Z)
- CARTIER: Cartographic lAnguage Reasoning Targeted at Instruction Execution for Robots [9.393951367344894]
This work explores the capacity of large language models to address problems at the intersection of spatial planning and natural language interfaces for navigation.
We focus on following complex instructions that are more akin to natural conversation than traditional explicit procedural directives typically seen in robotics.
We leverage the 3D simulator AI2Thor to create household query scenarios at scale, and augment it by adding complex language queries for 40 object types.
arXiv Detail & Related papers (2023-07-21T19:09:37Z)
- ProgPrompt: Generating Situated Robot Task Plans using Large Language Models [68.57918965060787]
Large language models (LLMs) can be used to score potential next actions during task planning.
We present a programmatic LLM prompt structure that enables plan generation functional across situated environments.
arXiv Detail & Related papers (2022-09-22T20:29:49Z)
- Skill Induction and Planning with Latent Language [94.55783888325165]
We formulate a generative model of action sequences in which goals generate sequences of high-level subtask descriptions.
We describe how to train this model using primarily unannotated demonstrations by parsing demonstrations into sequences of named high-level subtasks.
In trained models, the space of natural language commands indexes a library of skills; agents can use these skills to plan by generating high-level instruction sequences tailored to novel goals.
arXiv Detail & Related papers (2021-10-04T15:36:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.