Deep compositional robotic planners that follow natural language
commands
- URL: http://arxiv.org/abs/2002.05201v2
- Date: Wed, 19 Feb 2020 16:21:46 GMT
- Title: Deep compositional robotic planners that follow natural language
commands
- Authors: Yen-Ling Kuo, Boris Katz, Andrei Barbu
- Abstract summary: We show how a sampling-based robotic planner can be augmented to learn to understand a sequence of natural language commands.
Our approach combines a deep network structured according to the parse of a complex command that includes objects, verbs, spatial relations, and attributes, with a sampling-based planner (RRT).
- Score: 21.481360281719006
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We demonstrate how a sampling-based robotic planner can be augmented to learn
to understand a sequence of natural language commands in a continuous
configuration space to move and manipulate objects. Our approach combines a
deep network structured according to the parse of a complex command that
includes objects, verbs, spatial relations, and attributes, with a
sampling-based planner, RRT. A recurrent hierarchical deep network controls how
the planner explores the environment, determines when a planned path is likely
to achieve a goal, and estimates the confidence of each move to trade off
exploitation and exploration between the network and the planner. Planners are
designed to have near-optimal behavior when information about the task is
missing, while networks learn to exploit observations which are available from
the environment, making the two naturally complementary. Combining the two
enables generalization to new maps, new kinds of obstacles, and more complex
sentences that do not occur in the training set. Little data is required to
train the model despite it jointly acquiring a CNN that extracts features from
the environment as it learns the meanings of words. The model provides a level
of interpretability through attention maps that let users see its reasoning
steps even though it is trained end to end. This enables robots to learn to
follow natural language commands in challenging continuous environments.
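As a rough illustration of the idea described above (not the authors' code), the following minimal Python sketch shows an RRT whose samples are biased by a learned policy: the policy proposes the next configuration and reports a confidence, low confidence falls back to RRT's uniform sampling, and an assumed learned goal test decides when the partial path satisfies the command. The function names, interfaces, and the hand-written toy stand-ins are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the authors' implementation) of an RRT
# whose sampling is biased by a learned policy and whose termination is
# decided by a learned goal test. Interfaces and names are assumptions.
import numpy as np

def rrt_with_learned_bias(q_start, is_free, policy, goal_test,
                          bounds, step=0.05, iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    nodes = [np.asarray(q_start, dtype=float)]
    parents = [-1]

    for _ in range(iters):
        # The network proposes the next configuration and reports its
        # confidence; with low confidence we fall back to uniform sampling,
        # trading off exploitation and exploration.
        q_prop, conf = policy(nodes)
        if rng.random() < conf:
            q_rand = np.asarray(q_prop, dtype=float)
        else:
            q_rand = rng.uniform(bounds[0], bounds[1])

        # Standard RRT extend: step from the nearest tree node toward the sample.
        i_near = int(np.argmin([np.linalg.norm(q - q_rand) for q in nodes]))
        direction = q_rand - nodes[i_near]
        dist = np.linalg.norm(direction)
        if dist < 1e-9:
            continue
        q_new = nodes[i_near] + step * direction / dist
        if not is_free(q_new):
            continue
        nodes.append(q_new)
        parents.append(i_near)

        # Ask the (assumed) learned goal test whether the path so far is
        # likely to satisfy the language command.
        path = _trace_back(nodes, parents, len(nodes) - 1)
        if goal_test(path):
            return path
    return None

def _trace_back(nodes, parents, i):
    path = []
    while i != -1:
        path.append(nodes[i])
        i = parents[i]
    return path[::-1]

if __name__ == "__main__":
    # Toy stand-ins for the learned components (hand-written, not trained).
    goal = np.array([0.9, 0.9])

    def toy_policy(nodes):
        # Stand-in for the parse-structured recurrent network: nudge the node
        # closest to the goal toward it, with a fixed confidence of 0.7.
        q = min(nodes, key=lambda n: np.linalg.norm(n - goal))
        return q + 0.1 * (goal - q), 0.7

    path = rrt_with_learned_bias(
        q_start=[0.1, 0.1],
        is_free=lambda q: np.all((q >= 0.0) & (q <= 1.0)),
        policy=toy_policy,
        goal_test=lambda p: np.linalg.norm(p[-1] - goal) < 0.05,
        bounds=(np.zeros(2), np.ones(2)),
    )
    print("path found:", path is not None)
```

In the paper, the proposal policy and goal test are realized by the recurrent hierarchical network assembled from the command's parse; here they are plain callables so the sketch runs on its own.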
Related papers
- Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies.
Our findings are synthesized in Flex (Fly-lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors.
We demonstrate the effectiveness of this approach on quadrotor fly-to-target tasks, where agents trained via behavior cloning successfully generalize to real-world scenes.
arXiv Detail & Related papers (2024-10-16T19:59:31Z) - Context-Aware Command Understanding for Tabletop Scenarios [1.7082212774297747]
This paper presents a novel hybrid algorithm designed to interpret natural human commands in tabletop scenarios.
By integrating multiple sources of information, including speech, gestures, and scene context, the system extracts actionable instructions for a robot.
We discuss the strengths and limitations of the system, with particular focus on how it handles multimodal command interpretation.
arXiv Detail & Related papers (2024-10-08T20:46:39Z) - Embodied Instruction Following in Unknown Environments [66.60163202450954]
We propose an embodied instruction following (EIF) method for complex tasks in unknown environments.
We build a hierarchical embodied instruction following framework including the high-level task planner and the low-level exploration controller.
For the task planner, we generate feasible step-by-step plans for accomplishing the human's goal, according to the task completion process and the known visual clues.
arXiv Detail & Related papers (2024-06-17T17:55:40Z) - Interactive Planning Using Large Language Models for Partially
Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z) - tagE: Enabling an Embodied Agent to Understand Human Instructions [3.943519623674811]
We introduce a novel system known as task and argument grounding for Embodied agents (tagE).
At its core, our system employs an inventive neural network model designed to extract a series of tasks from complex task instructions expressed in natural language.
Our proposed model adopts an encoder-decoder framework enriched with nested decoding to effectively extract tasks and their corresponding arguments from these intricate instructions.
arXiv Detail & Related papers (2023-10-24T08:17:48Z) - Navigation with Large Language Models: Semantic Guesswork as a Heuristic
for Planning [73.0990339667978]
Navigation in unfamiliar environments presents a major challenge for robots.
We use language models to bias exploration of novel real-world environments.
We evaluate LFG in challenging real-world environments and simulated benchmarks.
arXiv Detail & Related papers (2023-10-16T06:21:06Z) - A General Framework for Interpretable Neural Learning based on Local Information-Theoretic Goal Functions [1.5236380958983644]
We introduce 'infomorphic' neural networks to perform tasks from supervised, unsupervised and memory learning.
By leveraging the interpretable nature of the partial information decomposition (PID) framework, infomorphic networks represent a valuable tool for advancing our understanding of the intricate structure of local learning.
arXiv Detail & Related papers (2023-06-03T16:34:25Z) - PDSketch: Integrated Planning Domain Programming and Learning [86.07442931141637]
We present a new domain definition language, named PDSketch.
It allows users to flexibly define high-level structures in the transition models.
Details of the transition model will be filled in by trainable neural networks.
arXiv Detail & Related papers (2023-03-09T18:54:12Z) - Embodied Active Learning of Relational State Abstractions for Bilevel
Planning [6.1678491628787455]
To plan with predicates, the agent must be able to interpret them in continuous environment states.
We propose an embodied active learning paradigm where the agent learns predicate interpretations through online interaction with an expert.
We learn predicate interpretations as ensembles of neural networks and use their entropy to measure the informativeness of potential queries.
arXiv Detail & Related papers (2023-03-08T22:04:31Z) - Modeling Long-horizon Tasks as Sequential Interaction Landscapes [75.5824586200507]
We present a deep learning network that learns dependencies and transitions across subtasks solely from a set of demonstration videos.
We show that these symbols can be learned and predicted directly from image observations.
We evaluate our framework on two long horizon tasks: (1) block stacking of puzzle pieces being executed by humans, and (2) a robot manipulation task involving pick and place of objects and sliding a cabinet door with a 7-DoF robot arm.
arXiv Detail & Related papers (2020-06-08T18:07:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.