tagE: Enabling an Embodied Agent to Understand Human Instructions
- URL: http://arxiv.org/abs/2310.15605v1
- Date: Tue, 24 Oct 2023 08:17:48 GMT
- Title: tagE: Enabling an Embodied Agent to Understand Human Instructions
- Authors: Chayan Sarkar and Avik Mitra and Pradip Pramanick and Tapas Nayak
- Abstract summary: We introduce a novel system known as task and argument grounding for Embodied agents (tagE).
At its core, our system employs an inventive neural network model designed to extract a series of tasks from complex task instructions expressed in natural language.
Our proposed model adopts an encoder-decoder framework enriched with nested decoding to effectively extract tasks and their corresponding arguments from these intricate instructions.
- Score: 3.943519623674811
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural language serves as the primary mode of communication when an
intelligent agent with a physical presence engages with human beings. While a
plethora of research focuses on natural language understanding (NLU),
encompassing endeavors such as sentiment analysis, intent prediction, question
answering, and summarization, the scope of NLU directed at situations
necessitating tangible actions by an embodied agent remains limited. The
ambiguity and incompleteness inherent in natural language present
challenges for intelligent agents striving to decipher human intention. To
tackle this predicament head-on, we introduce a novel system known as task and
argument grounding for Embodied agents (tagE). At its core, our system employs
an inventive neural network model designed to extract a series of tasks from
complex task instructions expressed in natural language. Our proposed model
adopts an encoder-decoder framework enriched with nested decoding to
effectively extract tasks and their corresponding arguments from these
intricate instructions. These extracted tasks are then mapped (or grounded) to
the robot's established collection of skills, while the arguments find
grounding in objects present within the environment. To facilitate the training
and evaluation of our system, we have curated a dataset featuring complex
instructions. The results of our experiments show that our approach
outperforms strong baseline models.
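The abstract describes an encoder-decoder with nested decoding (an outer loop over the tasks mentioned in the instruction, an inner pass that labels each task's arguments) followed by grounding of tasks to the robot's skill inventory and of arguments to objects in the scene. No code accompanies this listing, so the following is only a minimal PyTorch-style sketch of that general shape; the module names, layer sizes, BIO-style argument tagging, and cosine-similarity grounding rule are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the tagE release): a nested encoder-decoder that
# decodes one task per outer step, tags argument roles per token for each
# task, and grounds predictions to known skills/objects by similarity.
import torch
import torch.nn as nn
import torch.nn.functional as F


class NestedTaskDecoder(nn.Module):
    def __init__(self, vocab_size, n_task_types, n_arg_roles, d=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)
        self.encoder = nn.GRU(d, d, batch_first=True, bidirectional=True)
        # Outer decoder: one step per task expressed in the instruction.
        self.task_rnn = nn.GRUCell(2 * d, 2 * d)
        self.task_head = nn.Linear(2 * d, n_task_types)
        # Inner decoder: per-token argument-role tagging, conditioned on the
        # current task state (BIO-style labelling is an assumed choice).
        self.arg_head = nn.Linear(4 * d, n_arg_roles)

    def forward(self, token_ids, max_tasks=4):
        h, _ = self.encoder(self.embed(token_ids))      # (B, T, 2d)
        ctx = h.mean(dim=1)                             # pooled instruction
        state = torch.zeros_like(ctx)
        task_logits, arg_logits = [], []
        for _ in range(max_tasks):
            state = self.task_rnn(ctx, state)
            task_logits.append(self.task_head(state))   # which skill type
            expanded = state.unsqueeze(1).expand(-1, h.size(1), -1)
            arg_logits.append(self.arg_head(torch.cat([h, expanded], dim=-1)))
        return torch.stack(task_logits, 1), torch.stack(arg_logits, 1)


def ground(query_vec, candidate_vecs):
    """Map a predicted task/argument embedding to the closest known skill or
    object embedding (cosine similarity is an assumption, not the paper's rule)."""
    sims = F.cosine_similarity(query_vec.unsqueeze(0), candidate_vecs, dim=-1)
    return int(sims.argmax())
```

For example, `model(torch.randint(0, 1000, (1, 12)))` returns per-step task-type logits and per-token argument-role logits; a full system would additionally predict a stop condition for the task loop and train both heads jointly on the curated instruction dataset.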
Related papers
- VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning [86.59849798539312]
We present Neuro-Symbolic Predicates, a first-order abstraction language that combines the strengths of symbolic and neural knowledge representations.
We show that our approach offers better sample complexity, stronger out-of-distribution generalization, and improved interpretability.
arXiv Detail & Related papers (2024-10-30T16:11:05Z)
- Symbolic Learning Enables Self-Evolving Agents [55.625275970720374]
We introduce agent symbolic learning, a systematic framework that enables language agents to optimize themselves on their own.
Agent symbolic learning is designed to optimize the symbolic network within language agents by mimicking two fundamental algorithms in connectionist learning.
We conduct proof-of-concept experiments on both standard benchmarks and complex real-world tasks.
arXiv Detail & Related papers (2024-06-26T17:59:18Z)
- Interpretable Robotic Manipulation from Language [11.207620790833271]
We introduce an explainable behavior cloning agent, named Ex-PERACT, specifically designed for manipulation tasks.
At the top level, the model is tasked with learning a discrete skill code, while at the bottom level, the policy network translates the problem into a voxelized grid and maps the discretized actions to voxel grids.
We evaluate our method across eight challenging manipulation tasks utilizing the RLBench benchmark, demonstrating that Ex-PERACT not only achieves competitive policy performance but also effectively bridges the gap between human instructions and machine execution in complex environments.
arXiv Detail & Related papers (2024-05-27T11:02:21Z)
- Learning with Language-Guided State Abstractions [58.199148890064826]
Generalizable policy learning in high-dimensional observation spaces is facilitated by well-designed state representations.
Our method, LGA, uses a combination of natural language supervision and background knowledge from language models to automatically build state representations tailored to unseen tasks.
Experiments on simulated robotic tasks show that LGA yields state abstractions similar to those designed by humans, but in a fraction of the time.
arXiv Detail & Related papers (2024-02-28T23:57:04Z)
- ThinkBot: Embodied Instruction Following with Thought Chain Reasoning [66.09880459084901]
Embodied Instruction Following (EIF) requires agents to complete human instructions by interacting with objects in complex surrounding environments.
We propose ThinkBot that reasons the thought chain in human instruction to recover the missing action descriptions.
Our ThinkBot outperforms the state-of-the-art EIF methods by a sizable margin in both success rate and execution efficiency.
arXiv Detail & Related papers (2023-12-12T08:30:09Z)
- In-Context Analogical Reasoning with Pre-Trained Language Models [10.344428417489237]
We explore the use of intuitive language-based abstractions to support analogy in AI systems.
Specifically, we apply large pre-trained language models (PLMs) to visual Raven's Progressive Matrices (RPM).
We find that PLMs exhibit a striking capacity for zero-shot relational reasoning, exceeding human performance and nearing supervised vision-based methods.
arXiv Detail & Related papers (2023-05-28T04:22:26Z)
- Compositional Generalization in Grounded Language Learning via Induced Model Sparsity [81.38804205212425]
We consider simple language-conditioned navigation problems in a grid world environment with disentangled observations.
We design an agent that encourages sparse correlations between words in the instruction and attributes of objects, composing them together to find the goal.
Our agent maintains a high level of performance on goals containing novel combinations of properties even when learning from a handful of demonstrations.
arXiv Detail & Related papers (2022-07-06T08:46:27Z)
- One-Shot Learning from a Demonstration with Hierarchical Latent Language [43.140223608960554]
We introduce DescribeWorld, an environment designed to test this sort of generalization skill in grounded agents.
The agent observes a single task demonstration in a Minecraft-like grid world, and is then asked to carry out the same task in a new map.
We find that agents that perform text-based inference are better equipped for the challenge under a random split of tasks.
arXiv Detail & Related papers (2022-03-09T15:36:43Z)
- Ask Your Humans: Using Human Instructions to Improve Generalization in Reinforcement Learning [32.82030512053361]
We propose the use of step-by-step human demonstrations in the form of natural language instructions and action trajectories.
We find that human demonstrations help solve the most complex tasks.
We also find that incorporating natural language allows the model to generalize to unseen tasks in a zero-shot setting.
arXiv Detail & Related papers (2020-11-01T14:39:46Z)
- Semantics-Aware Inferential Network for Natural Language Understanding [79.70497178043368]
We propose a Semantics-Aware Inferential Network (SAIN) to meet such a motivation.
Taking explicit contextualized semantics as a complementary input, the inferential module of SAIN enables a series of reasoning steps over semantic clues.
Our model achieves significant improvement on 11 tasks including machine reading comprehension and natural language inference.
arXiv Detail & Related papers (2020-04-28T07:24:43Z)
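The SAIN summary above only states the high-level idea: contextualized semantics are supplied as a complementary input, and an inferential module carries out several reasoning steps over those semantic clues. As a purely hypothetical illustration of that pattern (not SAIN's published architecture), a sketch might concatenate semantic-role embeddings with contextual token states and refine them over a fixed number of attention steps; all names and sizes below are assumptions.

```python
# Illustrative only: a generic "semantics as complementary input" pattern.
import torch
import torch.nn as nn


class SemanticsAwareInference(nn.Module):
    def __init__(self, d_token=256, n_roles=30, d_role=32, steps=3, heads=4):
        super().__init__()
        self.role_embed = nn.Embedding(n_roles, d_role)          # semantic-role labels
        self.proj = nn.Linear(d_token + d_role, d_token)
        self.attn = nn.MultiheadAttention(d_token, heads, batch_first=True)
        self.steps = steps

    def forward(self, token_states, role_ids):
        # token_states: (B, T, d_token) contextual encoder output
        # role_ids:     (B, T) predicted semantic-role label per token
        x = self.proj(torch.cat([token_states, self.role_embed(role_ids)], -1))
        for _ in range(self.steps):       # a series of reasoning steps
            refined, _ = self.attn(x, x, x)
            x = x + refined               # residual update per step
        return x
```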