Context-Aware Planning and Environment-Aware Memory for Instruction Following Embodied Agents
- URL: http://arxiv.org/abs/2308.07241v4
- Date: Wed, 13 Mar 2024 02:34:31 GMT
- Title: Context-Aware Planning and Environment-Aware Memory for Instruction Following Embodied Agents
- Authors: Byeonghwi Kim, Jinyeon Kim, Yuyeong Kim, Cheolhong Min, Jonghyun Choi
- Abstract summary: We propose CAPEAM, which accounts for the consequences of previously taken actions when planning a sequence of actions.
We empirically show that an agent equipped with the proposed CAPEAM achieves state-of-the-art performance on various metrics.
- Score: 15.902536100207852
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Accomplishing household tasks requires planning step-by-step actions
while considering the consequences of previous actions. However, state-of-the-art
embodied agents often make mistakes in navigating the environment and
interacting with the proper objects, due to imperfect learning from imitating
experts or from algorithmic planners that lack such knowledge. To improve both
visual navigation and object interaction, we propose CAPEAM (Context-Aware
Planning and Environment-Aware Memory), which considers the consequences of
taken actions by incorporating semantic context (e.g., the appropriate objects
to interact with) into a sequence of actions, and the changed spatial
arrangement and states of interacted objects (e.g., the location an object has
been moved to) into the inference of subsequent actions. We empirically show
that an agent with the proposed CAPEAM achieves state-of-the-art performance
on various metrics of a challenging interactive instruction-following
benchmark, in both seen and unseen environments, by large margins (up to
+10.70% in unseen environments).
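Concretely, the environment-aware memory can be pictured as a record of where each interacted object ended up and what state it is in, which later planning steps consult instead of the stale initial layout. Below is a minimal, hypothetical sketch of such a store; the class and method names are illustrative and not from the paper:

```python
from dataclasses import dataclass, field

@dataclass
class EnvironmentAwareMemory:
    """Toy memory of post-interaction object locations and states (illustrative)."""
    locations: dict = field(default_factory=dict)  # object -> last known place
    states: dict = field(default_factory=dict)     # object -> e.g. 'sliced', 'heated'

    def record(self, obj, place=None, state=None):
        """Update the memory after an interaction changes the world."""
        if place is not None:
            self.locations[obj] = place
        if state is not None:
            self.states[obj] = state

    def lookup(self, obj):
        """Subsequent actions query the updated arrangement, not the initial one."""
        return self.locations.get(obj), self.states.get(obj)

memory = EnvironmentAwareMemory()
memory.record("apple", place="countertop")  # after a Pick/Put interaction
memory.record("apple", state="sliced")      # after a Slice interaction
print(memory.lookup("apple"))               # -> ('countertop', 'sliced')
```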
Related papers
- Embodied Instruction Following in Unknown Environments [66.60163202450954]
We propose an embodied instruction following (EIF) method for complex tasks in unknown environments.
We build a hierarchical embodied instruction following framework comprising a high-level task planner and a low-level exploration controller.
The task planner generates feasible step-by-step plans for accomplishing the human's goal, according to the task-completion progress and the visual clues gathered so far.
arXiv Detail & Related papers (2024-06-17T17:55:40Z)
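The planner/controller split described in the entry above is often realized by having the high-level planner emit remaining subgoals while the low-level controller explores until each subgoal's target object is visible. A schematic sketch under that assumption; the function names and the hard-coded plan are hypothetical, not from the paper:

```python
def task_planner(goal, progress):
    """Hypothetical high-level planner: map a goal and completion progress
    to the remaining step-by-step subgoals."""
    steps = {
        "put a chilled apple on the table": [
            "find apple", "chill apple", "take apple", "place apple",
        ],
    }
    return [s for s in steps.get(goal, []) if s not in progress]

def exploration_controller(subgoal, seen_objects):
    """Hypothetical low-level controller: explore until the subgoal's
    target object has been observed, then hand over to interaction."""
    target = subgoal.split()[-1]
    return f"interact: {subgoal}" if target in seen_objects else f"explore for {target}"

for subgoal in task_planner("put a chilled apple on the table", progress=set()):
    print(exploration_controller(subgoal, seen_objects={"table"}))
```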
- STAIR: Semantic-Targeted Active Implicit Reconstruction [23.884933841874908]
Actively reconstructing objects of interest, i.e., objects with specific semantic meanings, is relevant for robots performing downstream tasks.
We propose a novel framework for semantic-targeted active reconstruction using posed RGB-D measurements and 2D semantic labels as input.
arXiv Detail & Related papers (2024-03-17T14:42:05Z)
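Semantic-targeted active reconstruction, as in STAIR, generally scores candidate viewpoints by how much still-unobserved geometry of the target class they would reveal. A toy next-best-view loop illustrating that idea; it is not STAIR's actual objective or data structures:

```python
def view_gain(view, semantic_voxels, observed):
    """Count target-class voxels this candidate view would newly observe (toy model)."""
    return sum(1 for v in view["visible"] if v in semantic_voxels and v not in observed)

semantic_voxels = {(0, 0, 1), (0, 1, 1), (1, 1, 1)}  # voxels labeled as the target class
observed = {(0, 0, 1)}                               # already reconstructed
candidates = [
    {"pose": "front", "visible": {(0, 0, 1), (0, 1, 1)}},
    {"pose": "top",   "visible": {(0, 1, 1), (1, 1, 1)}},
]
best = max(candidates, key=lambda v: view_gain(v, semantic_voxels, observed))
print(best["pose"])  # -> 'top': it reveals the most unobserved target-class voxels
```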
- ThinkBot: Embodied Instruction Following with Thought Chain Reasoning [66.09880459084901]
Embodied Instruction Following (EIF) requires agents to complete human instructions by interacting with objects in complicated surrounding environments.
We propose ThinkBot, which reasons over the thought chain in human instructions to recover missing action descriptions.
Our ThinkBot outperforms the state-of-the-art EIF methods by a sizable margin in both success rate and execution efficiency.
arXiv Detail & Related papers (2023-12-12T08:30:09Z)
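Recovering missing action descriptions, as ThinkBot does, can be framed as prompting a language model to verbalize the implicit chain between consecutive instructions. A schematic sketch with a stubbed model; the prompt, parser, and stub output are illustrative only:

```python
def reason_missing_steps(prev_step, next_step, llm):
    """Ask a (stubbed) language model which actions are implied between
    two consecutive instructions, one action per output line."""
    prompt = (f"Between '{prev_step}' and '{next_step}', "
              "list the unstated intermediate actions, one per line.")
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]

def stub_llm(prompt):
    # Stand-in for a real model call; a sparse instruction pair often hides
    # navigation and manipulation steps like these.
    return "walk to the counter\npick up the knife"

print(reason_missing_steps("go to the kitchen", "slice the bread", stub_llm))
# -> ['walk to the counter', 'pick up the knife']
```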
- Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge [62.981429762309226]
The ability to actively ground task instructions from an egocentric view is crucial for AI agents to accomplish tasks or assist humans virtually.
We propose to improve phrase grounding models' ability to localize active objects by learning the role of objects undergoing change and extracting them accurately from the instructions.
We evaluate our framework on Ego4D and Epic-Kitchens datasets.
arXiv Detail & Related papers (2023-10-23T16:14:05Z)
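The entry above suggests a two-stage decomposition: extract the phrase for the object undergoing change, then ground it against region proposals. A toy version of that decomposition; real systems use learned extractors and grounding models rather than these heuristics:

```python
def extract_active_phrase(instruction, change_verbs=("cut", "open", "pour", "slice")):
    """Toy extractor: take the noun phrase following a state-changing verb."""
    words = instruction.lower().split()
    for i, w in enumerate(words):
        if w in change_verbs and i + 2 < len(words):
            return " ".join(words[i + 1:i + 3])  # e.g. 'the onion'
    return None

def ground(phrase, proposals):
    """Toy grounding: pick the proposal whose label best matches the phrase."""
    return max(proposals, key=lambda p: sum(w in phrase for w in p["label"].split()))

phrase = extract_active_phrase("cut the onion on the board")
print(ground(phrase, [{"label": "onion", "box": (10, 10, 40, 40)},
                      {"label": "board", "box": (0, 50, 90, 90)}]))
# -> the 'onion' proposal: the object undergoing change, not merely a mentioned one
```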
- Leveraging Next-Active Objects for Context-Aware Anticipation in Egocentric Videos [31.620555223890626]
We study the problem of short-term object interaction anticipation (STA).
We propose NAOGAT, a multi-modal end-to-end transformer network, to guide the model to predict context-aware future actions.
Our model outperforms existing methods on two separate datasets.
arXiv Detail & Related papers (2023-08-16T12:07:02Z)
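The guiding intuition in NAOGAT is that the object the hand is approaching (the next-active object) constrains the upcoming action. A toy heuristic rendering of that coupling; NAOGAT itself learns it end-to-end with a multi-modal transformer, and the affordance table below is invented for illustration:

```python
AFFORDANCES = {"mug": "pick up", "faucet": "turn on"}  # toy action priors

def next_active_object(hand_pos, objects):
    """Toy proxy: treat the object nearest to the hand as next-active."""
    return min(objects, key=lambda o: (o["pos"][0] - hand_pos[0]) ** 2
                                      + (o["pos"][1] - hand_pos[1]) ** 2)

def anticipate(obj):
    """Map the anticipated next-active object to a plausible interaction."""
    return f"{AFFORDANCES.get(obj['name'], 'reach for')} {obj['name']}"

objects = [{"name": "mug", "pos": (0.2, 0.1)}, {"name": "faucet", "pos": (0.9, 0.8)}]
print(anticipate(next_active_object(hand_pos=(0.25, 0.15), objects=objects)))
# -> 'pick up mug'
```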
- Moving Forward by Moving Backward: Embedding Action Impact over Action Semantics [57.671493865825255]
We propose to model the impact of actions on-the-fly using latent embeddings.
By combining these latent action embeddings with a novel transformer-based policy head, we design an Action Adaptive Policy (AAP).
We show that our AAP is highly performant even when faced, at inference time, with missing actions and previously unseen, perturbed action spaces.
arXiv Detail & Related papers (2023-04-24T17:35:47Z)
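Indexing a policy by an action's observed impact, rather than by its label, is what lets a relabeled or perturbed action space be handled on the fly. The sketch below captures that bookkeeping with a running-mean table; the real AAP instead learns latent embeddings with a transformer policy head:

```python
class ActionImpactTable:
    """Toy stand-in for latent action-impact embeddings: an online estimate
    of each action's effect, refined from interaction at inference time."""
    def __init__(self, actions):
        self.impact = {a: (0.0, 0.0) for a in actions}  # action -> mean (dx, dy)
        self.counts = {a: 0 for a in actions}

    def update(self, action, dx, dy):
        """Fold one observed displacement into the running mean."""
        n = self.counts[action] = self.counts[action] + 1
        mx, my = self.impact[action]
        self.impact[action] = (mx + (dx - mx) / n, my + (dy - my) / n)

    def best_action_for(self, gx, gy):
        """Pick the action whose estimated impact points most toward the goal."""
        return max(self.impact, key=lambda a: self.impact[a][0] * gx
                                              + self.impact[a][1] * gy)

table = ActionImpactTable(["a0", "a1"])  # opaque labels: semantics unknown
table.update("a0", 1.0, 0.0)             # observed: a0 moved the agent along +x
table.update("a1", 0.0, 1.0)             # observed: a1 moved the agent along +y
print(table.best_action_for(0.0, 1.0))   # -> 'a1'
```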
- INVIGORATE: Interactive Visual Grounding and Grasping in Clutter [56.00554240240515]
INVIGORATE is a robot system that interacts with humans through natural language and grasps a specified object in clutter.
We train separate neural networks for object detection, visual grounding, question generation, and OBR detection and grasping.
We build a partially observable Markov decision process (POMDP) that integrates the learned neural network modules.
arXiv Detail & Related papers (2021-08-25T07:35:21Z)
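Integrating the learned modules in a POMDP amounts to maintaining a belief over which detected object is the referred target, updating it with grounding scores and the user's answers, and asking a question whenever the belief is too flat to grasp safely. A toy version of that loop; the scores and the 0.8 threshold are illustrative, not from the paper:

```python
def normalize(belief):
    s = sum(belief.values())
    return {k: v / s for k, v in belief.items()}

def update_belief(belief, likelihood):
    """Bayes update of the target belief with per-object evidence."""
    return normalize({o: belief[o] * likelihood.get(o, 1e-6) for o in belief})

belief = normalize({"red mug": 0.5, "blue mug": 0.5})              # uniform prior
belief = update_belief(belief, {"red mug": 0.6, "blue mug": 0.4})  # grounding scores

if max(belief.values()) < 0.8:  # too uncertain to grasp: ask a question instead
    print("Q: Do you mean the red mug?")
    belief = update_belief(belief, {"red mug": 0.95, "blue mug": 0.05})  # user: yes
print("grasp:", max(belief, key=belief.get), belief)
```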
- Pushing it out of the Way: Interactive Visual Navigation [62.296686176988125]
We study the problem of interactive navigation, where agents learn to change the environment to navigate more efficiently to their goals.
We introduce the Neural Interaction Engine (NIE) to explicitly predict the change in the environment caused by the agent's actions.
By modeling the changes while planning, we find that agents exhibit significant improvements in their navigational capabilities.
arXiv Detail & Related papers (2021-04-28T22:46:41Z)
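A forward model in the spirit of the NIE predicts how a contemplated action will displace movable objects, so the planner can score actions on predicted rather than current states. A toy deterministic version; the NIE learns this mapping from data, and the state encoding here is invented:

```python
def predict_effect(state, action):
    """Toy interaction engine: a push displaces the object blocking the agent."""
    state = dict(state)  # predict on a copy; do not mutate the real state
    if action == "push_forward" and state["box"] == state["agent_ahead"]:
        x, y = state["box"]
        state["box"] = (x, y + 1)  # predicted new position of the pushed box
    return state

state = {"agent_ahead": (2, 3), "box": (2, 3)}  # a box blocks the cell ahead
for action in ["turn_left", "push_forward"]:
    predicted = predict_effect(state, action)
    clear = predicted["box"] != predicted["agent_ahead"]
    print(action, "-> path ahead clear:", clear)
# Scoring actions on predicted states favors 'push_forward', which clears the path.
```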