Moving Forward by Moving Backward: Embedding Action Impact over Action
Semantics
- URL: http://arxiv.org/abs/2304.12289v1
- Date: Mon, 24 Apr 2023 17:35:47 GMT
- Title: Moving Forward by Moving Backward: Embedding Action Impact over Action
Semantics
- Authors: Kuo-Hao Zeng, Luca Weihs, Roozbeh Mottaghi, Ali Farhadi
- Abstract summary: We propose to model the impact of actions on-the-fly using latent embeddings.
By combining these latent action embeddings with a novel, transformer-based, policy head, we design an Action Adaptive Policy.
We show that our AAP is highly performant even when faced, at inference time, with missing actions and previously unseen, perturbed action spaces.
- Score: 57.671493865825255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A common assumption when training embodied agents is that the impact of
taking an action is stable; for instance, executing the "move ahead" action
will always move the agent forward by a fixed distance, perhaps with some small
amount of actuator-induced noise. This assumption is limiting; an agent may
encounter settings that dramatically alter the impact of actions: a move ahead
action on a wet floor may send the agent twice as far as it expects and using
the same action with a broken wheel might transform the expected translation
into a rotation. Instead of relying on the assumption that the impact of an action
stably reflects its pre-defined semantic meaning, we propose to model the impact of
actions on-the-fly using latent embeddings. By combining these latent action
embeddings with a novel, transformer-based, policy head, we design an Action
Adaptive Policy (AAP). We evaluate our AAP on two challenging visual navigation
tasks in the AI2-THOR and Habitat environments and show that our AAP is highly
performant even when faced, at inference time, with missing actions and
previously unseen, perturbed action spaces. Moreover, we observe significant
improvement in robustness against these action perturbations when evaluating in
real-world scenarios.
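The abstract's central mechanism (inferring each action's impact from recently observed transitions and letting a transformer-based head score actions by those inferred impacts rather than by fixed action indices) can be made concrete with a minimal sketch. This is an assumption-laden illustration, not the authors' implementation; the module names, tensor shapes, and the two-layer transformer are choices made here for brevity.

```python
# Minimal sketch (not the authors' code) of an action-adaptive policy head:
# each action's "impact" is embedded on the fly from the most recent transition
# observed for it, and a transformer scores candidate actions from those embeddings.
import torch
import torch.nn as nn


class ActionImpactEncoder(nn.Module):
    """Embeds the observed effect of an action from a pair of consecutive observations."""

    def __init__(self, obs_dim: int, embed_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * obs_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, obs: torch.Tensor, next_obs: torch.Tensor) -> torch.Tensor:
        # (batch, n_actions, obs_dim) pairs -> (batch, n_actions, embed_dim)
        return self.net(torch.cat([obs, next_obs], dim=-1))


class ActionAdaptivePolicyHead(nn.Module):
    """Transformer head that scores each available action by its inferred impact."""

    def __init__(self, obs_dim: int, embed_dim: int = 64, n_heads: int = 4):
        super().__init__()
        self.impact_encoder = ActionImpactEncoder(obs_dim, embed_dim)
        self.state_proj = nn.Linear(obs_dim, embed_dim)
        layer = nn.TransformerEncoderLayer(embed_dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.score = nn.Linear(embed_dim, 1)

    def forward(self, current_obs, past_obs, past_next_obs):
        # current_obs: (batch, obs_dim); past_obs / past_next_obs: (batch, n_actions, obs_dim),
        # the most recent transition observed for each action (how it actually moved the agent).
        impacts = self.impact_encoder(past_obs, past_next_obs)   # (B, A, E)
        state = self.state_proj(current_obs).unsqueeze(1)        # (B, 1, E)
        tokens = torch.cat([state, impacts], dim=1)              # (B, 1 + A, E)
        encoded = self.encoder(tokens)
        return self.score(encoded[:, 1:, :]).squeeze(-1)         # one logit per action


if __name__ == "__main__":
    B, A, OBS = 2, 6, 32
    head = ActionAdaptivePolicyHead(OBS)
    logits = head(torch.randn(B, OBS), torch.randn(B, A, OBS), torch.randn(B, A, OBS))
    print(logits.shape)  # torch.Size([2, 6])
```

Because each action is represented by its latest observed impact rather than a fixed semantic label, an action that goes missing or behaves unexpectedly at inference time simply changes its own token, which is the property the abstract attributes to the AAP.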
Related papers
- DynaSaur: Large Language Agents Beyond Predefined Actions [108.75187263724838]
Existing LLM agent systems typically select actions from a fixed and predefined set at every step.
We propose an LLM agent framework that enables the dynamic creation and composition of actions in an online manner.
Our experiments on the GAIA benchmark demonstrate that this framework offers significantly greater flexibility and outperforms previous methods.
arXiv Detail & Related papers (2024-11-04T02:08:59Z)
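The DynaSaur entry above describes creating and composing actions online instead of selecting from a fixed set. The sketch below illustrates one plausible reading of that loop; `llm_generate` is a hypothetical stand-in for any text-generation call, and the lookup and registration logic is deliberately naive rather than DynaSaur's actual interface.

```python
# Hedged sketch of dynamic action creation in an LLM agent loop (illustrative only).
from typing import Callable, Dict

action_registry: Dict[str, Callable[..., str]] = {}  # reusable, accumulated actions


def register_action(source: str) -> None:
    """Compile generated Python source and add every defined function as an action."""
    namespace: dict = {}
    exec(source, namespace)  # assumes trusted code; a real system must sandbox this
    for name, obj in namespace.items():
        if callable(obj) and not name.startswith("_"):
            action_registry[name] = obj


def step(task: str, llm_generate: Callable[[str], str]) -> str:
    """One agent step: reuse an existing action if possible, otherwise create one."""
    if task in action_registry:                     # naive lookup, for illustration only
        return action_registry[task]()
    source = llm_generate(f"Write a Python function named `{task}` that solves: {task}")
    register_action(source)
    return action_registry[task]()


if __name__ == "__main__":
    # Fake "LLM" that always returns a trivial function, just to show the flow.
    fake_llm = lambda prompt: "def greet():\n    return 'hello from a generated action'"
    print(step("greet", fake_llm))
```

In practice such a system would need sandboxing for the generated code and a far richer mechanism for deciding when an existing action suffices.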
- Cross-Embodied Affordance Transfer through Learning Affordance Equivalences [6.828097734917722]
We propose a deep neural network model that unifies objects, actions, and effects into a single latent vector in a common latent space that we call the affordance space.
Rather than learning the behavior of individual objects acted upon by a single agent, the model learns affordance equivalences, which facilitate not only action generalization over objects but also cross-embodiment transfer linking actions of different robots.
arXiv Detail & Related papers (2024-04-24T05:07:36Z)
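As a rough, assumption-based illustration of embedding objects, actions, and effects into one shared affordance space (the tiny architecture below is invented for brevity and is not the paper's model), separate encoders can project each modality into a common latent and an alignment loss can pull matching triplets together:

```python
# Hedged sketch of a shared "affordance space": object+action and effect encoders
# map into the same latent, and matching triplets are pulled together.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AffordanceSpaceEncoder(nn.Module):
    def __init__(self, obj_dim: int, act_dim: int, eff_dim: int, latent_dim: int = 32):
        super().__init__()
        self.obj_enc = nn.Linear(obj_dim, latent_dim)
        self.act_enc = nn.Linear(act_dim, latent_dim)
        self.eff_enc = nn.Linear(eff_dim, latent_dim)
        self.fuse = nn.Linear(2 * latent_dim, latent_dim)  # fuse object + action

    def forward(self, obj, act, eff):
        oa = self.fuse(torch.cat([self.obj_enc(obj), self.act_enc(act)], dim=-1))
        e = self.eff_enc(eff)
        # matching (object, action) and effect embeddings should coincide in the space
        return F.normalize(oa, dim=-1), F.normalize(e, dim=-1)


def alignment_loss(oa, e):
    # pull each (object, action) embedding toward its own effect embedding
    return (1.0 - (oa * e).sum(dim=-1)).mean()


if __name__ == "__main__":
    model = AffordanceSpaceEncoder(obj_dim=16, act_dim=8, eff_dim=12)
    oa, e = model(torch.randn(4, 16), torch.randn(4, 8), torch.randn(4, 12))
    print(float(alignment_loss(oa, e)))
```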
- Context-Aware Planning and Environment-Aware Memory for Instruction Following Embodied Agents [15.902536100207852]
We propose CAPEAM, which considers the consequences of taken actions within a sequence of actions.
We empirically show that the agent with the proposed CAPEAM achieves state-of-the-art performance in various metrics.
arXiv Detail & Related papers (2023-08-14T16:23:21Z)
- Action Sensitivity Learning for Temporal Action Localization [35.65086250175736]
We propose an Action Sensitivity Learning framework (ASL) to tackle the task of temporal action localization.
We first introduce a lightweight Action Sensitivity Evaluator to learn the action sensitivity at the class level and instance level, respectively.
Based on the action sensitivity of each frame, we design an Action Sensitive Contrastive Loss to enhance features, where the action-aware frames are sampled as positive pairs to push away the action-irrelevant frames.
arXiv Detail & Related papers (2023-05-25T04:19:14Z)
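The contrastive idea in the Action Sensitivity Learning entry (action-aware frames as positive pairs, action-irrelevant frames pushed away) can be sketched as an InfoNCE-style loss. This is a generic approximation under assumed inputs (per-frame features plus per-frame sensitivity scores), not the paper's exact formulation:

```python
# Hedged sketch of an action-sensitive contrastive loss: frames scored as
# action-aware form positives, action-irrelevant frames serve as negatives.
import torch
import torch.nn.functional as F


def action_sensitive_contrastive_loss(
    frame_feats: torch.Tensor,   # (T, D) per-frame features
    sensitivity: torch.Tensor,   # (T,) action-sensitivity scores in [0, 1]
    threshold: float = 0.5,
    temperature: float = 0.1,
) -> torch.Tensor:
    feats = F.normalize(frame_feats, dim=-1)
    pos_mask = sensitivity > threshold           # action-aware frames
    neg_mask = ~pos_mask                         # action-irrelevant frames
    if pos_mask.sum() < 2 or neg_mask.sum() == 0:
        return frame_feats.new_zeros(())         # degenerate case: nothing to contrast

    anchor = feats[pos_mask].mean(dim=0, keepdim=True)           # (1, D) positive prototype
    pos_sim = feats[pos_mask] @ anchor.T / temperature           # (P, 1)
    neg_sim = feats[pos_mask] @ feats[neg_mask].T / temperature  # (P, N)
    logits = torch.cat([pos_sim, neg_sim], dim=1)                # (P, 1 + N)
    targets = torch.zeros(logits.size(0), dtype=torch.long)      # positive sits at index 0
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    loss = action_sensitive_contrastive_loss(torch.randn(16, 32), torch.rand(16))
    print(float(loss))
```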
- Weakly-Supervised Temporal Action Localization with Bidirectional Semantic Consistency Constraint [83.36913240873236]
Weakly-Supervised Temporal Action Localization (WTAL) aims to classify and localize temporal boundaries of actions in a video.
We propose a simple yet efficient method, named the bidirectional semantic consistency constraint (Bi-SCC), to discriminate positive actions from co-scene actions.
Experimental results show that our approach outperforms the state-of-the-art methods on THUMOS14 and ActivityNet.
arXiv Detail & Related papers (2023-04-25T07:20:33Z)
- Leveraging Self-Supervised Training for Unintentional Action Recognition [82.19777933440143]
We seek to identify the points in videos where the actions transition from intentional to unintentional.
We propose a multi-stage framework that exploits inherent biases such as motion speed, motion direction, and order to recognize unintentional actions.
arXiv Detail & Related papers (2022-09-23T21:36:36Z)
- Game-theoretic Objective Space Planning [4.989480853499916]
Understanding intent of other agents is crucial to deploying autonomous systems in adversarial multi-agent environments.
Current approaches either oversimplify the discretization of the action space of agents or fail to recognize the long-term effect of actions and become myopic.
We propose a novel dimension reduction method that encapsulates diverse agent behaviors while conserving the continuity of agent actions.
arXiv Detail & Related papers (2022-09-16T07:35:20Z)
- Weakly-supervised Action Transition Learning for Stochastic Human Motion Prediction [81.94175022575966]
We introduce the task of action-driven human motion prediction.
It aims to predict multiple plausible future motions given a sequence of action labels and a short motion history.
arXiv Detail & Related papers (2022-05-31T08:38:07Z)
- How RL Agents Behave When Their Actions Are Modified [0.0]
Reinforcement learning in complex environments may require supervision to prevent the agent from attempting dangerous actions.
We present the Modified-Action Markov Decision Process, an extension of the MDP model in which the executed action may differ from the action selected by the policy.
arXiv Detail & Related papers (2021-02-15T18:10:03Z)
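A minimal sketch of the modified-action setting described in that entry, under the assumption of a simple deterministic toy environment (the MAMDP paper's formal definition is more general): the action the policy selects and the action the environment executes are allowed to differ via a modification function.

```python
# Hedged sketch of one step in a modified-action decision process.
from typing import Callable, Tuple, TypeVar

State = TypeVar("State")
Action = TypeVar("Action")


def modified_action_step(
    state: State,
    policy: Callable[[State], Action],
    modify: Callable[[State, Action], Action],       # e.g. a supervisor's override
    env_step: Callable[[State, Action], Tuple[State, float]],
) -> Tuple[State, float, Action, Action]:
    chosen = policy(state)                 # what the agent intended
    executed = modify(state, chosen)       # what actually happens (may differ)
    next_state, reward = env_step(state, executed)
    return next_state, reward, chosen, executed


if __name__ == "__main__":
    # Toy 1-D world: "forward" normally moves +1, but a supervisor blocks moves past 3.
    policy = lambda s: "forward"
    modify = lambda s, a: "stay" if (a == "forward" and s >= 3) else a
    env_step = lambda s, a: (s + 1 if a == "forward" else s, -1.0)
    s = 0
    for _ in range(5):
        s, r, chosen, executed = modified_action_step(s, policy, modify, env_step)
        print(s, chosen, executed)
```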
- Instance-Aware Predictive Navigation in Multi-Agent Environments [93.15055834395304]
We propose an Instance-Aware Predictive Control (IPC) approach, which forecasts interactions between agents as well as future scene structures.
We adopt a novel multi-instance event prediction module to estimate the possible interaction among agents in the ego-centric view.
We design a sequential action sampling strategy to better leverage predicted states on both scene-level and instance-level.
arXiv Detail & Related papers (2021-01-14T22:21:25Z)