Agent-Environment Network for Temporal Action Proposal Generation
- URL: http://arxiv.org/abs/2107.08323v1
- Date: Sat, 17 Jul 2021 23:24:49 GMT
- Title: Agent-Environment Network for Temporal Action Proposal Generation
- Authors: Viet-Khoa Vo-Ho, Ngan Le, Kashu Yamazaki, Akihiro Sugimoto, Minh-Triet Tran
- Abstract summary: Temporal action proposal generation aims at localizing temporal intervals containing human actions in untrimmed videos.
Based on the action definition that a human, known as an agent, interacts with the environment and performs an action that affects the environment, we propose a contextual Agent-Environment Network (AEN).
Our proposed contextual AEN involves (i) an agent pathway, operating at a local level to tell which humans/agents are acting, and (ii) an environment pathway, operating at a global level to tell how the agents interact with the environment.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal action proposal generation is an essential and challenging task that
aims at localizing temporal intervals containing human actions in untrimmed
videos. Most existing approaches are unable to follow the human cognitive
process of understanding the video context due to the lack of an attention
mechanism to express the concept of an action, the agent who performs the
action, or the interaction between the agent and the environment. Based on the
action definition that a human, known as an agent, interacts with the
environment and performs an action that affects the environment, we propose a
contextual Agent-Environment Network (AEN). Our proposed contextual AEN involves
(i) an agent pathway, operating at a local level to tell which humans/agents
are acting, and (ii) an environment pathway, operating at a global level to tell
how the agents interact with the environment. Comprehensive evaluations on the
20-action THUMOS-14 and 200-action ActivityNet-1.3 datasets with different
backbone networks, i.e., C3D and SlowFast, show that our method robustly
outperforms state-of-the-art methods regardless of the employed backbone
network.
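The abstract describes the two pathways only at a high level. As a rough illustration of the local/global split, the sketch below combines a per-snippet agent feature with a per-snippet environment feature. Everything here is an assumption for illustration, not the authors' AEN implementation: the function names (`agent_pathway`, `environment_pathway`, `fuse_snippet`), the dot-product softmax attention over per-agent vectors, the mean-pooling of frame-grid features for the environment, and concatenation as the fusion step are all hypothetical choices.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Numerically stable softmax over a list of scalar scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def environment_pathway(grid_feats: List[Vector]) -> Vector:
    # Hypothetical global pathway: mean-pool feature vectors taken from
    # every spatial cell of the frame, summarizing the whole scene.
    n = len(grid_feats)
    dim = len(grid_feats[0])
    return [sum(f[i] for f in grid_feats) / n for i in range(dim)]


def agent_pathway(agent_feats: List[Vector], query: Vector) -> Vector:
    # Hypothetical local pathway: scaled dot-product attention over
    # per-agent feature vectors (e.g. features of detected humans),
    # returning their attention-weighted sum.
    if not agent_feats:
        return [0.0] * len(query)  # no agents detected in this snippet
    scale = math.sqrt(len(query))
    scores = [sum(a * q for a, q in zip(feat, query)) / scale for feat in agent_feats]
    weights = softmax(scores)
    dim = len(agent_feats[0])
    return [sum(w * feat[i] for w, feat in zip(weights, agent_feats)) for i in range(dim)]


def fuse_snippet(agent_feats: List[Vector], grid_feats: List[Vector]) -> Vector:
    # Assumption: the global environment feature acts as the attention query,
    # so agents are weighted by their relevance to the scene context.
    env = environment_pathway(grid_feats)
    agent = agent_pathway(agent_feats, env)
    return agent + env  # list concatenation: [agent features..., env features...]


# Usage: two grid cells and two detected agents, all 2-dimensional.
grid = [[1.0, 0.0], [0.0, 1.0]]
agents = [[2.0, 0.0], [0.0, 2.0]]
fused = fuse_snippet(agents, grid)
print(fused)  # [1.0, 1.0, 0.5, 0.5]
```

In this toy input both agents score equally against the scene query, so each receives weight 0.5. A snippet with no detected agents still yields a fixed-length vector, which keeps the fused features usable by a downstream proposal module.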
Related papers
- AntEval: Evaluation of Social Interaction Competencies in LLM-Driven Agents (2024-01-12)
Large Language Models (LLMs) have demonstrated their ability to replicate human behaviors across a wide range of scenarios.
However, their capability in handling complex, multi-character social interactions has yet to be fully explored.
We introduce the Multi-Agent Interaction Evaluation Framework (AntEval), encompassing a novel interaction framework and evaluation methods.
- Agent AI: Surveying the Horizons of Multimodal Interaction (2024-01-07)
"Agent AI" is a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally grounded data.
We envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment.
- Interactive Autonomous Navigation with Internal State Inference and Interactivity Estimation (2023-11-27)
We propose three auxiliary tasks with relational-temporal reasoning and integrate them into the standard deep learning framework.
These auxiliary tasks provide additional supervision signals to infer the behavior patterns of other interactive agents.
Our approach achieves robust and state-of-the-art performance in terms of standard evaluation metrics.
- Signifiers as a First-class Abstraction in Hypermedia Multi-Agent Systems (2023-02-14)
We build on concepts and methods from Affordance Theory and Human-Computer Interaction to introduce signifiers as a first-class abstraction in Web-based Multi-Agent Systems.
We define a formal model for the contextual exposure of signifiers in hypermedia environments that aims to drive affordance exploitation.
- AOE-Net: Entities Interactions Modeling with Adaptive Attention Mechanism for Temporal Action Proposals Generation (2022-10-05)
Temporal action proposal generation (TAPG) is a challenging task, which requires localizing action intervals in an untrimmed video.
We propose to model these interactions with a multi-modal representation network, namely, Actors-Objects-Environment Interaction Network (AOE-Net).
Our AOE-Net consists of two modules, i.e., perception-based multi-modal representation (PMR) and boundary-matching module (BMM).
- Active Inference for Robotic Manipulation (2022-06-01)
Active Inference is a theory that deals with partial observability in an explicit manner.
In this work, we apply Active Inference to hard-to-explore simulated robotic manipulation tasks.
We show that the information-seeking behavior induced by Active Inference allows the agent to explore these challenging, sparse environments systematically.
- ABN: Agent-Aware Boundary Networks for Temporal Action Proposal Generation (2022-03-16)
Temporal action proposal generation (TAPG) aims to estimate temporal intervals of actions in untrimmed videos.
We propose a novel framework named Agent-Aware Boundary Network (ABN), which consists of two sub-networks.
We show that our proposed ABN robustly outperforms state-of-the-art methods regardless of the employed backbone network on TAPG.
- Information is Power: Intrinsic Control via Information Capture (2021-12-07)
We argue that a compact and general learning objective is to minimize the entropy of the agent's state visitation estimated using a latent state-space model.
This objective induces an agent to both gather information about its environment, corresponding to reducing uncertainty, and to gain control over its environment, corresponding to reducing the unpredictability of future world states.
- AEI: Actors-Environment Interaction with Adaptive Attention for Temporal Action Proposals Generation (2021-10-21)
We propose the Actor Environment Interaction (AEI) network to improve the video representation for temporal action proposals generation.
AEI contains two modules, i.e., perception-based visual representation (PVR) and boundary-matching module (BMM).
- Scene-aware Generative Network for Human Motion Synthesis (2021-05-31)
We propose a new framework, with the interaction between the scene and the human motion taken into account.
Considering the uncertainty of human motion, we formulate this task as a generative task.
We derive a GAN-based learning approach, with discriminators to enforce the compatibility between the human motion and the contextual scene.
- SPA: Verbal Interactions between Agents and Avatars in Shared Virtual Environments using Propositional Planning (2020-02-08)
Sense-Plan-Ask, or SPA, generates plausible verbal interactions between virtual human-like agents and user avatars in shared virtual environments.
We find that our algorithm incurs a small runtime cost and enables agents to complete their goals more effectively than agents without the ability to leverage natural-language communication.