ABN: Agent-Aware Boundary Networks for Temporal Action Proposal
Generation
- URL: http://arxiv.org/abs/2203.08942v1
- Date: Wed, 16 Mar 2022 21:06:34 GMT
- Title: ABN: Agent-Aware Boundary Networks for Temporal Action Proposal
Generation
- Authors: Khoa Vo, Kashu Yamazaki, Sang Truong, Minh-Triet Tran, Akihiro
Sugimoto, Ngan Le
- Abstract summary: Temporal action proposal generation (TAPG) aims to estimate temporal intervals of actions in untrimmed videos.
We propose a novel framework named Agent-Aware Boundary Network (ABN), which consists of two sub-networks.
We show that our proposed ABN robustly outperforms state-of-the-art methods regardless of the employed backbone network on TAPG.
- Score: 14.755186542366065
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal action proposal generation (TAPG) aims to estimate temporal
intervals of actions in untrimmed videos, which is challenging yet plays an
important role in many video analysis and understanding tasks. Despite the
great achievement in TAPG, most existing works ignore the human perception of
interaction between agents and the surrounding environment by applying a deep
learning model as a black-box to the untrimmed videos to extract video visual
representation. Capturing these interactions between agents and the
environment is therefore beneficial and can potentially improve the performance
of TAPG. In this paper, we propose a novel framework named Agent-Aware
Boundary Network (ABN), which consists of two sub-networks: (i) an Agent-Aware
Representation Network to obtain both agent-agent and agents-environment
relationships in the video representation, and (ii) a Boundary Generation
Network to estimate the confidence score of temporal intervals. In the
Agent-Aware Representation Network, the interactions between agents are
expressed through a local pathway, which operates at a local level to focus on
the motions of agents, whereas the overall perception of the surroundings is
expressed through a global pathway, which operates at a global level to perceive
the effects of agents-environment interactions. Comprehensive evaluations on 20-action
THUMOS-14 and 200-action ActivityNet-1.3 datasets with different backbone
networks (i.e., C3D, SlowFast and Two-Stream) show that our proposed ABN robustly
outperforms state-of-the-art methods regardless of the employed backbone
network on TAPG. We further examine proposal quality by feeding the
proposals generated by our method into temporal action detection (TAD)
frameworks and evaluating their detection performance. The source code can be
found at https://github.com/vhvkhoa/TAPG-AgentEnvNetwork.git.
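As an illustration of what "estimating temporal intervals" means in practice, the following is a minimal sketch, not the authors' implementation. It assumes a proposal is a (start, end) pair in seconds and uses temporal IoU (tIoU), the overlap measure commonly used to compare a proposal with a ground-truth action interval. The names `temporal_iou` and `recall_at_tiou` are hypothetical and do not come from the ABN repository.

```python
from typing import List, Tuple

# Assumption: an interval is (start_sec, end_sec) with start_sec <= end_sec.
Interval = Tuple[float, float]


def temporal_iou(a: Interval, b: Interval) -> float:
    """Intersection-over-union of two 1-D temporal intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_tiou(proposals: List[Interval],
                   ground_truth: List[Interval],
                   threshold: float) -> float:
    """Fraction of ground-truth intervals matched by at least one proposal
    whose tIoU with that interval reaches the threshold."""
    if not ground_truth:
        return 0.0
    hits = sum(
        any(temporal_iou(p, g) >= threshold for p in proposals)
        for g in ground_truth
    )
    return hits / len(ground_truth)


if __name__ == "__main__":
    # Made-up intervals, purely for illustration.
    proposals = [(0.0, 10.0), (20.0, 30.0)]
    ground_truth = [(1.0, 9.0), (40.0, 50.0)]
    print(temporal_iou(proposals[0], ground_truth[0]))
    print(recall_at_tiou(proposals, ground_truth, threshold=0.5))
```

A proposal generator is typically judged by how many ground-truth actions it recovers at a given tIoU threshold for a fixed proposal budget; the sketch above covers only the overlap and recall computation, not any part of the ABN networks.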
Related papers
- Interactive Autonomous Navigation with Internal State Inference and
Interactivity Estimation [58.21683603243387]
We propose three auxiliary tasks with relational-temporal reasoning and integrate them into the standard Deep Learning framework.
These auxiliary tasks provide additional supervision signals to infer the behavior patterns of other interactive agents.
Our approach achieves robust and state-of-the-art performance in terms of standard evaluation metrics.
arXiv Detail & Related papers (2023-11-27T18:57:42Z)
- Collaborative Multi-Agent Video Fast-Forwarding [30.843484383185473]
We develop two collaborative multi-agent video fast-forwarding frameworks in distributed and centralized settings.
In these frameworks, each individual agent can selectively process or skip video frames at adjustable paces based on multiple strategies.
We show that compared with other approaches in the literature, our frameworks achieve better coverage of important frames, while significantly reducing the number of frames processed at each agent.
arXiv Detail & Related papers (2023-05-27T20:12:19Z)
- AOE-Net: Entities Interactions Modeling with Adaptive Attention
Mechanism for Temporal Action Proposals Generation [24.81870045216019]
Temporal action proposal generation (TAPG) is a challenging task, which requires localizing action intervals in an untrimmed video.
We propose to model these interactions with a multi-modal representation network, namely, Actors-Objects-Environment Interaction Network (AOE-Net).
Our AOE-Net consists of two modules, i.e., perception-based multi-modal representation (PMR) and boundary-matching module (BMM).
arXiv Detail & Related papers (2022-10-05T21:57:25Z)
- Masked Transformer for Neighbourhood-aware Click-Through Rate Prediction [74.52904110197004]
We propose Neighbor-Interaction based CTR prediction, which puts this task into a Heterogeneous Information Network (HIN) setting.
In order to enhance the representation of the local neighbourhood, we consider four types of topological interaction among the nodes.
We conduct comprehensive experiments on two real-world datasets, and the experimental results show that our proposed method outperforms state-of-the-art CTR models significantly.
arXiv Detail & Related papers (2022-01-25T12:44:23Z)
- AEI: Actors-Environment Interaction with Adaptive Attention for Temporal
Action Proposals Generation [15.360689782405057]
We propose the Actor Environment Interaction (AEI) network to improve the video representation for temporal action proposals generation.
AEI contains two modules, i.e., perception-based visual representation (PVR) and boundary-matching module (BMM).
arXiv Detail & Related papers (2021-10-21T20:43:42Z)
- MACRPO: Multi-Agent Cooperative Recurrent Policy Optimization [17.825845543579195]
We propose a new multi-agent actor-critic method called Multi-Agent Cooperative Recurrent Proximal Policy Optimization (MACRPO).
We use a recurrent layer in the critic's network architecture and propose a new framework that uses a meta-trajectory to train the recurrent layer.
We evaluate our algorithm on three challenging multi-agent environments with continuous and discrete action spaces.
arXiv Detail & Related papers (2021-09-02T12:43:35Z)
- Decoder Fusion RNN: Context and Interaction Aware Decoders for
Trajectory Prediction [53.473846742702854]
We propose a recurrent, attention-based approach for motion forecasting.
Decoder Fusion RNN (DF-RNN) is composed of a recurrent behavior encoder, an inter-agent multi-headed attention module, and a context-aware decoder.
We demonstrate the efficacy of our method by testing it on the Argoverse motion forecasting dataset and show its state-of-the-art performance on the public benchmark.
arXiv Detail & Related papers (2021-08-12T15:53:37Z)
- Weakly-Supervised Spatio-Temporal Anomaly Detection in Surveillance
Video [128.41392860714635]
We introduce Weakly-Supervised Spatio-Temporal Anomaly Detection (WSSTAD) in surveillance video.
WSSTAD aims to localize a spatio-temporal tube (i.e., a sequence of bounding boxes at consecutive times) that encloses an abnormal event.
We propose a dual-branch network which takes as input proposals with multi-granularities in both spatial and temporal domains.
arXiv Detail & Related papers (2021-08-09T06:11:14Z)
- Agent-Environment Network for Temporal Action Proposal Generation [10.74737201306622]
Temporal action proposal generation aims at localizing temporal intervals containing human actions in untrimmed videos.
Based on the action definition that a human, known as an agent, interacts with the environment and performs an action that affects the environment, we propose a contextual Agent-Environment Network.
Our proposed contextual AEN involves (i) agent pathway, operating at a local level to tell about which humans/agents are acting and (ii) environment pathway operating at a global level to tell about how the agents interact with the environment.
arXiv Detail & Related papers (2021-07-17T23:24:49Z)
- Target-Aware Object Discovery and Association for Unsupervised Video
Multi-Object Segmentation [79.6596425920849]
This paper addresses the task of unsupervised video multi-object segmentation.
We introduce a novel approach for more accurate and efficient spatio-temporal segmentation.
We evaluate the proposed approach on DAVIS$_{17}$ and YouTube-VIS, and the results demonstrate that it outperforms state-of-the-art methods both in segmentation accuracy and inference speed.
arXiv Detail & Related papers (2021-04-10T14:39:44Z)
- Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)