Asynchronous Interaction Aggregation for Action Detection
- URL: http://arxiv.org/abs/2004.07485v1
- Date: Thu, 16 Apr 2020 07:03:20 GMT
- Title: Asynchronous Interaction Aggregation for Action Detection
- Authors: Jiajun Tang, Jin Xia, Xinzhi Mu, Bo Pang, Cewu Lu
- Abstract summary: We propose the Asynchronous Interaction Aggregation network (AIA) that leverages different interactions to boost action detection.
There are two key designs in it: one is the Interaction Aggregation structure (IA), which adopts a uniform paradigm to model and integrate multiple types of interaction; the other is the Asynchronous Memory Update algorithm (AMU), which achieves better performance by dynamically modeling very long-term interaction without huge computation cost.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding interaction is an essential part of video action detection. We
propose the Asynchronous Interaction Aggregation network (AIA) that leverages
different interactions to boost action detection. There are two key designs in
it: one is the Interaction Aggregation structure (IA) adopting a uniform
paradigm to model and integrate multiple types of interaction; the other is the
Asynchronous Memory Update algorithm (AMU) that enables us to achieve better
performance by modeling very long-term interaction dynamically without huge
computation cost. We provide empirical evidence to show that our network can
gain notable accuracy from the integrative interactions and is easy to train
end-to-end. Our method sets a new state of the art on the AVA dataset, with a
3.7 mAP gain (12.6% relative improvement) on the validation split compared to
our strong baseline. Results on the UCF101-24 and EPIC-Kitchens datasets
further illustrate the effectiveness of our approach. Source code
will be made public at: https://github.com/MVIG-SJTU/AlphAction .
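The abstract describes the AMU algorithm only at a high level: long-term interaction is modeled by reading from a feature memory that is updated dynamically, avoiding the cost of recomputing past clips. As a minimal illustrative sketch (hypothetical names and structure, not the authors' implementation — see the AlphAction repository for the real one), such a memory might look like this:

```python
from collections import OrderedDict

class AsyncFeatureMemory:
    """Illustrative long-term feature memory: clip-level features are
    written as they are computed (possibly out of order) and read back
    over a long temporal window, so long-range interactions can be
    modeled without recomputing features for past clips."""

    def __init__(self, window=30):
        self.window = window       # clips to look back/ahead when reading
        self.pool = OrderedDict()  # clip index -> feature vector

    def write(self, clip_idx, feature):
        # Writes may arrive asynchronously; later writes overwrite earlier ones.
        self.pool[clip_idx] = feature

    def read(self, clip_idx):
        # Gather whichever neighboring clip features are currently available.
        lo, hi = clip_idx - self.window, clip_idx + self.window
        return [self.pool[i] for i in range(lo, hi + 1) if i in self.pool]

memory = AsyncFeatureMemory(window=2)
for i, feat in [(0, [0.1]), (3, [0.4]), (1, [0.2])]:  # out-of-order writes
    memory.write(i, feat)
print(memory.read(1))  # neighbors of clip 1 within a window of 2 clips
```

The point of the design, as the abstract states it, is that reads tolerate an incomplete and asynchronously updated pool, which is what keeps very long-term modeling cheap.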
Related papers
- DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z)
- Holistic Interaction Transformer Network for Action Detection [15.667833703317124]
"HIT" network is a comprehensive bi-modal framework that comprises an RGB stream and a pose stream.
Our method significantly outperforms previous approaches on the J-HMDB, UCF101-24, and MultiSports datasets.
arXiv Detail & Related papers (2022-10-23T10:19:37Z)
- Spatial Parsing and Dynamic Temporal Pooling Networks for Human-Object Interaction Detection [30.896749712316222]
This paper introduces the Spatial Parsing and Dynamic Temporal Pooling (SPDTP) network, which takes the entire video as a spatio-temporal graph with human and object nodes as input.
We achieve state-of-the-art performance on the CAD-120 and Something-Else datasets.
arXiv Detail & Related papers (2022-06-07T07:26:06Z)
- Reformulating HOI Detection as Adaptive Set Prediction [25.44630995307787]
We reformulate HOI detection as an adaptive set prediction problem.
We propose an Adaptive Set-based one-stage framework (AS-Net) with parallel instance and interaction branches.
Our method outperforms previous state-of-the-art methods without any extra human pose and language features.
arXiv Detail & Related papers (2021-03-10T10:40:33Z)
- A Co-Interactive Transformer for Joint Slot Filling and Intent Detection [61.109486326954205]
Intent detection and slot filling are two main tasks for building a spoken language understanding (SLU) system.
Previous studies either model the two tasks separately or only consider the single information flow from intent to slot.
We propose a Co-Interactive Transformer to consider the cross-impact between the two tasks simultaneously.
arXiv Detail & Related papers (2020-10-08T10:16:52Z)
- DecAug: Augmenting HOI Detection via Decomposition [54.65572599920679]
Current algorithms suffer from insufficient training samples and category imbalance within datasets.
We propose an efficient and effective data augmentation method called DecAug for HOI detection.
Experiments show that our method brings up to 3.3 mAP and 1.6 mAP improvements on the V-COCO and HICO-DET datasets.
arXiv Detail & Related papers (2020-10-02T13:59:05Z)
- Learning End-to-End Action Interaction by Paired-Embedding Data Augmentation [10.857323240766428]
A new Interactive Action Translation (IAT) task aims to learn end-to-end action interaction from unlabeled interactive pairs.
We propose a Paired-Embedding (PE) method for effective and reliable data augmentation.
Experimental results on two datasets show impressive effects and broad application prospects of our method.
arXiv Detail & Related papers (2020-07-16T01:54:16Z)
- A Graph-based Interactive Reasoning for Human-Object Interaction Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
- Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)