Actor-Transformers for Group Activity Recognition
- URL: http://arxiv.org/abs/2003.12737v1
- Date: Sat, 28 Mar 2020 07:21:58 GMT
- Title: Actor-Transformers for Group Activity Recognition
- Authors: Kirill Gavrilyuk, Ryan Sanford, Mehrsan Javan, Cees G. M. Snoek
- Abstract summary: This paper strives to recognize individual actions and group activities from videos.
We propose an actor-transformer model able to learn and selectively extract information relevant for group activity recognition.
- Score: 43.60866347282833
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper strives to recognize individual actions and group activities from
videos. While existing solutions for this challenging problem explicitly model
spatial and temporal relationships based on location of individual actors, we
propose an actor-transformer model able to learn and selectively extract
information relevant for group activity recognition. We feed the transformer
with rich actor-specific static and dynamic representations expressed by
features from a 2D pose network and 3D CNN, respectively. We empirically study
different ways to combine these representations and show their complementary
benefits. Experiments show what is important to transform and how it should be
transformed. What is more, actor-transformers achieve state-of-the-art results
on two publicly available benchmarks for group activity recognition,
outperforming the previous best published results by a considerable margin.
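As a concrete illustration of the approach described in the abstract, the following is a minimal PyTorch sketch of an actor-transformer: per-actor static (pose) and dynamic (3D CNN) features are fused, refined by a transformer encoder so actors can attend to each other, then read out per actor for individual actions and pooled for the group activity. The feature dimension, single encoder layer, fusion by summation, and mean pooling are illustrative assumptions rather than the paper's exact configuration; the class counts correspond to the Volleyball benchmark.
```python
# Minimal sketch of an actor-transformer (assumed dimensions and fusion).
import torch
import torch.nn as nn

class ActorTransformer(nn.Module):
    def __init__(self, dim=256, heads=8, layers=1, n_actions=9, n_activities=8):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.action_head = nn.Linear(dim, n_actions)        # per-actor actions
        self.activity_head = nn.Linear(dim, n_activities)   # group activity

    def forward(self, pose_feat, motion_feat):
        # pose_feat, motion_feat: (batch, actors, dim) from a 2D pose
        # network and a 3D CNN, respectively (assumed pre-extracted).
        tokens = pose_feat + motion_feat        # early fusion by summation
        refined = self.encoder(tokens)          # actors attend to each other
        actions = self.action_head(refined)     # (batch, actors, n_actions)
        group = self.activity_head(refined.mean(dim=1))  # pool over actors
        return actions, group

pose = torch.randn(2, 12, 256)     # e.g., 12 players on a volleyball court
motion = torch.randn(2, 12, 256)
actions, group = ActorTransformer()(pose, motion)
```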
Related papers
- Multi-view Action Recognition via Directed Gromov-Wasserstein Discrepancy [12.257725479880458]
Action recognition has become one of the most popular research topics in computer vision.
We propose a multi-view attention consistency method that computes the similarity between two attentions from two different views of the action videos.
Our approach applies the idea of Neural Radiance Field to implicitly render the features from novel views when training on single-view datasets. A simplified stand-in for the attention-consistency objective is sketched below.
arXiv Detail & Related papers (2024-05-02T14:43:21Z)
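The paper measures consistency between attentions from two views with a directed Gromov-Wasserstein discrepancy; the sketch below substitutes a plain cosine-similarity loss, which is an assumption for illustration only.
```python
# Stand-in attention-consistency loss between two views (cosine similarity
# substitutes for the paper's directed Gromov-Wasserstein discrepancy).
import torch
import torch.nn.functional as F

def attention_consistency_loss(attn_a, attn_b):
    # attn_a, attn_b: (batch, tokens) attention maps from two views
    a = F.normalize(attn_a, dim=1)
    b = F.normalize(attn_b, dim=1)
    return (1.0 - (a * b).sum(dim=1)).mean()  # 0 when attentions agree

loss = attention_consistency_loss(torch.rand(4, 49), torch.rand(4, 49))
```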
- DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network dubbed DOAD to improve the efficiency of spatio-temporal action detection; a rough sketch of the decoupling follows below.
arXiv Detail & Related papers (2023-04-01T08:06:43Z)
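One way to picture the decoupling: a shared backbone feature map feeds two parallel heads, one regressing person boxes and one classifying actions, so no separate person-detection stage is needed. The branch shapes and head designs below are assumptions, not the published architecture.
```python
# Hypothetical decoupled one-stage heads over a shared feature map.
import torch
import torch.nn as nn

class DecoupledHeads(nn.Module):
    def __init__(self, channels=256, n_actions=80):
        super().__init__()
        self.localize = nn.Conv2d(channels, 4, 3, padding=1)          # box offsets
        self.classify = nn.Conv2d(channels, n_actions, 3, padding=1)  # action logits

    def forward(self, feat):
        # feat: (batch, channels, H, W) from a shared video backbone;
        # the two branches run in parallel rather than in two stages.
        return self.localize(feat), self.classify(feat)

boxes, logits = DecoupledHeads()(torch.randn(1, 256, 14, 14))
```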
- CycleACR: Cycle Modeling of Actor-Context Relations for Video Action Detection [67.90338302559672]
We propose to select actor-related scene context, rather than directly leverage raw video scenario, to improve relation modeling.
We develop a Cycle Actor-Context Relation network (CycleACR) where there is a symmetric graph that models the actor and context relations in a bidirectional form.
Compared to existing designs that focus on context-to-actor enhancement (C2A-E), CycleACR adds actor-to-context reorganization (A2C-R) for more effective relation modeling, as sketched below.
arXiv Detail & Related papers (2023-03-28T16:40:47Z)
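The bidirectional relation modeling can be sketched with two cross-attention blocks: actors reorganize the context (A2C-R) and the reorganized context then enhances the actors (C2A-E). Dimensions, head counts, and the absence of residual connections are simplifications.
```python
# Sketch of bidirectional actor-context cross-attention (assumed dims).
import torch
import torch.nn as nn

class CycleActorContext(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.a2c = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.c2a = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, actors, context):
        # actors: (batch, n_actors, dim); context: (batch, n_ctx, dim)
        ctx, _ = self.a2c(context, actors, actors)  # actors reorganize context
        act, _ = self.c2a(actors, ctx, ctx)         # context enhances actors
        return act

enhanced = CycleActorContext()(torch.randn(2, 5, 256), torch.randn(2, 49, 256))
```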
- SPARTAN: Self-supervised Spatiotemporal Transformers Approach to Group Activity Recognition [47.3759947287782]
We propose a new, simple, and effective Self-supervised Spatio-temporal Transformers (SPARTAN) approach to Group Activity Recognition (GAR) using unlabeled video data.
arXiv Detail & Related papers (2023-03-06T16:58:27Z)
- Interaction Region Visual Transformer for Egocentric Action Anticipation [18.873728614415946]
We propose a novel way to represent human-object interactions for egocentric action anticipation.
We model interactions between hands and objects using Spatial Cross-Attention.
We then infuse contextual information using Trajectory Cross-Attention to obtain environment-refined interaction tokens.
Using these tokens, we construct an interaction-centric video representation for action anticipation; both attention steps are sketched below.
arXiv Detail & Related papers (2022-11-25T15:00:51Z)
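A sketch of the two cross-attention steps from the entry above: hand tokens attend to object tokens (Spatial Cross-Attention), and the resulting interaction tokens attend to context features along the video (Trajectory Cross-Attention). Token counts and dimensions are assumptions.
```python
# Sketch of spatial then trajectory cross-attention (assumed token shapes).
import torch
import torch.nn as nn

class InteractionTokens(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.trajectory = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, hands, objects, context):
        # hands: (B, n_hands, dim); objects: (B, n_objects, dim);
        # context: (B, n_frames, dim) features along the video trajectory
        inter, _ = self.spatial(hands, objects, objects)      # hand-object interaction
        tokens, _ = self.trajectory(inter, context, context)  # environment-refined
        return tokens

tokens = InteractionTokens()(
    torch.randn(2, 2, 256), torch.randn(2, 6, 256), torch.randn(2, 8, 256))
```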
- Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition [103.62363658053557]
We propose a Dual-path Actor Interaction (Dual-AI) framework, which flexibly arranges spatial and temporal transformers (a sketch of the arrangement follows below).
We also introduce a novel Multi-scale Actor Contrastive Loss (MAC-Loss) between two interactive paths of Dual-AI.
Our Dual-AI can boost group activity recognition by fusing distinct discriminative features of different actors.
arXiv Detail & Related papers (2022-04-05T12:17:40Z)
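The dual-path arrangement can be sketched as two orderings of spatial and temporal transformers over actor features of shape (batch, time, actors, dim), with their outputs fused. Single-layer encoders, fusion by averaging, and the omission of the MAC-Loss are simplifying assumptions.
```python
# Sketch of two orderings of spatial/temporal transformers (MAC-Loss omitted).
import torch
import torch.nn as nn

def encoder(dim, heads):
    return nn.TransformerEncoder(
        nn.TransformerEncoderLayer(dim, heads, batch_first=True), num_layers=1)

class DualPath(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.s1, self.t1 = encoder(dim, heads), encoder(dim, heads)  # S -> T
        self.t2, self.s2 = encoder(dim, heads), encoder(dim, heads)  # T -> S

    @staticmethod
    def spatial(enc, x):   # attend across actors within each frame
        B, T, N, D = x.shape
        return enc(x.reshape(B * T, N, D)).reshape(B, T, N, D)

    @staticmethod
    def temporal(enc, x):  # attend across time for each actor
        B, T, N, D = x.shape
        y = enc(x.transpose(1, 2).reshape(B * N, T, D))
        return y.reshape(B, N, T, D).transpose(1, 2)

    def forward(self, x):
        # x: (batch, time, actors, dim) actor features
        a = self.temporal(self.t1, self.spatial(self.s1, x))  # S -> T path
        b = self.spatial(self.s2, self.temporal(self.t2, x))  # T -> S path
        return (a + b) / 2  # fuse the two interactive paths

fused = DualPath()(torch.randn(2, 4, 12, 256))
```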
- Audio-Adaptive Activity Recognition Across Video Domains [112.46638682143065]
We leverage activity sounds for domain adaptation as they have less variance across domains and can reliably indicate which activities are not happening.
We propose an audio-adaptive encoder and associated learning methods that discriminatively adjust the visual feature representation (a simplified modulation sketch follows below).
We also introduce the new task of actor shift, with a corresponding audio-visual dataset, to challenge our method with situations where the activity appearance changes dramatically.
arXiv Detail & Related papers (2022-03-27T08:15:20Z)
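One loose way to picture the audio-adaptive adjustment is FiLM-style conditioning, where an audio embedding predicts a per-channel scale and shift for the visual features. This substitution is an assumption; the paper's audio-adaptive encoder is more involved.
```python
# FiLM-style stand-in for audio-adaptive feature adjustment (an assumption).
import torch
import torch.nn as nn

class AudioAdaptive(nn.Module):
    def __init__(self, vis_dim=512, aud_dim=128):
        super().__init__()
        self.film = nn.Linear(aud_dim, 2 * vis_dim)

    def forward(self, visual, audio):
        # visual: (batch, vis_dim); audio: (batch, aud_dim)
        scale, shift = self.film(audio).chunk(2, dim=1)
        return visual * (1 + scale) + shift  # audio steers the visual features

adapted = AudioAdaptive()(torch.randn(4, 512), torch.randn(4, 128))
```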
- Self-Supervised Learning via multi-Transformation Classification for Action Recognition [10.676377556393527]
We introduce a self-supervised video representation learning method based on the multi-transformation classification to efficiently classify human actions.
The representation of the video is learned in a self-supervised manner by classifying seven different transformations.
Experiments are conducted on the UCF101 and HMDB51 datasets with C3D and 3D ResNet-18 as backbone networks; a toy version of the pretext task appears below.
arXiv Detail & Related papers (2021-02-20T16:11:26Z)
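A toy version of the pretext task: apply one of a fixed set of clip-level transformations and train a classifier to predict which was applied. Only three placeholder transformations are shown; the paper classifies seven.
```python
# Toy pretext sampler: transform a clip and predict which transform was used.
import random
import torch

def identity(clip):           # clip: (C, T, H, W)
    return clip

def horizontal_flip(clip):
    return torch.flip(clip, dims=[3])

def time_reverse(clip):
    return torch.flip(clip, dims=[1])

TRANSFORMS = [identity, horizontal_flip, time_reverse]  # paper uses seven

def make_pretext_sample(clip):
    label = random.randrange(len(TRANSFORMS))
    return TRANSFORMS[label](clip), label  # the network predicts `label`

clip, label = make_pretext_sample(torch.randn(3, 16, 112, 112))
```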
- Adversarial Bipartite Graph Learning for Video Domain Adaptation [50.68420708387015]
Domain adaptation techniques, which focus on adapting models between distributionally different domains, are rarely explored in the video recognition area.
Recent works on visual domain adaptation that leverage adversarial learning to unify source and target video representations are not highly effective on videos.
This paper proposes an Adversarial Bipartite Graph (ABG) learning framework that directly models source-target interactions, roughly sketched below.
arXiv Detail & Related papers (2020-07-31T03:48:41Z)
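A rough sketch of the bipartite-graph idea from the last entry: edges connect source and target clips only, with weights from feature affinity, so target features aggregate information from source features. The softmax affinity, the single propagation step, and the omission of adversarial training are simplifying assumptions.
```python
# Rough bipartite source-target propagation (adversarial training omitted).
import torch
import torch.nn.functional as F

def bipartite_propagate(src, tgt):
    # src: (n_src, dim); tgt: (n_tgt, dim); edges cross domains only
    affinity = tgt @ src.t() / src.shape[1] ** 0.5
    weights = F.softmax(affinity, dim=1)
    return weights @ src  # target nodes aggregate source features

aligned = bipartite_propagate(torch.randn(8, 256), torch.randn(5, 256))
```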
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.