Pose is all you need: The pose only group activity recognition system (POGARS)
- URL: http://arxiv.org/abs/2108.04186v1
- Date: Mon, 9 Aug 2021 17:16:04 GMT
- Title: Pose is all you need: The pose only group activity recognition system (POGARS)
- Authors: Haritha Thilakarathne, Aiden Nibali, Zhen He, Stuart Morgan
- Abstract summary: We introduce a novel deep learning based group activity recognition approach called the Pose Only Group Activity Recognition System (POGARS).
POGARS uses 1D CNNs to learn the dynamics of individuals involved in a group activity and forgoes learning from pixel data.
Experimental results confirm that POGARS achieves highly competitive results compared to state-of-the-art methods on a widely used public volleyball dataset.
- Score: 7.876115370275732
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We introduce a novel deep learning based group activity recognition approach
called the Pose Only Group Activity Recognition System (POGARS), designed to
use only tracked poses of people to predict the performed group activity. In
contrast to existing approaches for group activity recognition, POGARS uses 1D
CNNs to learn the spatiotemporal dynamics of individuals involved in a group
activity and forgoes learning features from pixel data. The proposed model uses a
spatial and temporal attention mechanism to infer person-wise importance and
multi-task learning for simultaneously performing group and individual action
classification. Experimental results confirm that POGARS achieves highly
competitive results compared to state-of-the-art methods on a widely used
public volleyball dataset, despite only using tracked pose as input. Further,
our experiments show that by using only pose as input, POGARS has better
generalization capabilities than methods that use RGB as input.
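
To make the architecture described in the abstract concrete, below is a minimal PyTorch sketch of the idea, not the authors' implementation: per-person 1D CNNs over tracked pose sequences, an attention weighting to infer person-wise importance, and two multi-task heads for group and individual action classification. All names and sizes (PogarsSketch, POSE_DIM, NUM_PEOPLE, class counts, hidden width) are illustrative assumptions, and the paper's separate spatial and temporal attention is collapsed here into a single person-wise weighting.

```python
# Minimal sketch of the POGARS idea (assumptions noted above); not the
# authors' code. Input: tracked 2D poses per person, no pixel data.
import torch
import torch.nn as nn

POSE_DIM = 34                # e.g. 17 joints x (x, y); assumed layout
NUM_PEOPLE = 12              # players per volleyball clip; assumed
NUM_GROUP_CLASSES = 8        # assumed group activity label count
NUM_INDIVIDUAL_CLASSES = 9   # assumed individual action label count

class PogarsSketch(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        # 1D CNN over the temporal axis of each person's pose sequence.
        self.temporal_cnn = nn.Sequential(
            nn.Conv1d(POSE_DIM, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # one feature vector per person
        )
        # Scalar attention score per person ("person-wise importance").
        self.attention = nn.Linear(hidden, 1)
        # Multi-task heads: group activity and per-person individual action.
        self.group_head = nn.Linear(hidden, NUM_GROUP_CLASSES)
        self.individual_head = nn.Linear(hidden, NUM_INDIVIDUAL_CLASSES)

    def forward(self, poses):
        # poses: (batch, people, time, POSE_DIM)
        b, p, t, d = poses.shape
        x = poses.view(b * p, t, d).transpose(1, 2)        # (b*p, d, t)
        feats = self.temporal_cnn(x).squeeze(-1)           # (b*p, hidden)
        feats = feats.view(b, p, -1)                       # (b, p, hidden)
        weights = torch.softmax(self.attention(feats), dim=1)  # (b, p, 1)
        group_feat = (weights * feats).sum(dim=1)          # attention pooling
        return self.group_head(group_feat), self.individual_head(feats)

# Usage: random pose tracks for a 2-clip batch, 16 frames each.
model = PogarsSketch()
group_logits, indiv_logits = model(torch.randn(2, NUM_PEOPLE, 16, POSE_DIM))
print(group_logits.shape, indiv_logits.shape)  # (2, 8), (2, 12, 9)
```

In a sketch like this, the two heads would be trained jointly, e.g. with a weighted sum of cross-entropy losses over the group and individual action labels, which is the usual setup for this kind of multi-task classification.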
Related papers
- Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph [4.075741925017479]
Group Activity Recognition aims to understand collective activities from videos.
Existing solutions rely on the RGB modality, which encounters challenges such as background variations.
We design a panoramic graph that incorporates multi-person skeletons and objects to encapsulate group activity.
arXiv Detail & Related papers (2024-07-28T13:57:03Z)
- AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition [51.24321348668037]
Panoramic Activity Recognition (PAR) aims to identify multi-granularity behaviors performed by multiple persons in panoramic scenes.
Previous methods rely on manually annotated detection boxes in training and inference, hindering further practical deployment.
We propose a novel Adapt-Focused bi-Propagating Prototype learning (AdaFPP) framework to jointly recognize individual, group, and global activities in panoramic activity scenes.
arXiv Detail & Related papers (2024-05-04T01:53:22Z)
- Learning Group Activity Features Through Person Attribute Prediction [13.964739198311001]
Group Activity Feature (GAF) learning is proposed, in which group activity features are learned through person attribute prediction.
By training the whole network in an end-to-end manner, the attributes of people in a group are learned.
arXiv Detail & Related papers (2024-03-05T08:19:44Z)
- Group Activity Recognition using Unreliable Tracked Pose [8.592249538742527]
Group activity recognition in video is a complex task due to the need for a model to recognise the actions of all individuals in the video.
We introduce an innovative deep learning-based group activity recognition approach called the Rendered Pose based Group Activity Recognition System (RePGARS).
arXiv Detail & Related papers (2024-01-06T17:36:13Z)
- Towards More Practical Group Activity Detection: A New Benchmark and Model [61.39427407758131]
Group activity detection (GAD) is the task of simultaneously identifying the members of each group and classifying each group's activity in a video.
We present a new dataset, dubbed Café, which offers more practical scenarios and metrics.
We also propose a new GAD model that deals with an unknown number of groups and latent group members efficiently and effectively.
arXiv Detail & Related papers (2023-12-05T16:48:17Z)
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
- SoGAR: Self-supervised Spatiotemporal Attention-based Social Group Activity Recognition [45.419756454791674]
This paper introduces a novel approach to Social Group Activity Recognition (SoGAR) using self-supervised transformers.
Our objective ensures that features extracted from contrasting views are consistent across self-temporal domains.
Our proposed SoGAR method achieved state-of-the-art results on three group activity recognition benchmarks.
arXiv Detail & Related papers (2023-04-27T03:41:15Z)
- DECOMPL: Decompositional Learning with Attention Pooling for Group Activity Recognition from a Single Volleyball Image [3.6144103736375857]
Group Activity Recognition (GAR) aims to detect the activity performed by multiple actors in a scene.
We propose a novel GAR technique for volleyball videos, DECOMPL, which consists of two complementary branches.
In the visual branch, it selectively extracts features using attention pooling.
In the coordinate branch, it considers the current configuration of the actors and extracts spatial information from the box coordinates.
arXiv Detail & Related papers (2023-03-11T16:30:51Z)
- Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art performance.
arXiv Detail & Related papers (2022-09-21T02:33:07Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are available only on the source dataset but not on the target dataset during the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- Skeleton-Based Mutually Assisted Interacted Object Localization and Human Action Recognition [111.87412719773889]
We propose a joint learning framework for "interacted object localization" and "human action recognition" based on skeleton data.
Our method achieves the best or competitive performance with the state-of-the-art methods for human action recognition.
arXiv Detail & Related papers (2021-10-28T10:09:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.