Human Activity Recognition Using Cascaded Dual Attention CNN and
Bi-Directional GRU Framework
- URL: http://arxiv.org/abs/2208.05034v1
- Date: Tue, 9 Aug 2022 20:34:42 GMT
- Title: Human Activity Recognition Using Cascaded Dual Attention CNN and
Bi-Directional GRU Framework
- Authors: Hayat Ullah, Arslan Munir
- Abstract summary: Vision-based human activity recognition has emerged as one of the essential research areas in the video analytics domain.
This paper presents a computationally efficient yet generic spatial-temporal cascaded framework that exploits deep discriminative spatial and temporal features for human activity recognition.
The proposed framework improves execution time, measured in frames per second, by up to 167 times compared with most contemporary action recognition methods.
- Score: 3.3721926640077795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-based human activity recognition has emerged as one of the essential
research areas in the video analytics domain. Over the last decade, numerous
advanced deep learning algorithms have been introduced to recognize complex
human actions from video streams. These deep learning algorithms have shown
impressive performance on the human activity recognition task. However, these
newly introduced methods focus either exclusively on model performance or on
computational efficiency and robustness, resulting in a biased tradeoff between
the two when tackling the challenging human activity recognition problem. To
overcome the limitations of
contemporary deep learning models for human activity recognition, this paper
presents a computationally efficient yet generic spatial-temporal cascaded
framework that exploits the deep discriminative spatial and temporal features
for human activity recognition. For efficient representation of human actions,
we have proposed an efficient dual attentional convolutional neural network
(CNN) architecture that leverages a unified channel-spatial attention mechanism
to extract human-centric salient features in video frames. The dual
channel-spatial attention layers, together with the convolutional layers, learn
to attend to the spatial receptive fields that contain objects across the
feature maps. The extracted discriminative salient features are then
forwarded to a stacked bi-directional gated recurrent unit (Bi-GRU) for long-term
temporal modeling and recognition of human actions using both forward and
backward pass gradient learning. Extensive experiments are conducted, and the
results show that the proposed framework improves execution time, measured in
frames per second, by up to 167 times compared with most contemporary action
recognition methods.
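To make the pipeline concrete, the following is a minimal PyTorch sketch of the cascade the abstract describes: a CNN with a unified channel-spatial attention block after each stage, whose per-frame features feed a stacked Bi-GRU. It is a hedged approximation, not the authors' implementation: a CBAM-style block stands in for the paper's unified channel-spatial attention, and the backbone depth, layer widths, GRU hidden size, and temporal readout are illustrative assumptions.

```python
import torch
import torch.nn as nn


class ChannelSpatialAttention(nn.Module):
    """Unified channel + spatial attention (CBAM-style approximation)."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Channel attention: squeeze spatial dims, re-weight channels.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: 7x7 conv over pooled channel statistics.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_gate(x)  # emphasize informative feature maps
        stats = torch.cat([x.mean(dim=1, keepdim=True),
                           x.amax(dim=1, keepdim=True)], dim=1)
        return x * self.spatial_gate(stats)  # emphasize salient regions


class DualAttentionCNN(nn.Module):
    """Small convolutional backbone with attention after every stage."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        layers, in_ch = [], 3
        for out_ch in (32, 64, 128, feat_dim):
            layers += [
                nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
                ChannelSpatialAttention(out_ch),
            ]
            in_ch = out_ch
        self.body = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch*time, 3, H, W) -> (batch*time, feat_dim)
        return self.pool(self.body(frames)).flatten(1)


class CascadedActivityRecognizer(nn.Module):
    """Per-frame attentive CNN features -> stacked Bi-GRU -> class scores."""

    def __init__(self, num_classes: int, feat_dim: int = 256,
                 hidden: int = 128, gru_layers: int = 2):
        super().__init__()
        self.cnn = DualAttentionCNN(feat_dim)
        self.bigru = nn.GRU(feat_dim, hidden, num_layers=gru_layers,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, time, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).view(b, t, -1)
        seq, _ = self.bigru(feats)         # forward + backward temporal context
        return self.head(seq.mean(dim=1))  # pool over time, then classify


# Example: 2 clips of 16 frames at 112x112 -> (2, 101) class scores.
scores = CascadedActivityRecognizer(num_classes=101)(
    torch.randn(2, 16, 3, 112, 112))
```

Mean-pooling the Bi-GRU outputs merges the forward and backward passes over the whole clip; the paper's exact temporal readout may differ.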
Related papers
- Apprenticeship-Inspired Elegance: Synergistic Knowledge Distillation Empowers Spiking Neural Networks for Efficient Single-Eye Emotion Recognition [53.359383163184425]
We introduce a novel multimodality synergistic knowledge distillation scheme tailored for efficient single-eye emotion recognition tasks.
This method allows a lightweight, unimodal student spiking neural network (SNN) to extract rich knowledge from an event-frame multimodal teacher network.
arXiv Detail & Related papers (2024-06-20T07:24:47Z) - Deep Learning Approaches for Human Action Recognition in Video Data [0.8080830346931087]
This study conducts an in-depth analysis of various deep learning models to address this challenge.
We focus on Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Two-Stream ConvNets.
The results of this study underscore the potential of composite models in achieving robust human action recognition.
arXiv Detail & Related papers (2024-03-11T15:31:25Z) - Spatio-Temporal Branching for Motion Prediction using Motion Increments [55.68088298632865]
Human motion prediction (HMP) has emerged as a popular research topic due to its diverse applications.
Traditional methods rely on hand-crafted features and machine learning techniques.
We propose a novel spatio-temporal branching network using incremental information for HMP.
arXiv Detail & Related papers (2023-08-02T12:04:28Z) - Skeleton-based Human Action Recognition via Convolutional Neural
Networks (CNN) [4.598337780022892]
Most state-of-the-art contributions in skeleton-based action recognition incorporate a graph convolutional network (GCN) architecture for representing the human body and extracting features.
Our research demonstrates that Convolutional Neural Networks (CNNs) can attain comparable results to GCNs, provided that the proper training techniques and augmentations are applied.
arXiv Detail & Related papers (2023-01-31T01:26:17Z) - Differentiable Frequency-based Disentanglement for Aerial Video Action
Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z) - A Close Look into Human Activity Recognition Models using Deep Learning [0.0]
This paper surveys some state-of-the-art human activity recognition models based on deep learning architecture.
The analysis outlines how the models are implemented to maximize their effectiveness and some of the potential limitations they face.
arXiv Detail & Related papers (2022-04-26T19:43:21Z) - Efficient Modelling Across Time of Human Actions and Interactions [92.39082696657874]
We argue that current fixed-sized spatio-temporal kernels in 3D convolutional neural networks (CNNs) can be improved to better deal with temporal variations in the input.
We study how we can better handle variations between classes of actions by enhancing their feature differences over different layers of the architecture.
The proposed approaches are evaluated on several benchmark action recognition datasets and show competitive results.
arXiv Detail & Related papers (2021-10-05T15:39:11Z) - Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-based
Action Recognition [49.163326827954656]
We propose a novel multi-granular spatio-temporal graph network for skeleton-based action classification.
We develop a dual-head graph network consisting of two inter-leaved branches, which enables us to extract features at two spatio-temporal resolutions.
We conduct extensive experiments on three large-scale datasets.
arXiv Detail & Related papers (2021-08-10T09:25:07Z) - Collaborative Distillation in the Parameter and Spectrum Domains for
Video Action Recognition [79.60708268515293]
This paper explores how to train small and efficient networks for action recognition.
We propose two distillation strategies in the frequency domain, namely the feature spectrum and parameter distribution distillations respectively.
Our method can achieve higher performance than state-of-the-art methods with the same backbone.
arXiv Detail & Related papers (2020-09-15T07:29:57Z) - Attention-Oriented Action Recognition for Real-Time Human-Robot
Interaction [11.285529781751984]
We propose an attention-oriented multi-level network framework to meet the need for real-time interaction.
Specifically, a Pre-Attention network is employed to roughly focus on the interactor in the scene at low resolution.
A compact CNN then receives the extracted skeleton sequence as input for action recognition.
arXiv Detail & Related papers (2020-07-02T12:41:28Z) - Simultaneous Learning from Human Pose and Object Cues for Real-Time
Activity Recognition [11.290467061493189]
We propose a novel approach to real-time human activity recognition, through simultaneously learning from observations of both human poses and objects involved in the human activity.
Our method outperforms previous methods and obtains real-time performance for human activity recognition with a processing speed of 104 Hz.
arXiv Detail & Related papers (2020-03-26T22:04:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.