Deep Learning Approaches for Human Action Recognition in Video Data
- URL: http://arxiv.org/abs/2403.06810v1
- Date: Mon, 11 Mar 2024 15:31:25 GMT
- Title: Deep Learning Approaches for Human Action Recognition in Video Data
- Authors: Yufei Xie
- Abstract summary: This study conducts an in-depth analysis of various deep learning models for human action recognition in videos.
We focus on Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Two-Stream ConvNets.
The results of this study underscore the potential of composite models in achieving robust human action recognition.
- Score: 0.8080830346931087
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Human action recognition in videos is a critical task with significant
implications for numerous applications, including surveillance, sports
analytics, and healthcare. The challenge lies in creating models that are both
precise in their recognition capabilities and efficient enough for practical
use. This study conducts an in-depth analysis of various deep learning models
to address this challenge. Utilizing a subset of the UCF101 Videos dataset, we
focus on Convolutional Neural Networks (CNNs), Recurrent Neural Networks
(RNNs), and Two-Stream ConvNets. The research reveals that while CNNs
effectively capture spatial features and RNNs encode temporal sequences,
Two-Stream ConvNets exhibit superior performance by integrating spatial and
temporal dimensions. These insights are distilled from the evaluation metrics
of accuracy, precision, recall, and F1-score. The results of this study
underscore the potential of composite models in achieving robust human action
recognition and suggest avenues for future research in optimizing these models
for real-world deployment.
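The abstract names accuracy, precision, recall, and F1-score as its evaluation metrics but does not say how they are averaged across action classes. The following is a minimal sketch that assumes macro-averaging (an unweighted mean over classes). The function name `classification_metrics` and the example labels are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Assumption: macro-averaging over every class seen in either list.
    The paper does not state which averaging scheme it uses.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n_classes = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "f1": sum(f1s) / n_classes,
    }


# Illustrative labels only; not results from the paper.
example_true = ["run", "run", "jump", "jump"]
example_pred = ["run", "jump", "jump", "jump"]
example_metrics = classification_metrics(example_true, example_pred)
```

The same function can be applied to the predictions of each model (CNN, RNN, Two-Stream ConvNet) on a shared test split so that the four metrics are compared on equal terms.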
Related papers
- How Effective are Self-Supervised Models for Contact Identification in Videos [6.527178779672975]
This work aims to employ eight different CNN-based video SSL models to identify instances of physical contact within video sequences.
The Something-Something v2 (SSv2) and Epic-Kitchen (EK-100) datasets were chosen for evaluating these approaches.
arXiv Detail & Related papers (2024-08-01T12:08:20Z)
- Efficient and Accurate Hyperspectral Image Demosaicing with Neural Network Architectures [3.386560551295746]
This study investigates the effectiveness of neural network architectures in hyperspectral image demosaicing.
We introduce a range of network models and modifications, and compare them with classical methods and existing reference network approaches.
Results indicate that our networks outperform or match reference models on both datasets, demonstrating exceptional performance.
arXiv Detail & Related papers (2023-12-21T08:02:49Z)
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
- Human activity recognition using deep learning approaches and single frame cnn and convolutional lstm [0.0]
We explore two deep learning-based approaches, namely single-frame Convolutional Neural Networks (CNNs) and convolutional Long Short-Term Memory, to recognise human actions from videos.
The two models were trained and evaluated on a benchmark action recognition dataset, UCF50, and another dataset that was created for the experimentation.
Though both models achieve good accuracy, the single-frame CNN model outperforms the convolutional LSTM model, reaching an accuracy of 99.8% on the UCF50 dataset.
arXiv Detail & Related papers (2023-04-18T01:33:29Z)
- Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
The main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z)
- Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z)
- Inducing Gaussian Process Networks [80.40892394020797]
We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points.
The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains.
We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-21T05:27:09Z)
- Network Comparison Study of Deep Activation Feature Discriminability with Novel Objects [0.5076419064097732]
State-of-the-art computer vision algorithms have incorporated Deep Neural Networks (DNNs) in feature-extracting roles, creating Deep Convolutional Activation Features (DeCAF).
This study analyzes the general discriminability of novel object visual appearances encoded into the DeCAF space of six of the leading visual recognition DNN architectures.
arXiv Detail & Related papers (2022-02-08T07:40:53Z)
- Scene Understanding for Autonomous Driving [0.0]
We study the behaviour of different configurations of RetinaNet, Faster R-CNN and Mask R-CNN presented in Detectron2.
We observe a significant improvement in performance after fine-tuning these models on the datasets of interest.
We run inference in unusual situations using out-of-context datasets, and present interesting results.
arXiv Detail & Related papers (2021-05-11T09:50:05Z)
- Continuous Emotion Recognition with Spatiotemporal Convolutional Neural Networks [82.54695985117783]
We investigate the suitability of state-of-the-art deep learning architectures for continuous emotion recognition using long video sequences captured in-the-wild.
We have developed and evaluated convolutional recurrent neural networks that combine 2D-CNNs with long short-term memory units, as well as inflated 3D-CNN models, which are built by inflating the weights of a pre-trained 2D-CNN model during fine-tuning.
arXiv Detail & Related papers (2020-11-18T13:42:05Z)
- Rectified Linear Postsynaptic Potential Function for Backpropagation in Deep Spiking Neural Networks [55.0627904986664]
Spiking Neural Networks (SNNs) use temporal spike patterns to represent and transmit information, which is not only biologically realistic but also suitable for ultra-low-power event-driven neuromorphic implementation.
This paper investigates the contribution of spike timing dynamics to information encoding, synaptic plasticity and decision making, providing a new perspective on the design of future DeepSNNs and neuromorphic hardware systems.
arXiv Detail & Related papers (2020-03-26T11:13:07Z)