Human activity recognition using deep learning approaches and single
frame cnn and convolutional lstm
- URL: http://arxiv.org/abs/2304.14499v1
- Date: Tue, 18 Apr 2023 01:33:29 GMT
- Title: Human activity recognition using deep learning approaches and single
frame cnn and convolutional lstm
- Authors: Sheryl Mathew, Annapoorani Subramanian, Pooja, Balamurugan MS, Manoj
Kumar Rajagopal
- Abstract summary: We explore two deep learning-based approaches, namely single frame Convolutional Neural Networks (CNNs) and convolutional Long Short-Term Memory to recognise human actions from videos.
The two models were trained and evaluated on a benchmark action recognition dataset, UCF50, and another dataset that was created for the experimentation.
Though both models exhibit good accuracies, the single frame CNN model outperforms the Convolutional LSTM model, achieving an accuracy of 99.8% on the UCF50 dataset.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human activity recognition is one of the most important tasks in computer
vision and has proved useful in different fields such as healthcare, sports
training and security. There are a number of approaches that have been explored
to solve this task, some of them involving sensor data, and some involving
video data. In this paper, we aim to explore two deep learning-based
approaches, namely single frame Convolutional Neural Networks (CNNs) and
convolutional Long Short-Term Memory to recognise human actions from videos.
A CNN-based method is advantageous because CNNs extract features automatically, and
Long Short-Term Memory networks are well suited to sequence data such as
video. The two models were
trained and evaluated on a benchmark action recognition dataset, UCF50, and
another dataset that was created for the experimentation. Though both models
exhibit good accuracies, the single frame CNN model outperforms the
Convolutional LSTM model, achieving an accuracy of 99.8% on the UCF50 dataset.
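The paper does not include code, but the single-frame strategy it describes scores each frame independently and fuses the per-frame predictions. A minimal NumPy sketch of this late-fusion inference step might look like the following, with `toy_cnn` as a hypothetical stand-in for a trained network (not the authors' model):

```python
import numpy as np

def classify_video(frames, frame_model, num_classes):
    """Late fusion: run a single-frame model on every frame
    independently, average the per-frame softmax scores, and
    return the argmax class together with the averaged scores."""
    scores = np.zeros(num_classes)
    for frame in frames:
        scores += frame_model(frame)
    scores /= len(frames)          # average over all frames
    return int(np.argmax(scores)), scores

# Toy stand-in for a trained CNN: a fixed random linear map + softmax.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 64 * 64 * 3))   # 3 classes, flattened 64x64 RGB frame

def toy_cnn(frame):
    logits = W @ frame.ravel()
    e = np.exp(logits - logits.max())
    return e / e.sum()

video = rng.random((16, 64, 64, 3))      # a toy "video" of 16 frames
label, avg_scores = classify_video(video, toy_cnn, num_classes=3)
```

Because each per-frame softmax sums to one, the averaged score vector does as well, so the fused output can still be read as a class distribution.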
Related papers
- Deep Learning Approaches for Human Action Recognition in Video Data [0.8080830346931087]
This study conducts an in-depth analysis of various deep learning models to address this challenge.
We focus on Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Two-Stream ConvNets.
The results of this study underscore the potential of composite models in achieving robust human action recognition.
arXiv Detail & Related papers (2024-03-11T15:31:25Z)
- Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- Revisiting Classifier: Transferring Vision-Language Models for Video Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research.
In this study, we focus on transferring knowledge for video classification tasks.
We utilize a well-pretrained language model to generate good semantic targets for efficient transfer learning.
arXiv Detail & Related papers (2022-07-04T10:00:47Z)
- STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that our model achieves comparable performance with far fewer trainable parameters and high speed in training and inference.
arXiv Detail & Related papers (2021-07-15T02:53:11Z)
- One to Many: Adaptive Instrument Segmentation via Meta Learning and Dynamic Online Adaptation in Robotic Surgical Video [71.43912903508765]
MDAL is a dynamic online adaptive learning scheme for instrument segmentation in robot-assisted surgery.
It learns the general knowledge of instruments and the fast adaptation ability through the video-specific meta-learning paradigm.
It outperforms other state-of-the-art methods on two datasets.
arXiv Detail & Related papers (2021-03-24T05:02:18Z)
- Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of a MIP solver, generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z)
- Combining Deep Transfer Learning with Signal-image Encoding for Multi-Modal Mental Wellbeing Classification [2.513785998932353]
This paper proposes a framework to tackle the limitation in performing emotional state recognition on multiple multimodal datasets.
We show that model performance when inferring real-world wellbeing rated on a 5-point Likert scale can be enhanced using our framework.
arXiv Detail & Related papers (2020-11-20T13:37:23Z)
- Complex Human Action Recognition in Live Videos Using Hybrid FR-DL Method [1.027974860479791]
We address challenges of the preprocessing phase through automated selection of representative frames from the input sequences.
We propose a hybrid technique using background subtraction and HOG, followed by application of a deep neural network and skeletal modelling method.
We name our model the Feature Reduction & Deep Learning based action recognition method, or FR-DL for short.
arXiv Detail & Related papers (2020-07-06T15:12:50Z)
- Emotion Recognition on large video dataset based on Convolutional Feature Extractor and Recurrent Neural Network [0.2855485723554975]
Our model combines convolutional neural network (CNN) with recurrent neural network (RNN) to predict dimensional emotions on video data.
Experiments are performed on publicly available datasets including the largest modern Aff-Wild2 database.
arXiv Detail & Related papers (2020-06-19T14:54:13Z)
- Exploiting the ConvLSTM: Human Action Recognition using Raw Depth Video-Based Recurrent Neural Networks [0.0]
We propose and compare two neural networks based on the convolutional long short-term memory unit, namely ConvLSTM.
We show that the proposed models achieve competitive recognition accuracies with lower computational cost compared with state-of-the-art methods.
arXiv Detail & Related papers (2020-06-13T23:35:59Z)
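The ConvLSTM unit referenced by the main paper and the entry above replaces the dense gate products of a plain LSTM with convolutions, so the hidden and cell states keep the spatial layout of the frame. A toy single-channel cell in plain NumPy (an illustrative sketch, not any of the authors' implementations) might look like:

```python
import numpy as np

def conv2d_same(x, k):
    """Zero-padded 'same' 2-D cross-correlation of a single-channel
    map x with an odd-sized kernel k."""
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ConvLSTMCell:
    """Single-channel ConvLSTM cell: the input (i), forget (f),
    output (o) and candidate (g) gates are computed with
    convolutions over the frame and the hidden state."""

    def __init__(self, kernel_size=3, seed=0):
        rng = np.random.default_rng(seed)
        # one (input-kernel, hidden-kernel, bias) triple per gate
        self.params = {
            name: (rng.normal(0, 0.1, (kernel_size, kernel_size)),
                   rng.normal(0, 0.1, (kernel_size, kernel_size)),
                   0.0)
            for name in "ifog"
        }

    def step(self, x, h, c):
        def gate(name, act):
            wx, wh, b = self.params[name]
            return act(conv2d_same(x, wx) + conv2d_same(h, wh) + b)
        i = gate("i", sigmoid)     # input gate
        f = gate("f", sigmoid)     # forget gate
        o = gate("o", sigmoid)     # output gate
        g = gate("g", np.tanh)     # candidate cell state
        c = f * c + i * g          # updated cell state
        h = o * np.tanh(c)         # updated hidden state
        return h, c

# Run the cell over a toy "video" of 8 frames of size 16x16.
cell = ConvLSTMCell()
h = c = np.zeros((16, 16))
for frame in np.random.default_rng(1).random((8, 16, 16)):
    h, c = cell.step(frame, h, c)
```

After the last frame, `h` is a 16x16 map summarising the sequence; a real model would stack several such cells with multiple channels and attach a classifier head.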
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.