Deep Neural Networks in Video Human Action Recognition: A Review
- URL: http://arxiv.org/abs/2305.15692v1
- Date: Thu, 25 May 2023 03:54:41 GMT
- Title: Deep Neural Networks in Video Human Action Recognition: A Review
- Authors: Zihan Wang, Yang Yang, Zhi Liu, Yifan Zheng
- Abstract summary: Video behavior recognition is one of the most foundational tasks of computer vision.
Deep neural networks are built for recognizing pixel-level information such as images with RGB, RGB-D, or optical flow formats.
In our article, the performance of deep neural networks surpassed most of the techniques in the feature learning and extraction tasks.
- Score: 21.00217656391331
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Currently, video behavior recognition is one of the most foundational tasks
of computer vision. The 2D neural networks of deep learning are built for
recognizing pixel-level information such as images with RGB, RGB-D, or optical
flow formats, with the current increasingly wide usage of surveillance video
and more tasks related to human action recognition. There are increasing tasks
requiring temporal information for frames dependency analysis. The researchers
have widely studied video-based recognition rather than
image-based(pixel-based) only to extract more informative elements from
geometry tasks. Our current related research addresses multiple novel proposed
research works and compares their advantages and disadvantages between the
derived deep learning frameworks rather than machine learning frameworks. The
comparison happened between existing frameworks and datasets, which are video
format data only. Due to the specific properties of human actions and the
increasingly wide usage of deep neural networks, we collected all research
works within the last three years between 2020 to 2022. In our article, the
performance of deep neural networks surpassed most of the techniques in the
feature learning and extraction tasks, especially video action recognition.
Related papers
- NPF-200: A Multi-Modal Eye Fixation Dataset and Method for
Non-Photorealistic Videos [51.409547544747284]
NPF-200 is the first large-scale multi-modal dataset of purely non-photorealistic videos with eye fixations.
We conduct a series of analyses to gain deeper insights into this task.
We propose a universal frequency-aware multi-modal non-photorealistic saliency detection model called NPSNet.
arXiv Detail & Related papers (2023-08-23T14:25:22Z) - A Threefold Review on Deep Semantic Segmentation: Efficiency-oriented,
Temporal and Depth-aware design [77.34726150561087]
We conduct a survey on the most relevant and recent advances in Deep Semantic in the context of vision for autonomous vehicles.
Our main objective is to provide a comprehensive discussion on the main methods, advantages, limitations, results and challenges faced from each perspective.
arXiv Detail & Related papers (2023-03-08T01:29:55Z) - A large scale multi-view RGBD visual affordance learning dataset [4.3773754388936625]
We introduce a large scale multi-view RGBD visual affordance learning dataset.
This is the first ever and the largest multi-view RGBD visual affordance learning dataset.
Several state-of-the-art deep learning networks are evaluated each for affordance recognition and segmentation tasks.
arXiv Detail & Related papers (2022-03-26T14:31:35Z) - Neural Architecture Search for Dense Prediction Tasks in Computer Vision [74.9839082859151]
Deep learning has led to a rising demand for neural network architecture engineering.
neural architecture search (NAS) aims at automatically designing neural network architectures in a data-driven manner rather than manually.
NAS has become applicable to a much wider range of problems in computer vision.
arXiv Detail & Related papers (2022-02-15T08:06:50Z) - Dynamic Gesture Recognition [0.0]
It is possible to use machine learning to classify images and/or videos instead of the traditional computer vision algorithms.
The aim of this project is to builda symbiosis between a convolutional neural network (CNN) and a recurrent neural network (RNN)
arXiv Detail & Related papers (2021-09-20T09:45:29Z) - Video Generation from Text Employing Latent Path Construction for
Temporal Modeling [70.06508219998778]
Video generation is one of the most challenging tasks in Machine Learning and Computer Vision fields of study.
In this paper, we tackle the text to video generation problem, which is a conditional form of video generation.
We believe that video generation from natural language sentences will have an important impact on Artificial Intelligence.
arXiv Detail & Related papers (2021-07-29T06:28:20Z) - Video Action Recognition Using spatio-temporal optical flow video frames [0.0]
There are many problems associated with recognizing human actions in videos.
This paper focus on spatial and temporal pattern recognition for the classification of videos using Deep Neural Networks.
The final recognition accuracy was about 94%.
arXiv Detail & Related papers (2021-02-05T19:46:49Z) - A Framework for Fast Scalable BNN Inference using Googlenet and Transfer
Learning [0.0]
This thesis aims to achieve high accuracy in object detection with good real-time performance.
The binarized neural network has shown high performance in various vision tasks such as image classification, object detection, and semantic segmentation.
Results show that the accuracy of objects detected by the transfer learning method is more when compared to the existing methods.
arXiv Detail & Related papers (2021-01-04T06:16:52Z) - Faster and Accurate Compressed Video Action Recognition Straight from
the Frequency Domain [1.9214041945441434]
Deep learning has been successfully used to learn powerful and interpretable features for recognizing human actions in videos.
Most of the existing deep learning approaches have been designed for processing video information as RGB image sequences.
We propose a deep neural network capable of learning straight from compressed video.
arXiv Detail & Related papers (2020-12-26T12:43:53Z) - Continuous Emotion Recognition with Spatiotemporal Convolutional Neural
Networks [82.54695985117783]
We investigate the suitability of state-of-the-art deep learning architectures for continuous emotion recognition using long video sequences captured in-the-wild.
We have developed and evaluated convolutional recurrent neural networks combining 2D-CNNs and long short term-memory units, and inflated 3D-CNN models, which are built by inflating the weights of a pre-trained 2D-CNN model during fine-tuning.
arXiv Detail & Related papers (2020-11-18T13:42:05Z) - Image Segmentation Using Deep Learning: A Survey [58.37211170954998]
Image segmentation is a key topic in image processing and computer vision.
There has been a substantial amount of works aimed at developing image segmentation approaches using deep learning models.
arXiv Detail & Related papers (2020-01-15T21:37:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.