Video Action Recognition Using spatio-temporal optical flow video frames
- URL: http://arxiv.org/abs/2103.05101v1
- Date: Fri, 5 Feb 2021 19:46:49 GMT
- Title: Video Action Recognition Using spatio-temporal optical flow video frames
- Authors: Aytekin Nebisoy and Saber Malekzadeh
- Abstract summary: There are many problems associated with recognizing human actions in videos.
This paper focuses on spatial and temporal pattern recognition for the classification of videos using Deep Neural Networks.
The final recognition accuracy was about 94%.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recognizing human actions in videos has become one of the most popular
areas of research in computer vision in recent years. This area has many
applications, such as surveillance, robotics, health care, video search, and
human-computer interaction. Recognizing human actions in videos is complicated
by factors such as cluttered backgrounds, occlusions, viewpoint variation,
execution speed, and camera movement, and a large number of methods have been
proposed to address these problems. This paper focuses on spatial and temporal
pattern recognition for the classification of videos using Deep Neural
Networks. The model takes RGB images and optical flow as input data and
outputs an action class number. The final recognition accuracy was about 94%.
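The paper itself ships no code, but the input/output contract it describes (RGB frames plus optical flow in, an action class number out) can be sketched. The snippet below is a minimal, hedged illustration, not the authors' implementation: dense Farneback flow from OpenCV feeds a temporal stream, the RGB frame feeds a spatial stream, and the two toy CNN streams are fused by averaging logits. NUM_CLASSES, the layer sizes, and the fusion strategy are all assumptions.

```python
# Minimal two-stream sketch (NOT the paper's exact architecture).
# Assumptions: Farneback dense flow, toy CNN backbones, late fusion
# by averaging logits, NUM_CLASSES = 101 (a UCF101-style label space).
import cv2
import numpy as np
import torch
import torch.nn as nn

NUM_CLASSES = 101  # assumption

def dense_flow(prev_bgr, next_bgr):
    """Dense optical flow between two consecutive frames, shape (H, W, 2)."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

class StreamCNN(nn.Module):
    """Tiny per-stream encoder; in_ch=3 for RGB, in_ch=2 for flow (u, v)."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, NUM_CLASSES))

    def forward(self, x):
        return self.net(x)

class TwoStream(nn.Module):
    """Late fusion: average the logits of the spatial and temporal streams."""
    def __init__(self):
        super().__init__()
        self.rgb_stream = StreamCNN(in_ch=3)
        self.flow_stream = StreamCNN(in_ch=2)

    def forward(self, rgb, flow):
        return (self.rgb_stream(rgb) + self.flow_stream(flow)) / 2

# Usage with dummy data: one RGB frame and the flow to the next frame.
frame0 = np.zeros((224, 224, 3), dtype=np.uint8)
frame1 = np.zeros((224, 224, 3), dtype=np.uint8)
flow = dense_flow(frame0, frame1)                        # (224, 224, 2)
rgb_t = torch.from_numpy(frame1).permute(2, 0, 1).float().unsqueeze(0) / 255
flow_t = torch.from_numpy(flow).permute(2, 0, 1).float().unsqueeze(0)
logits = TwoStream()(rgb_t, flow_t)
action_class = int(logits.argmax(dim=1))                 # predicted class index
```

In practice, two-stream models usually stack flow fields from several consecutive frame pairs into one temporal input; the single flow field here only keeps the sketch short.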
Related papers
- EasyVolcap: Accelerating Neural Volumetric Video Research [69.59671164891725]
Volumetric video is a technology that digitally records dynamic events such as artistic performances, sporting events, and remote conversations.
EasyVolcap is a Python & PyTorch library that unifies multi-view data processing, 4D scene reconstruction, and efficient dynamic volumetric video rendering.
arXiv Detail & Related papers (2023-12-11T17:59:46Z)
- Deep Neural Networks in Video Human Action Recognition: A Review [21.00217656391331]
Video behavior recognition is one of the most foundational tasks of computer vision.
Deep neural networks are built for recognizing pixel-level information such as images with RGB, RGB-D, or optical flow formats.
In our article, we find that deep neural networks surpass most traditional techniques in feature learning and extraction tasks.
arXiv Detail & Related papers (2023-05-25T03:54:41Z)
- Deep Learning for Event-based Vision: A Comprehensive Survey and Benchmarks [55.81577205593956]
Event cameras are bio-inspired sensors that capture per-pixel intensity changes asynchronously.
Deep learning (DL) has been brought to this emerging field and inspired active research endeavors in mining its potential.
arXiv Detail & Related papers (2023-02-17T14:19:28Z)
- Application Of ADNN For Background Subtraction In Smart Surveillance System [0.0]
We develop an intelligent video surveillance system that uses an ADNN architecture for motion detection, trims the video to the parts that contain motion, and performs anomaly detection on the trimmed video.
arXiv Detail & Related papers (2022-12-31T18:42:11Z)
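The ADNN architecture of the entry above is not reproduced here; as a rough illustration of its motion-detection-and-trimming step only, the hedged sketch below uses OpenCV's classical MOG2 background subtractor to keep just the frames with enough foreground motion. The foreground-ratio threshold, the subtractor choice, and the file name are assumptions, not the paper's method.

```python
# Classical stand-in for the motion-detection/trim step (NOT the paper's ADNN).
# Assumption: a frame counts as "motion" if >1% of its pixels are foreground.
import cv2

def motion_frames(video_path, fg_ratio=0.01):
    """Yield only the frames of `video_path` that contain motion."""
    cap = cv2.VideoCapture(video_path)
    subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)      # 0 = background, 255 = foreground
        if (mask > 0).mean() > fg_ratio:    # enough moving pixels?
            yield frame                     # keep this frame for analysis
    cap.release()

# Usage: the trimmed stream would then go to the anomaly-detection stage.
for frame in motion_frames("surveillance.mp4"):
    pass  # e.g., feed `frame` into the downstream anomaly-detection model
```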
- Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z)
- Video Manipulations Beyond Faces: A Dataset with Human-Machine Analysis [60.13902294276283]
We present VideoSham, a dataset consisting of 826 videos (413 real and 413 manipulated).
Many of the existing deepfake datasets focus exclusively on two types of facial manipulations -- swapping with a different subject's face or altering the existing face.
Our analysis shows that state-of-the-art manipulation detection algorithms only work for a few specific attacks and do not scale well on VideoSham.
arXiv Detail & Related papers (2022-07-26T17:39:04Z)
- A Multi-viewpoint Outdoor Dataset for Human Action Recognition [3.522154868524807]
We present a multi-viewpoint outdoor action recognition dataset collected from YouTube and our own drone.
The dataset consists of 20 dynamic human action classes, 2324 video clips and 503086 frames.
The overall baseline action recognition accuracy is 74.0%.
arXiv Detail & Related papers (2021-10-07T14:50:43Z)
- JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion Retargeting [53.28477676794658]
Unsupervised motion retargeting in videos has seen substantial advancements through the use of deep neural networks.
We introduce JOKR - a JOint Keypoint Representation that handles both the source and target videos, without requiring any object prior or data collection.
We evaluate our method both qualitatively and quantitatively, and demonstrate that our method handles various cross-domain scenarios, such as different animals, different flowers, and humans.
arXiv Detail & Related papers (2021-06-17T17:32:32Z)
- Faster and Accurate Compressed Video Action Recognition Straight from the Frequency Domain [1.9214041945441434]
Deep learning has been successfully used to learn powerful and interpretable features for recognizing human actions in videos.
Most of the existing deep learning approaches have been designed for processing video information as RGB image sequences.
We propose a deep neural network capable of learning straight from compressed video.
arXiv Detail & Related papers (2020-12-26T12:43:53Z)
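The compressed-domain representation used in the entry above is not reproduced here; as a loose illustration of feeding frequency-domain data to a network instead of decoded RGB frames, the sketch below applies a block-wise 2D DCT (the transform underlying JPEG/MPEG intra coding) to a grayscale frame. The 8x8 block size and coefficient layout are assumptions.

```python
# Loose illustration of a frequency-domain network input (NOT the paper's pipeline).
# Assumption: 8x8 block-wise 2D DCT, as in JPEG/MPEG intra coding.
import numpy as np
from scipy.fft import dctn

def blockwise_dct(gray, block=8):
    """Return DCT coefficients of shape (H//block, W//block, block*block)."""
    h, w = gray.shape
    h, w = h - h % block, w - w % block          # crop to a multiple of block
    out = np.empty((h // block, w // block, block * block), dtype=np.float32)
    for i in range(0, h, block):
        for j in range(0, w, block):
            coeffs = dctn(gray[i:i+block, j:j+block], norm="ortho")
            out[i // block, j // block] = coeffs.ravel()
    return out

# Usage: these coefficient maps, not decoded RGB frames, become the model input.
frame = np.random.rand(224, 224).astype(np.float32)  # stand-in grayscale frame
dct_maps = blockwise_dct(frame)                      # shape (28, 28, 64)
```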
- A Comprehensive Study of Deep Video Action Recognition [35.7068977497202]
Video action recognition is one of the representative tasks for video understanding.
We provide a comprehensive survey of over 200 existing papers on deep learning for video action recognition.
arXiv Detail & Related papers (2020-12-11T18:54:08Z)
- Toward Accurate Person-level Action Recognition in Videos of Crowded Scenes [131.9067467127761]
We focus on improving action recognition by fully utilizing scene information and collecting new data.
Specifically, we adopt a strong human detector to locate each person in every frame.
We then apply action recognition models to learn the temporal information from video frames, on both the HIE dataset and new data with diverse scenes from the internet.
arXiv Detail & Related papers (2020-10-16T13:08:50Z)
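The detect-then-recognize pipeline described in the entry above can be sketched in a hedged way: a hypothetical stand-in uses torchvision's Faster R-CNN as the "strong human detector" and crops each detected person for a downstream temporal action model. The score threshold and the placeholder classification step are assumptions, not the paper's models.

```python
# Hedged sketch of person-level action recognition (NOT the paper's models).
# Assumptions: torchvision Faster R-CNN as the detector; the temporal action
# model is left as a placeholder.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

detector = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
PERSON = 1  # COCO class id for "person"

@torch.no_grad()
def person_crops(frame, score_thr=0.8):
    """Return per-person crops of `frame` (a 3xHxW float tensor in [0, 1])."""
    det = detector([frame])[0]
    keep = (det["labels"] == PERSON) & (det["scores"] > score_thr)
    crops = []
    for x0, y0, x1, y1 in det["boxes"][keep].round().int().tolist():
        crops.append(frame[:, y0:y1, x0:x1])
    return crops

# Usage: each crop (tracked over time) would feed a temporal action model.
frame = torch.rand(3, 480, 640)
for crop in person_crops(frame):
    pass  # e.g., stack per-person crops across frames and classify the action
```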
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.