A real-time algorithm for human action recognition in RGB and thermal video
- URL: http://arxiv.org/abs/2304.01567v1
- Date: Tue, 4 Apr 2023 06:44:13 GMT
- Title: A real-time algorithm for human action recognition in RGB and thermal video
- Authors: Hannes Fassold, Karlheinz Gutjahr, Anna Weber, Roland Perko
- Abstract summary: We present a deep-learning-based algorithm for human action recognition in both RGB and thermal video.
It detects and tracks humans and recognizes four basic actions in real time on a notebook with an NVIDIA GPU.
- Score: 1.5749416770494706
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Monitoring the movement and actions of humans in video in real time is an
important task. We present a deep learning based algorithm for human action
recognition for both RGB and thermal cameras. It is able to detect and track
humans and recognize four basic actions (standing, walking, running, lying) in
real time on a notebook with an NVIDIA GPU. To this end, it combines state-of-the-art
components for object detection (Scaled-YOLOv4), optical flow (RAFT) and
pose estimation (EvoSkeleton). Qualitative experiments on a set of tunnel
videos show that the proposed algorithm works robustly for both RGB and thermal
video.
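The abstract names the building blocks but not how their outputs are fused into an action label. As a rough, purely illustrative sketch (every interface and threshold below is an assumption, not the authors' implementation), a per-track classifier for the four actions could combine detector boxes, flow-based speed, and pose cues like this:

```python
# Illustrative fusion of detector, optical-flow and pose cues per track.
# All thresholds and field names are hypothetical assumptions.
from dataclasses import dataclass

@dataclass
class Track:
    box: tuple              # (x, y, w, h) from the person detector
    speed_px_s: float       # track speed, e.g. averaged RAFT flow in the box
    torso_angle_deg: float  # torso tilt from vertical, from pose estimation

def classify_action(t: Track,
                    walk_thresh: float = 15.0,
                    run_thresh: float = 120.0) -> str:
    """Map per-track cues to one of the four basic actions."""
    w, h = t.box[2], t.box[3]
    # A near-horizontal torso (or a wide, flat box) suggests a lying person.
    if t.torso_angle_deg > 60.0 or w > 1.5 * h:
        return "lying"
    if t.speed_px_s < walk_thresh:
        return "standing"
    if t.speed_px_s < run_thresh:
        return "walking"
    return "running"

if __name__ == "__main__":
    t = Track(box=(0, 0, 40, 120), speed_px_s=5.0, torso_angle_deg=4.0)
    print(classify_action(t))  # -> "standing"
```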
Related papers
- EventTransAct: A video transformer-based framework for Event-camera based action recognition [52.537021302246664]
Event cameras offer new opportunities for action recognition compared to standard RGB videos.
In this study, we employ a computationally efficient model, namely the video transformer network (VTN), which initially acquires spatial embeddings per event-frame.
In order to better adapt the VTN to the sparse and fine-grained nature of event data, we design an Event-Contrastive Loss ($\mathcal{L}_{EC}$) and event-specific augmentations.
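The summary does not spell out $\mathcal{L}_{EC}$; one plausible reading is a standard InfoNCE-style contrastive loss over embeddings of two augmented views of the same event clip, sketched below in PyTorch (the paper's exact formulation may differ):

```python
# Generic InfoNCE-style contrastive loss; an illustrative stand-in for L_EC.
import torch
import torch.nn.functional as F

def event_contrastive_loss(z1: torch.Tensor, z2: torch.Tensor,
                           temperature: float = 0.07) -> torch.Tensor:
    """z1, z2: (batch, dim) embeddings of two augmentations of each clip."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                    # pairwise similarity
    targets = torch.arange(z1.size(0), device=z1.device)  # positives: diagonal
    return F.cross_entropy(logits, targets)
```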
arXiv Detail & Related papers (2023-08-25T23:51:07Z)
- Deep Neural Networks in Video Human Action Recognition: A Review [21.00217656391331]
Video behavior recognition is one of the most foundational tasks of computer vision.
Deep neural networks are built for recognizing pixel-level information such as images with RGB, RGB-D, or optical flow formats.
The review finds that deep neural networks surpass most earlier techniques at feature learning and extraction.
arXiv Detail & Related papers (2023-05-25T03:54:41Z)
- High Speed Human Action Recognition using a Photonic Reservoir Computer [1.7403133838762443]
We introduce a new training method for the reservoir computer, based on "Timesteps Of Interest".
We solve the task with high accuracy and speed, enough to process multiple video streams in real time.
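"Timesteps Of Interest" suggests fitting the readout on selected timesteps only. A minimal echo-state-network sketch of that idea (a software analogue; the sizes, ridge solver, and selection mask are assumptions, not the photonic setup):

```python
# Minimal echo-state-network analogue where the linear readout is fit only
# on selected "timesteps of interest"; sizes and solver are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 64, 500
W_in = rng.normal(0.0, 0.5, (n_res, n_in))
W = rng.normal(0.0, 1.0, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # keep spectral radius < 1

def run_reservoir(u: np.ndarray) -> np.ndarray:
    """u: (T, n_in) input frames -> (T, n_res) reservoir states."""
    x = np.zeros(n_res)
    states = []
    for u_t in u:
        x = np.tanh(W_in @ u_t + W @ x)
        states.append(x.copy())
    return np.asarray(states)

def fit_readout(states: np.ndarray, labels: np.ndarray,
                toi: np.ndarray, ridge: float = 1e-3) -> np.ndarray:
    """Ridge-regress the readout on the timesteps of interest only."""
    X, Y = states[toi], labels[toi]
    return np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ Y)
```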
arXiv Detail & Related papers (2023-05-24T16:04:42Z)
- Deep Learning Computer Vision Algorithms for Real-time UAVs On-board Camera Image Processing [77.34726150561087]
This paper describes how advanced deep learning based computer vision algorithms are applied to enable real-time on-board sensor processing for small UAVs.
All algorithms have been developed using state-of-the-art image processing methods based on deep neural networks.
arXiv Detail & Related papers (2022-11-02T11:10:42Z)
- Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z)
- RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB Video [76.86512780916827]
We present the first real-time method for motion capture of skeletal pose and 3D surface geometry of hands from a single RGB camera.
In order to address the inherent depth ambiguities in RGB data, we propose a novel multi-task CNN.
We experimentally verify the individual components of our RGB two-hand tracking and 3D reconstruction pipeline.
arXiv Detail & Related papers (2021-06-22T12:53:56Z)
- Video Action Recognition Using spatio-temporal optical flow video frames [0.0]
There are many problems associated with recognizing human actions in videos.
This paper focuses on spatial and temporal pattern recognition for video classification using deep neural networks.
The final recognition accuracy was about 94%.
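As an illustration of the input representation, dense optical flow can be precomputed per frame pair and stacked before being fed to a network. A minimal sketch using OpenCV's Farneback estimator (a stand-in; the summary does not specify which flow method the paper uses):

```python
# Stacking dense optical flow between consecutive frames as network input.
# Farneback is a stand-in for whichever flow estimator the paper uses.
import cv2
import numpy as np

def flow_stack(frames):
    """frames: list of HxW grayscale uint8 images -> (T-1, H, W, 2) flows."""
    flows = []
    for prev, nxt in zip(frames, frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(
            prev, nxt, None,
            pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        flows.append(flow)
    return np.stack(flows)   # fed to the spatio-temporal classifier
```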
arXiv Detail & Related papers (2021-02-05T19:46:49Z)
- Faster and Accurate Compressed Video Action Recognition Straight from the Frequency Domain [1.9214041945441434]
Deep learning has been successfully used to learn powerful and interpretable features for recognizing human actions in videos.
Most of the existing deep learning approaches have been designed for processing video information as RGB image sequences.
We propose a deep neural network capable of learning straight from compressed video.
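Compressed-domain networks consume frequency coefficients, such as the block DCTs that JPEG/MPEG codecs already store, instead of decoded RGB frames. A small sketch of extracting such features from a grayscale frame (block size and layout are illustrative assumptions):

```python
# Block-wise 2-D DCT features, the kind of frequency-domain input a
# compressed-domain network consumes instead of decoded RGB frames.
import numpy as np
from scipy.fftpack import dct

def block_dct(frame: np.ndarray, block: int = 8) -> np.ndarray:
    """frame: HxW grayscale -> (H//8, W//8, 8, 8) DCT coefficient blocks."""
    h = frame.shape[0] // block * block
    w = frame.shape[1] // block * block
    f = frame[:h, :w].astype(np.float64)
    blocks = f.reshape(h // block, block, w // block, block).swapaxes(1, 2)
    return dct(dct(blocks, axis=-1, norm="ortho"), axis=-2, norm="ortho")
```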
arXiv Detail & Related papers (2020-12-26T12:43:53Z)
- Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics [74.6968179473212]
This paper proposes a novel pretext task to address the self-supervised learning problem.
We compute a series of spatio-temporal statistical summaries, such as the spatial location and dominant direction of the largest motion.
A neural network is built and trained to yield the statistical summaries given the video frames as inputs.
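One such summary, the location and dominant direction of the largest motion, can be computed from a dense flow field over a coarse spatial grid; a minimal sketch (the grid size and reduction are assumptions, not the paper's exact recipe):

```python
# Location and dominant direction of the largest motion in a flow field,
# computed over a coarse grid; an illustrative version of one statistic.
import numpy as np

def largest_motion_summary(flow: np.ndarray, grid: int = 4):
    """flow: (H, W, 2). Returns (cell_row, cell_col, direction_rad)."""
    H, W, _ = flow.shape
    mag = np.linalg.norm(flow, axis=2)
    gh, gw = H // grid, W // grid
    cells = mag[:gh * grid, :gw * grid].reshape(grid, gh, grid, gw)
    idx = cells.mean(axis=(1, 3)).argmax()     # cell with largest mean motion
    r, c = divmod(idx, grid)
    patch = flow[r * gh:(r + 1) * gh, c * gw:(c + 1) * gw]
    angle = np.arctan2(patch[..., 1].mean(), patch[..., 0].mean())
    return r, c, angle
```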
arXiv Detail & Related papers (2020-08-31T08:31:56Z)
- A Real-time Action Representation with Temporal Encoding and Deep Compression [115.3739774920845]
We propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation.
T-C3D learns video action representations in a hierarchical multi-granularity manner while obtaining a high process speed.
Our method clearly improves on state-of-the-art real-time methods on the UCF101 action recognition benchmark: 5.4% higher accuracy, twice the inference speed, and a model under 5 MB.
arXiv Detail & Related papers (2020-06-17T06:30:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.