Real-Time Human Action Recognition on Embedded Platforms
- URL: http://arxiv.org/abs/2409.05662v2
- Date: Wed, 11 Sep 2024 14:21:29 GMT
- Title: Real-Time Human Action Recognition on Embedded Platforms
- Authors: Ruiqi Wang, Zichen Wang, Peiqi Gao, Mingzhen Li, Jaehwan Jeong, Yihang Xu, Yejin Lee, Carolyn M. Baum, Lisa Tabor Connor, Chenyang Lu
- Abstract summary: This work tackles the real-time performance challenges of video-based human action recognition (HAR) with four contributions.
One contribution is the development of RT-HARE, a real-time HAR system tailored for embedded platforms.
Experimental results on an Nvidia Jetson Xavier NX platform demonstrated that RT-HARE realizes real-time HAR at a video frame rate of 30 frames per second.
- Score: 10.905968330685393
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With advancements in computer vision and deep learning, video-based human action recognition (HAR) has become practical. However, due to the complexity of the computation pipeline, running HAR on live video streams incurs excessive delays on embedded platforms. This work tackles the real-time performance challenges of HAR with four contributions: 1) an experimental study identifying a standard Optical Flow (OF) extraction technique as the latency bottleneck in a state-of-the-art HAR pipeline, 2) an exploration of the latency-accuracy tradeoff between the standard and deep learning approaches to OF extraction, which highlights the need for a novel, efficient motion feature extractor, 3) the design of Integrated Motion Feature Extractor (IMFE), a novel single-shot neural network architecture for motion feature extraction with drastic improvement in latency, 4) the development of RT-HARE, a real-time HAR system tailored for embedded platforms. Experimental results on an Nvidia Jetson Xavier NX platform demonstrated that RT-HARE realizes real-time HAR at a video frame rate of 30 frames per second while delivering high levels of recognition accuracy.
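To make contribution 1 concrete, below is a minimal latency probe of the kind that can expose OF extraction as the bottleneck. It is a sketch only: it uses OpenCV's Farneback optical flow on synthetic frames as a stand-in for whichever standard OF extractor a HAR pipeline employs, and it does not reproduce the paper's pipeline; the function name, frame size, and frame count are illustrative assumptions.

```python
# Minimal sketch of a latency probe for optical-flow extraction, using
# OpenCV's Farneback method as a stand-in for a standard OF extractor.
# Illustrative only; this is not the paper's pipeline.
import time

import cv2
import numpy as np

def of_ms_per_frame(num_pairs: int = 30, size: tuple = (224, 224)) -> float:
    """Average milliseconds to extract dense optical flow per frame pair."""
    rng = np.random.default_rng(0)
    frames = [rng.integers(0, 256, size=size, dtype=np.uint8)
              for _ in range(num_pairs + 1)]
    start = time.perf_counter()
    for prev, curr in zip(frames, frames[1:]):
        cv2.calcOpticalFlowFarneback(prev, curr, None,
                                     0.5, 3, 15, 3, 5, 1.2, 0)
    return 1000.0 * (time.perf_counter() - start) / num_pairs

if __name__ == "__main__":
    per_frame = of_ms_per_frame()
    budget = 1000.0 / 30  # ~33.3 ms/frame budget for 30 FPS real-time HAR
    print(f"OF extraction: {per_frame:.1f} ms/frame (30 FPS budget: {budget:.1f} ms)")
```

If the measured cost per frame exceeds the roughly 33 ms budget, OF extraction alone rules out 30 FPS operation, which is the gap IMFE is designed to close.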
Related papers
- Faster than Fast: Accelerating Oriented FAST Feature Detection on Low-end Embedded GPUs [11.639825636679454]
This paper presents two methods to accelerate Oriented FAST feature detection on low-end embedded GPUs.
Experiments on a Jetson TX2 embedded GPU demonstrate an average speedup of over 7.3 times compared to the widely used OpenCV with GPU support.
arXiv Detail & Related papers (2025-06-08T14:30:30Z)
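As a point of reference for the entry above, here is the stock CPU FAST detector in OpenCV; the paper's GPU-side acceleration of the oriented variant is not reproduced, and the threshold and input image below are illustrative assumptions.

```python
# Baseline FAST corner detection with stock OpenCV (CPU only).
# The paper accelerates the oriented variant on embedded GPUs;
# that GPU implementation is not shown here.
import cv2
import numpy as np

img = np.random.randint(0, 256, (480, 640), dtype=np.uint8)  # stand-in frame
fast = cv2.FastFeatureDetector_create(threshold=20, nonmaxSuppression=True)
keypoints = fast.detect(img, None)
print(f"FAST detected {len(keypoints)} keypoints")
```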
- Towards Low-Latency Event Stream-based Visual Object Tracking: A Slow-Fast Approach [32.91982063297922]
We propose SFTrack, a novel Slow-Fast Tracking paradigm that flexibly adapts to different operational requirements.
The proposed framework supports two complementary modes: a high-precision slow tracker for scenarios with sufficient computational resources, and an efficient fast tracker tailored for latency-aware, resource-constrained environments.
Our framework first performs graph-based representation learning from high-temporal-resolution event streams, and then integrates the learned graph-structured information into two FlashAttention-based vision backbones.
arXiv Detail & Related papers (2025-05-19T09:37:23Z)
- Col-OLHTR: A Novel Framework for Multimodal Online Handwritten Text Recognition [82.88856416080331]
Online Handwritten Text Recognition (OLHTR) has gained considerable attention for its diverse range of applications.
Current approaches usually treat OLHTR as a sequence recognition task, employing either a single trajectory or image encoder, or multi-stream encoders.
We propose a Collaborative learning-based OLHTR framework, called Col-OLHTR, that learns multimodal features during training while maintaining a single-stream inference process.
arXiv Detail & Related papers (2025-02-10T02:12:24Z)
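The train-multimodal / infer-single-stream pattern described in the entry above can be sketched generically; the modules, dimensions, and auxiliary loss below are hypothetical stand-ins, not Col-OLHTR's actual architecture.

```python
# Generic sketch: collaborative training with an auxiliary image branch,
# single-stream (trajectory-only) inference. All names and sizes are
# hypothetical and do not reproduce Col-OLHTR.
import torch
import torch.nn as nn

class TwoStreamTrainSingleStreamInfer(nn.Module):
    def __init__(self, feat_dim: int = 64, num_classes: int = 10):
        super().__init__()
        self.traj_encoder = nn.GRU(2, feat_dim, batch_first=True)  # x/y pen trajectory
        self.img_encoder = nn.Sequential(                          # auxiliary, train-only
            nn.Conv2d(1, feat_dim, 3, padding=1),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, traj, img=None):
        _, h = self.traj_encoder(traj)
        traj_feat = h[-1]
        logits = self.head(traj_feat)
        if self.training and img is not None:
            # Collaborative term: pull trajectory features toward image features.
            aux = nn.functional.mse_loss(traj_feat, self.img_encoder(img))
            return logits, aux
        return logits  # inference uses the trajectory stream only

model = TwoStreamTrainSingleStreamInfer()
logits, aux = model(torch.randn(4, 50, 2), torch.randn(4, 1, 32, 32))
```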
- Energy-Efficient Spiking Recurrent Neural Network for Gesture Recognition on Embedded GPUs [1.37621344207686]
This research explores the deployment of a spiking recurrent neural network (SRNN) with liquid time constant neurons for gesture recognition.
We focus on the energy efficiency and computational efficacy of NVIDIA Jetson Nano embedded GPU platforms.
arXiv Detail & Related papers (2024-08-23T10:50:29Z)
- Revisiting Learning-based Video Motion Magnification for Real-time Processing [23.148430647367224]
Video motion magnification is a technique to capture and amplify subtle motion in a video that is invisible to the naked eye.
We introduce a real-time deep learning-based motion magnification model with 4.2X fewer FLOPs that runs 2.7X faster than the prior art.
arXiv Detail & Related papers (2024-03-04T09:57:08Z)
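For background on the entry above, the classical (non-learned) Eulerian idea behind motion magnification fits in a few lines; the amplification factor and low-pass filter below are illustrative assumptions, and the paper's learned real-time model is not reproduced.

```python
# Minimal Eulerian-style magnification sketch: amplify each frame's
# temporal deviation from a running low-pass of the video. Classical
# idea only; not the paper's deep model.
import numpy as np

def magnify(frames: np.ndarray, alpha: float = 10.0, decay: float = 0.9) -> np.ndarray:
    """frames: (T, H, W) float array in [0, 1]; returns magnified frames."""
    lowpass = frames[0].copy()
    out = np.empty_like(frames)
    for t, frame in enumerate(frames):
        lowpass = decay * lowpass + (1.0 - decay) * frame  # temporal IIR low-pass
        out[t] = np.clip(frame + alpha * (frame - lowpass), 0.0, 1.0)
    return out

frames = np.random.rand(30, 64, 64).astype(np.float32)
print(magnify(frames).shape)  # (30, 64, 64)
```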
- Spatiotemporal Attention-based Semantic Compression for Real-time Video Recognition [117.98023585449808]
We propose a spatiotemporal attention-based autoencoder (STAE) architecture to evaluate the importance of frames and of pixels within each frame.
We develop a lightweight decoder that leverages a combined 3D-2D CNN to reconstruct missing information.
Experimental results show that ViT_STAE can compress the HMDB51 video dataset by 104x with only 5% accuracy loss.
arXiv Detail & Related papers (2023-05-22T07:47:27Z)
- You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos [56.676761067861236]
Given an untrimmed video, temporal sentence grounding aims to locate a target moment semantically according to a sentence query.
Previous works have achieved decent success, but they focus only on high-level visual features extracted from fully decoded frames.
We propose a new setting, compressed-domain TSG, which directly utilizes compressed videos rather than fully-decompressed frames as the visual input.
arXiv Detail & Related papers (2023-03-14T12:53:27Z)
- ETAD: A Unified Framework for Efficient Temporal Action Detection [70.21104995731085]
Untrimmed video understanding tasks such as temporal action detection (TAD) often suffer from huge demands on computing resources.
We build a unified framework for efficient end-to-end temporal action detection (ETAD).
ETAD achieves state-of-the-art performance on both THUMOS-14 and ActivityNet-1.3.
arXiv Detail & Related papers (2022-05-14T21:16:21Z)
- hARMS: A Hardware Acceleration Architecture for Real-Time Event-Based Optical Flow [0.0]
Event-based vision sensors produce asynchronous event streams with high temporal resolution based on changes in the visual scene.
Existing solutions for calculating optical flow from event data fail to capture the true direction of motion due to the aperture problem.
We present a hardware realization of the fARMS algorithm allowing for real-time computation of true flow on low-power, embedded platforms.
arXiv Detail & Related papers (2021-12-13T16:27:17Z)
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often occupy a large number of parameters and incur heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
- Multi-Temporal Convolutions for Human Action Recognition in Videos [83.43682368129072]
We present a novel multi-temporal convolution block capable of extracting temporal features at multiple resolutions.
The proposed blocks are lightweight and can be integrated into any 3D-CNN architecture.
arXiv Detail & Related papers (2020-11-08T10:40:26Z)
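The multi-resolution temporal convolution idea in the entry above can be sketched as parallel temporal convolutions with different dilations whose outputs are fused; the dilations and channel counts below are illustrative assumptions, not the paper's configuration.

```python
# Sketch of a multi-resolution temporal convolution block: parallel 3D
# convolutions over the time axis with different dilations, fused by a
# 1x1x1 convolution. Hyperparameters are illustrative only.
import torch
import torch.nn as nn

class MultiTemporalBlock(nn.Module):
    def __init__(self, channels: int = 64, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv3d(channels, channels, kernel_size=(3, 1, 1),
                      padding=(d, 0, 0), dilation=(d, 1, 1))
            for d in dilations])
        self.fuse = nn.Conv3d(channels * len(dilations), channels, kernel_size=1)

    def forward(self, x):  # x: (B, C, T, H, W)
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

x = torch.randn(2, 64, 16, 7, 7)
print(MultiTemporalBlock()(x).shape)  # torch.Size([2, 64, 16, 7, 7])
```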
- A Real-time Action Representation with Temporal Encoding and Deep Compression [115.3739774920845]
We propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation.
T-C3D learns video action representations in a hierarchical multi-granularity manner while maintaining a high processing speed.
On the UCF101 action recognition benchmark, our method improves accuracy over state-of-the-art real-time methods by 5.4%, runs 2 times faster at inference, and requires a model of less than 5MB of storage.
arXiv Detail & Related papers (2020-06-17T06:30:43Z)
- Faster than FAST: GPU-Accelerated Frontend for High-Speed VIO [46.20949184826173]
This work focuses on the applicability of efficient low-level, GPU hardware-specific instructions to improve on existing computer vision algorithms.
In particular, non-maxima suppression and the subsequent feature selection are prominent contributors to the overall image processing latency.
arXiv Detail & Related papers (2020-03-30T14:16:23Z)
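Since the entry above singles out non-maxima suppression as a latency hotspot, here is a minimal grid-based keypoint NMS on the CPU to make the operation concrete; the cell size and data are illustrative, and the paper's GPU hardware-specific implementation is not reproduced.

```python
# Minimal grid-based non-maxima suppression for scored keypoints: keep
# the strongest response per cell. CPU sketch only; the paper performs
# this kind of step with GPU hardware-specific instructions.
import numpy as np

def grid_nms(xy: np.ndarray, scores: np.ndarray, cell: int = 32) -> np.ndarray:
    """xy: (N, 2) pixel coords; scores: (N,). Returns indices of survivors."""
    cells = (xy // cell).astype(np.int64)
    keys = cells[:, 0] * 100000 + cells[:, 1]  # flatten cell coordinates
    best = {}
    for i, k in enumerate(keys):
        if k not in best or scores[i] > scores[best[k]]:
            best[k] = i
    return np.array(sorted(best.values()))

xy = np.random.randint(0, 640, size=(500, 2))
scores = np.random.rand(500)
print(f"{len(grid_nms(xy, scores))} keypoints survive grid NMS")
```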