EdgeOAR: Real-time Online Action Recognition On Edge Devices
- URL: http://arxiv.org/abs/2412.01267v1
- Date: Mon, 02 Dec 2024 08:35:22 GMT
- Title: EdgeOAR: Real-time Online Action Recognition On Edge Devices
- Authors: Wei Luo, Deyu Zhang, Ying Tang, Fan Wu, Yaoxue Zhang
- Abstract summary: This paper addresses the challenges of Online Action Recognition (OAR), a framework that involves instantaneous analysis and classification of behaviors in video streams. To address this, we designed EdgeOAR, a novel framework tailored for OAR on edge devices. We show that EdgeOAR reduces latency by 99.23% and energy consumption by 99.28% compared to the state-of-the-art (SOTA) method.
- Score: 18.424406745729364
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper addresses the challenges of Online Action Recognition (OAR), a framework that involves instantaneous analysis and classification of behaviors in video streams. OAR must operate under stringent latency constraints, making it an indispensable component for real-time feedback in edge computing. Existing methods, which typically rely on processing entire video clips, fall short in scenarios requiring immediate recognition. To address this, we design EdgeOAR, a novel framework tailored for OAR on edge devices. EdgeOAR includes the Early Exit-oriented Task-specific Feature Enhancement Module (TFEM), which comprises lightweight submodules to optimize features in both temporal and spatial dimensions. We design an iterative training method that enables TFEM to learn features from the initial frames of a video. Additionally, EdgeOAR includes an Inverse Information Entropy (IIE)- and Modality Consistency (MC)-driven fusion module that fuses features and makes better exit decisions. This design overcomes the two main challenges: robustly modeling spatio-temporal action representations from the limited initial frames of an online video stream, and balancing accuracy against efficiency on resource-constrained edge devices. Experiments show that on the UCF-101 dataset, EdgeOAR reduces latency by 99.23% and energy consumption by 99.28% compared to the state-of-the-art (SOTA) method, while achieving adequate accuracy on edge devices.
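For intuition, here is a minimal sketch of how an IIE- and MC-driven exit decision might be computed, assuming softmax predictions from two modality branches (e.g. RGB and motion); the function names, fusion rule, and thresholds are illustrative assumptions, not details taken from the EdgeOAR paper.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over class logits.
    e = np.exp(logits - logits.max())
    return e / e.sum()

def inverse_information_entropy(p: np.ndarray, eps: float = 1e-8) -> float:
    # A peaked (confident) distribution has low entropy, hence a high IIE score.
    h = -np.sum(p * np.log(p + eps))
    return 1.0 / (h + eps)

def modality_consistency(p_a: np.ndarray, p_b: np.ndarray) -> float:
    # Cosine similarity between the two modality predictions (1.0 = full agreement).
    return float(p_a @ p_b / (np.linalg.norm(p_a) * np.linalg.norm(p_b)))

def should_exit(logits_a, logits_b, iie_thresh=2.0, mc_thresh=0.9):
    # Hypothetical exit rule: exit early only when the fused prediction is
    # confident (high IIE) AND the two modality branches agree (high MC).
    p_a, p_b = softmax(logits_a), softmax(logits_b)
    fused = 0.5 * (p_a + p_b)
    exit_now = (inverse_information_entropy(fused) > iie_thresh
                and modality_consistency(p_a, p_b) > mc_thresh)
    return exit_now, int(np.argmax(fused))
```

Under this reading, a stream exits early only when the fused prediction is both confident and cross-modally consistent; otherwise more frames are consumed.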
Related papers
- On Accelerating Edge AI: Optimizing Resource-Constrained Environments [1.7355861031903428]
Resource-constrained edge deployments demand AI solutions that balance high performance with stringent compute, memory, and energy limitations.
We present a comprehensive overview of the primary strategies for accelerating deep learning models under such constraints.
arXiv Detail & Related papers (2025-01-25T01:37:03Z)
- Efficient Detection Framework Adaptation for Edge Computing: A Plug-and-play Neural Network Toolbox Enabling Edge Deployment [59.61554561979589]
Edge computing has emerged as a key paradigm for deploying deep learning-based object detection in time-sensitive scenarios.
Existing edge detection methods face challenges: difficulty balancing detection precision with lightweight models, limited adaptability, and insufficient real-world validation.
We propose the Edge Detection Toolbox (ED-TOOLBOX), which utilizes generalizable plug-and-play components to adapt object detection models for edge environments.
arXiv Detail & Related papers (2024-12-24T07:28:10Z) - AyE-Edge: Automated Deployment Space Search Empowering Accuracy yet Efficient Real-Time Object Detection on the Edge [43.53507577737864]
AyE-Edge is a first-of-its-kind development tool that automatically explores the device deployment space.
In extensive real-world experiments conducted on a mobile device, AyE-Edge achieves a remarkable 96.7% reduction in power consumption compared to state-of-the-art (SOTA) competitors.
arXiv Detail & Related papers (2024-07-25T16:17:08Z)
- Weak-to-Strong 3D Object Detection with X-Ray Distillation [75.47580744933724]
We propose a versatile technique that seamlessly integrates into any existing framework for 3D Object Detection.
X-Ray Distillation with Object-Complete Frames is suitable for both supervised and semi-supervised settings.
Our proposed methods surpass the state of the art in semi-supervised learning by 1-1.5 mAP.
arXiv Detail & Related papers (2024-03-31T13:09:06Z)
- T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning [22.002220932086693]
We propose an effective pre-training strategy, namely Temporal Masked Auto-Encoders (T-MAE).
T-MAE takes as input temporally adjacent frames and learns temporal dependency.
Our T-MAE pre-training strategy alleviates its demand for annotated data.
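As a rough illustration of the idea, the sketch below performs one masked-autoencoding step over two temporally adjacent frames represented as token tensors (e.g. point-cloud or patch embeddings); the module sizes, masking ratio, and loss are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

class TinyTMAE(nn.Module):
    def __init__(self, dim=64, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.mask_token = nn.Parameter(torch.zeros(dim))
        enc_layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.decoder = nn.Linear(dim, dim)  # reconstruct token features

    def forward(self, prev_tokens, curr_tokens):
        # Randomly mask a fraction of the current frame's tokens.
        b, n, d = curr_tokens.shape
        idx = torch.randperm(n)[: int(n * self.mask_ratio)]
        masked = curr_tokens.clone()
        masked[:, idx] = self.mask_token
        # The previous frame stays fully visible, providing temporal context.
        x = self.encoder(torch.cat([prev_tokens, masked], dim=1))
        recon = self.decoder(x[:, n:])  # predictions for current-frame tokens
        # Reconstruction loss only on the masked positions.
        return nn.functional.mse_loss(recon[:, idx], curr_tokens[:, idx])
```

The key property this mirrors is that reconstructing masked content of one frame from its visible neighbor forces the encoder to learn temporal dependency without annotations.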
arXiv Detail & Related papers (2023-12-15T21:30:49Z)
- D$^2$ST-Adapter: Disentangled-and-Deformable Spatio-Temporal Adapter for Few-shot Action Recognition [60.84084172829169]
Adapting large pre-trained image models to few-shot action recognition has proven to be an effective strategy for learning robust feature extractors.
We present the Disentangled-and-Deformable Spatio-Temporal Adapter (D$^2$ST-Adapter), a novel tuning framework well-suited for few-shot action recognition.
arXiv Detail & Related papers (2023-12-03T15:40:10Z)
- Fast and Resource-Efficient Object Tracking on Edge Devices: A Measurement Study [9.976630547252427]
Multi-object tracking (MOT) detects moving objects and tracks their locations frame by frame as real scenes are captured into a video.
This paper examines the performance issues and edge-specific optimization opportunities for object tracking.
We present several edge-specific performance optimization strategies, collectively coined EMO, to speed up real-time object tracking.
arXiv Detail & Related papers (2023-09-06T02:25:36Z)
- LAMBO: Large AI Model Empowered Edge Intelligence [71.56135386994119]
Next-generation edge intelligence is anticipated to benefit various applications via offloading techniques.
Traditional offloading architectures face several issues, including heterogeneous constraints, partial perception, uncertain generalization, and lack of tractability.
We propose a Large AI Model-Based Offloading (LAMBO) framework with over one billion parameters for solving these problems.
arXiv Detail & Related papers (2023-08-29T07:25:42Z)
- Implicit Temporal Modeling with Learnable Alignment for Video Recognition [95.82093301212964]
We propose a novel Implicit Learnable Alignment (ILA) method, which minimizes the temporal modeling effort while achieving remarkably high performance.
ILA achieves a top-1 accuracy of 88.7% on Kinetics-400 with far fewer FLOPs than Swin-L and ViViT-H.
arXiv Detail & Related papers (2023-04-20T17:11:01Z)
- Motion-aware Memory Network for Fast Video Salient Object Detection [15.967509480432266]
We design a space-time memory (STM)-based network that extracts useful temporal information for the current frame from adjacent frames, serving as the temporal branch for video salient object detection (VSOD).
In the encoding stage, we generate high-level temporal features by using high-level features from the current frame and its adjacent frames.
In the decoding stage, we propose an effective fusion strategy for spatial and temporal branches.
The proposed model does not require optical flow or other preprocessing, and can reach a speed of nearly 100 FPS during inference.
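A minimal sketch of what such a spatial-temporal fusion step could look like, assuming both branches produce feature maps of the same shape; the concatenation-plus-1x1-convolution design here is an illustrative assumption, not the paper's exact strategy.

```python
import torch
import torch.nn as nn

class SpatioTemporalFusion(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        # Concatenate the two branches, then project back to one feature map.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, spatial_feat, temporal_feat):
        # Both inputs: (batch, channels, H, W) feature maps from the decoder.
        x = torch.cat([spatial_feat, temporal_feat], dim=1)
        return torch.relu(self.fuse(x))
```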
arXiv Detail & Related papers (2022-08-01T15:56:19Z)
- Fast Online Video Super-Resolution with Deformable Attention Pyramid [172.16491820970646]
Video super-resolution (VSR) has many applications that pose strict causal, real-time, and latency constraints, including video streaming and TV.
We propose a recurrent VSR architecture based on a deformable attention pyramid (DAP).
arXiv Detail & Related papers (2022-02-03T17:49:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.