ACDnet: An action detection network for real-time edge computing based
on flow-guided feature approximation and memory aggregation
- URL: http://arxiv.org/abs/2102.13493v1
- Date: Fri, 26 Feb 2021 14:06:31 GMT
- Title: ACDnet: An action detection network for real-time edge computing based
on flow-guided feature approximation and memory aggregation
- Authors: Yu Liu, Fan Yang and Dominique Ginhac
- Abstract summary: ACDnet is a compact action detection network targeting real-time edge computing.
It exploits the temporal coherence between successive video frames to approximate CNN features rather than naively extracting them.
It can robustly achieve detection well above real-time (75 FPS)
- Score: 8.013823319651395
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Interpreting human actions requires understanding the spatial and temporal
context of the scenes. State-of-the-art action detectors based on Convolutional
Neural Network (CNN) have demonstrated remarkable results by adopting
two-stream or 3D CNN architectures. However, these methods typically operate in
a non-real-time, offline fashion due to the system complexity required to reason about
spatio-temporal information. Consequently, their high computational cost is not
compliant with emerging real-world scenarios such as service robots or public
surveillance where detection needs to take place at resource-limited edge
devices. In this paper, we propose ACDnet, a compact action detection network
targeting real-time edge computing which addresses both efficiency and
accuracy. It intelligently exploits the temporal coherence between successive
video frames to approximate their CNN features rather than naively extracting
them. It also integrates memory feature aggregation from past video frames to
enhance current detection stability, implicitly modeling long temporal cues
over time. Experiments conducted on the public benchmark datasets UCF-24 and
JHMDB-21 demonstrate that ACDnet, when integrated with the SSD detector, can
robustly achieve detection well above real-time (75 FPS). At the same time, it
retains reasonable accuracy (70.92 and 49.53 frame mAP) compared to other
top-performing methods using far heavier configurations. Codes will be
available at https://github.com/dginhac/ACDnet.
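The abstract describes ACDnet's two ingredients (flow-guided feature approximation and memory aggregation) only at a high level. The sketch below, in PyTorch, shows how such a pipeline is commonly realized: backbone features computed on a sparse key frame are warped to the current frame with a lightweight flow field, and a running memory feature is blended in. The function names, the fixed aggregation weight, and the bilinear warping are illustrative assumptions, not the authors' implementation.

```python
from typing import Optional

import torch
import torch.nn.functional as F


def warp_features(feat_key: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp key-frame features to the current frame using a flow field.

    feat_key: (N, C, H, W) features extracted on the last key frame.
    flow:     (N, 2, H, W) displacement (in feature-map pixels) from the
              current frame back to the key frame, e.g. from a light flow net.
    """
    n, _, h, w = feat_key.shape
    # Build a sampling grid, shift it by the flow, and normalize to [-1, 1].
    ys, xs = torch.meshgrid(
        torch.arange(h, device=feat_key.device, dtype=feat_key.dtype),
        torch.arange(w, device=feat_key.device, dtype=feat_key.dtype),
        indexing="ij",
    )
    grid_x = (xs + flow[:, 0]) * 2.0 / max(w - 1, 1) - 1.0  # (N, H, W)
    grid_y = (ys + flow[:, 1]) * 2.0 / max(h - 1, 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)             # (N, H, W, 2)
    return F.grid_sample(feat_key, grid, align_corners=True)


def aggregate_memory(feat_cur: torch.Tensor,
                     memory: Optional[torch.Tensor],
                     alpha: float = 0.5) -> torch.Tensor:
    """Blend the current (approximated) features with a memory feature
    carried over from past frames; alpha is an illustrative fixed weight."""
    if memory is None:
        return feat_cur
    return alpha * feat_cur + (1.0 - alpha) * memory
```

In a scheme like this, the heavy backbone runs only on key frames; on intermediate frames only the flow estimator, the warping, and the detection head (SSD in ACDnet's case) are executed, which is what makes throughput well above real-time plausible.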
Related papers
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
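The TCCT-Net entry above mentions a "TC" stream that turns behavioral feature signals into 2D tensors via the Continuous Wavelet Transform. As a rough illustration (not the authors' code), a 1D signal can be converted into a scales-by-time scalogram with PyWavelets; the wavelet choice, the number of scales, and the magnitude representation are assumptions.

```python
import numpy as np
import pywt


def signal_to_scalogram(signal: np.ndarray,
                        num_scales: int = 64,
                        wavelet: str = "morl") -> np.ndarray:
    """Turn a 1D behavioral feature signal into a 2D time-frequency tensor
    (scales x time) via the Continuous Wavelet Transform."""
    scales = np.arange(1, num_scales + 1)
    coeffs, _freqs = pywt.cwt(signal, scales, wavelet)
    # Magnitude scalogram: a 2D "image" a convolutional stream can consume.
    return np.abs(coeffs).astype(np.float32)


# Example: a 10 s signal sampled at 30 Hz becomes a (64, 300) tensor.
scalogram = signal_to_scalogram(np.random.randn(300))
print(scalogram.shape)  # (64, 300)
```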
- Local Compressed Video Stream Learning for Generic Event Boundary Detection [25.37983456118522]
Event boundary detection aims to localize the generic, taxonomy-free event boundaries that segment videos into chunks.
Existing methods typically require video frames to be decoded before being fed into the network.
We propose a novel, fully end-to-end event boundary detection method that leverages rich information in the compressed domain.
arXiv Detail & Related papers (2023-09-27T06:49:40Z)
- Spatiotemporal Attention-based Semantic Compression for Real-time Video Recognition [117.98023585449808]
We propose a spatiotemporal attention-based autoencoder (STAE) architecture to evaluate the importance of frames and of pixels within each frame.
We develop a lightweight decoder that leverages a combined 3D-2D CNN to reconstruct missing information.
Experimental results show that ViT_STAE can compress the video dataset HMDB51 by 104x with only 5% accuracy loss.
arXiv Detail & Related papers (2023-05-22T07:47:27Z)
- DroneAttention: Sparse Weighted Temporal Attention for Drone-Camera Based Activity Recognition [2.705905918316948]
Human activity recognition (HAR) using drone-mounted cameras has attracted considerable interest from the computer vision research community in recent years.
We propose a novel Sparse Weighted Temporal Attention (SWTA) module to utilize sparsely sampled video frames for obtaining global weighted temporal attention.
The proposed model achieves accuracies of 72.76%, 92.56%, and 78.86% on the respective datasets.
arXiv Detail & Related papers (2022-12-07T00:33:40Z)
- Spatio-Temporal-based Context Fusion for Video Anomaly Detection [1.7710335706046505]
Video anomaly detection aims to discover abnormal events in videos, whose principal subjects are target objects such as people and vehicles.
Most existing methods only focus on the temporal context, ignoring the role of the spatial context in anomaly detection.
This paper proposes a video anomaly detection algorithm based on target spatio-temporal context fusion.
arXiv Detail & Related papers (2022-10-18T04:07:10Z)
- Distortion-Aware Network Pruning and Feature Reuse for Real-time Video Segmentation [49.17930380106643]
We propose a novel framework to speed up any architecture with skip-connections for real-time vision tasks.
Specifically, at the arrival of each frame, we transform the features from the previous frame to reuse them at specific spatial bins.
We then perform partial computation of the backbone network on the regions of the current frame that capture temporal differences between the current and previous frames (a minimal sketch of this reuse-and-recompute idea follows this entry).
arXiv Detail & Related papers (2022-06-20T07:20:02Z)
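As referenced in the entry above, the sketch below copies previous-frame features in spatial bins where the frame barely changed and takes freshly computed features elsewhere. It is deliberately simplified: it runs the backbone densely and masks afterwards, whereas the paper restricts computation to the changed regions; the threshold, pooling, and function names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from torch import nn


def reuse_or_recompute(backbone: nn.Module,
                       frame_cur: torch.Tensor,   # (N, 3, H, W)
                       frame_prev: torch.Tensor,  # (N, 3, H, W)
                       feat_prev: torch.Tensor,   # (N, C, h, w) previous features
                       threshold: float = 0.05) -> torch.Tensor:
    """Reuse previous-frame features in bins that barely changed and keep
    newly computed features only where the frames differ."""
    n, c, h, w = feat_prev.shape
    # Per-bin mean absolute frame difference, pooled to the feature resolution.
    diff = (frame_cur - frame_prev).abs().mean(dim=1, keepdim=True)  # (N, 1, H, W)
    diff = F.adaptive_avg_pool2d(diff, (h, w))                       # (N, 1, h, w)
    changed = (diff > threshold).float()                             # 1 where the bin changed
    feat_new = backbone(frame_cur)                                   # (N, C, h, w)
    return changed * feat_new + (1.0 - changed) * feat_prev
```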
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) approach, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iteration.
Based on a latency- and accuracy-aware reward design, such a scheme can adapt well to complex environments such as dynamic wireless channels and arbitrary processing loads, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Parallel Detection for Efficient Video Analytics at the Edge [5.547133811014004]
Deep Neural Network (DNN)-trained object detectors are widely deployed in mission-critical systems for real-time video analytics at the edge.
A common performance requirement in mission-critical edge services is the near real-time latency of online object detection on edge devices.
This paper addresses these problems by exploiting multi-model multi-device detection parallelism for fast object detection in edge systems.
arXiv Detail & Related papers (2021-07-27T02:50:46Z)
- Efficient Two-Stream Network for Violence Detection Using Separable Convolutional LSTM [0.0]
We propose an efficient two-stream deep learning architecture leveraging Separable Convolutional LSTM (SepConvLSTM) and pre-trained MobileNet.
SepConvLSTM is constructed by replacing convolution operation at each gate of ConvLSTM with a depthwise separable convolution.
Our model improves accuracy on the larger and more challenging RWF-2000 dataset by more than a 2% margin (a gate-level sketch of SepConvLSTM follows this entry).
arXiv Detail & Related papers (2021-02-21T12:01:48Z)
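The entry above is concrete about the core modification: each gate convolution of a ConvLSTM is replaced by a depthwise-separable convolution. A minimal cell sketch follows, assuming a standard ConvLSTM formulation without peepholes or normalization; the authors' exact cell may differ.

```python
import torch
from torch import nn


class SepConvLSTMCell(nn.Module):
    """ConvLSTM cell whose gate convolutions are depthwise-separable:
    a depthwise k x k conv followed by a 1x1 pointwise conv."""

    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        ch = in_ch + hid_ch
        self.gates = nn.Sequential(
            # Depthwise: one k x k filter per input channel.
            nn.Conv2d(ch, ch, k, padding=k // 2, groups=ch),
            # Pointwise: 1x1 conv mixing channels into the 4 gates (i, f, o, g).
            nn.Conv2d(ch, 4 * hid_ch, 1),
        )

    def forward(self, x, state):
        h, c = state  # hidden and cell states, each (N, hid_ch, H, W)
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)
```

A standard gate convolution over C channels with a k x k kernel costs on the order of C^2 k^2 multiply-accumulates per position, whereas the depthwise-plus-pointwise pair costs roughly C k^2 + C^2, which is where the efficiency of SepConvLSTM comes from.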
- DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of temporal and spatial information.
We show that the proposed method achieves superior performance compared to state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z)
- Depthwise Non-local Module for Fast Salient Object Detection Using a Single Thread [136.2224792151324]
We propose a new deep learning algorithm for fast salient object detection.
The proposed algorithm achieves competitive accuracy and high inference efficiency simultaneously with a single CPU thread.
arXiv Detail & Related papers (2020-01-22T15:23:48Z)
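The last entry names a depthwise non-local module but gives no internals. For reference, the sketch below implements the standard embedded-Gaussian non-local block that such modules build on; the depthwise factorization itself is the paper's contribution and is not reproduced here.

```python
import torch
from torch import nn


class NonLocalBlock(nn.Module):
    """Standard embedded-Gaussian non-local block (Wang et al., 2018):
    every spatial position attends to every other position."""

    def __init__(self, channels: int, reduction: int = 2):
        super().__init__()
        inner = channels // reduction
        self.theta = nn.Conv2d(channels, inner, 1)  # query
        self.phi = nn.Conv2d(channels, inner, 1)    # key
        self.g = nn.Conv2d(channels, inner, 1)      # value
        self.out = nn.Conv2d(inner, channels, 1)    # project back

    def forward(self, x):
        n, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)        # (N, HW, C')
        k = self.phi(x).flatten(2)                           # (N, C', HW)
        v = self.g(x).flatten(2).transpose(1, 2)             # (N, HW, C')
        attn = torch.softmax(q @ k, dim=-1)                  # (N, HW, HW)
        y = (attn @ v).transpose(1, 2).reshape(n, -1, h, w)  # (N, C', H, W)
        return x + self.out(y)                               # residual connection
```

The dense block above has a cost quadratic in the number of spatial positions (the HW x HW attention map), which is exactly what factorized or depthwise variants aim to avoid when targeting a single CPU thread.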