E^2TAD: An Energy-Efficient Tracking-based Action Detector
- URL: http://arxiv.org/abs/2204.04416v1
- Date: Sat, 9 Apr 2022 07:52:11 GMT
- Title: E^2TAD: An Energy-Efficient Tracking-based Action Detector
- Authors: Xin Hu, Zhenyu Wu, Hao-Yu Miao, Siqi Fan, Taiyu Long, Zhenyu Hu,
Pengcheng Pi, Yi Wu, Zhou Ren, Zhangyang Wang, Gang Hua
- Abstract summary: This paper presents a tracking-based solution to accurately and efficiently localize predefined key actions.
It won first place in the UAV-Video Track of the 2021 Low-Power Computer Vision Challenge (LPCVC).
- Score: 78.90585878925545
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video action detection (spatio-temporal action localization) is usually the
starting point for human-centric intelligent analysis of videos nowadays. It
has high practical impacts for many applications across robotics, security,
healthcare, etc. The two-stage paradigm of Faster R-CNN in object detection
inspires a standard paradigm of video action detection, i.e., first
generating person proposals and then classifying their actions. However, none
of the existing solutions provides fine-grained action detection at the
"who-when-where-what" level. This paper presents a tracking-based solution to
accurately and efficiently localize predefined key actions spatially (by
predicting the associated target IDs and locations) and temporally (by
predicting the time in exact frame indices). This solution won first place in
the UAV-Video Track of the 2021 Low-Power Computer Vision Challenge (LPCVC).
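The abstract describes fine-grained detection at the "who-when-where-what" level: a target ID, exact start and end frame indices, per-frame locations, and an action label. A minimal sketch of what such an output record could look like (the class and field names are illustrative, not from the paper):

```python
from dataclasses import dataclass, field

@dataclass
class ActionDetection:
    """One fine-grained action event: who (track ID), when (frame range),
    where (per-frame boxes), what (action label). Names are hypothetical."""
    track_id: int
    start_frame: int
    end_frame: int  # inclusive; temporal localization in exact frame indices
    boxes: dict = field(default_factory=dict)  # frame index -> (x1, y1, x2, y2)
    action: str = ""

    def active_at(self, frame: int) -> bool:
        """True if the action is ongoing at the given frame index."""
        return self.start_frame <= frame <= self.end_frame

# Example: target #3 performs "pick_up" from frame 120 through frame 180.
det = ActionDetection(track_id=3, start_frame=120, end_frame=180,
                      boxes={120: (10, 20, 50, 90)}, action="pick_up")
assert det.active_at(150) and not det.active_at(181)
```

A tracker supplies the stable `track_id` across frames, which is what lets the detector attribute an action to a specific person rather than to an anonymous per-frame box.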
Related papers
- Practical Video Object Detection via Feature Selection and Aggregation [18.15061460125668]
Video object detection (VOD) must contend with high across-frame variation in object appearance and diverse degradation in some frames.
Most contemporary aggregation methods are tailored for two-stage detectors and suffer from high computational costs.
This study proposes a simple yet potent feature selection and aggregation strategy, gaining significant accuracy at marginal computational expense.
arXiv Detail & Related papers (2024-07-29T02:12:11Z)
- DeepLocalization: Using change point detection for Temporal Action Localization [2.4502578110136946]
We introduce DeepLocalization, an innovative framework devised for the real-time localization of actions tailored explicitly for monitoring driver behavior.
Our strategy employs a dual approach: leveraging Graph-Based Change-Point Detection for pinpointing actions in time alongside a Video Large Language Model (Video-LLM) for precisely categorizing activities.
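DeepLocalization pinpoints actions in time via graph-based change-point detection. As a toy illustration of the underlying idea (not the paper's method), a change point can be flagged wherever the mean of a feature signal shifts between adjacent windows:

```python
def change_points(signal, window=3, threshold=1.0):
    """Flag indices where the mean of the trailing `window` samples differs
    from the mean of the leading `window` samples by more than `threshold`.
    A toy mean-shift detector; the paper's graph-based approach is richer."""
    points = []
    for i in range(window, len(signal) - window):
        left = sum(signal[i - window:i]) / window
        right = sum(signal[i:i + window]) / window
        if abs(right - left) > threshold:
            points.append(i)
    return points

# A step signal whose mean jumps from 0 to 5 at index 6:
sig = [0, 0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 5]
print(change_points(sig))  # flags a run of indices around the jump
```

In a temporal-localization setting, the 1-D signal would be a per-frame feature (e.g. motion magnitude), and the detected change points would mark candidate action boundaries.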
arXiv Detail & Related papers (2024-04-18T15:25:59Z)
- Detecting Every Object from Events [24.58024539462497]
We propose Detecting Every Object in Events (DEOE), an approach tailored for achieving high-speed, class-agnostic open-world object detection in event-based vision.
Our code is available at https://github.com/Hatins/DEOE.
arXiv Detail & Related papers (2024-04-08T08:20:53Z)
- SpikeMOT: Event-based Multi-Object Tracking with Sparse Motion Features [52.213656737672935]
SpikeMOT is an event-based multi-object tracker.
SpikeMOT uses spiking neural networks to extract sparse spatiotemporal features from event streams associated with objects.
arXiv Detail & Related papers (2023-09-29T05:13:43Z)
- Implicit Motion Handling for Video Camouflaged Object Detection [60.98467179649398]
We propose a new video camouflaged object detection (VCOD) framework.
It can exploit both short-term and long-term temporal consistency to detect camouflaged objects from video frames.
arXiv Detail & Related papers (2022-03-14T17:55:41Z)
- Argus++: Robust Real-time Activity Detection for Unconstrained Video Streams with Overlapping Cube Proposals [85.76513755331318]
Argus++ is a robust real-time activity detection system for analyzing unconstrained video streams.
The overall system is optimized for real-time processing on standalone consumer-level hardware.
arXiv Detail & Related papers (2022-01-14T03:35:22Z)
- Single Run Action Detector over Video Stream -- A Privacy Preserving Approach [13.247009439182769]
This paper presents Single Run Action Detector (S-RAD) which is a real-time privacy-preserving action detector.
Results on UCF-Sports and UR Fall dataset present comparable accuracy to State-of-the-Art approaches.
arXiv Detail & Related papers (2021-02-05T19:27:38Z)
- DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of spatial and temporal information.
We show that the proposed method outperforms state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z)
- ZSTAD: Zero-Shot Temporal Activity Detection [107.63759089583382]
We propose a novel task setting called zero-shot temporal activity detection (ZSTAD), where activities that have never been seen in training can still be detected.
We design an end-to-end deep network based on R-C3D as the architecture for this solution.
Experiments on both the THUMOS14 and the Charades datasets show promising performance in terms of detecting unseen activities.
arXiv Detail & Related papers (2020-03-12T02:40:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.