Deep Learning-based Action Detection in Untrimmed Videos: A Survey
- URL: http://arxiv.org/abs/2110.00111v1
- Date: Thu, 30 Sep 2021 22:42:25 GMT
- Title: Deep Learning-based Action Detection in Untrimmed Videos: A Survey
- Authors: Elahe Vahdani and Yingli Tian
- Abstract summary: Most real-world videos are lengthy and untrimmed with sparse segments of interest.
The task of temporal activity detection in untrimmed videos aims to localize the temporal boundary of actions.
This paper provides an overview of deep learning-based algorithms to tackle temporal action detection in untrimmed videos.
- Score: 20.11911785578534
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding human behavior and activity facilitates advancement of numerous
real-world applications, and is critical for video analysis. Despite the
progress of action recognition algorithms in trimmed videos, the majority of
real-world videos are lengthy and untrimmed with sparse segments of interest.
The task of temporal activity detection in untrimmed videos aims to localize
the temporal boundary of actions and classify the action categories. Temporal
activity detection task has been investigated in full and limited supervision
settings depending on the availability of action annotations. This paper
provides an extensive overview of deep learning-based algorithms to tackle
temporal action detection in untrimmed videos with different supervision levels
including fully-supervised, weakly-supervised, unsupervised, self-supervised,
and semi-supervised. In addition, this paper also reviews advances in
spatio-temporal action detection where actions are localized in both temporal
and spatial dimensions. Moreover, the commonly used action detection benchmark
datasets and evaluation metrics are described, and the performance of the
state-of-the-art methods is compared. Finally, real-world applications of
temporal action detection in untrimmed videos and a set of future directions
are discussed.
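The abstract describes temporal action detection as localizing the start and end of each action and classifying it; detections are typically scored against ground truth by temporal Intersection-over-Union (tIoU) at one or more thresholds. The sketch below illustrates that standard metric; the function name and threshold choice are illustrative, not taken from the survey.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two segments given as (start, end) in seconds.

    A prediction usually counts as a true positive when its tIoU with a
    ground-truth segment exceeds a threshold such as 0.5.
    """
    # Overlap length, clamped at zero for disjoint segments.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Union = sum of lengths minus the overlap counted twice.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0
```

For example, a predicted segment (0 s, 10 s) against a ground-truth segment (5 s, 15 s) overlaps for 5 s over a 15 s union, giving a tIoU of 1/3, which would fail a 0.5 threshold.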
Related papers
- Deep Learning for Video Anomaly Detection: A Review [52.74513211976795]
Video anomaly detection (VAD) aims to discover behaviors or events deviating from normality in videos.
In the era of deep learning, a great variety of deep learning based methods are constantly emerging for the VAD task.
This review covers the spectrum of five different categories, namely, semi-supervised, weakly supervised, fully supervised, unsupervised and open-set supervised VAD.
arXiv Detail & Related papers (2024-09-09T07:31:16Z)
- Video Action Detection: Analysing Limitations and Challenges [70.01260415234127]
We analyze existing datasets on video action detection and discuss their limitations.
We perform a bias study that analyzes a key property differentiating videos from static images: the temporal aspect.
These experiments reveal biases that have crept into existing methods in spite of careful modeling.
arXiv Detail & Related papers (2022-04-17T00:42:14Z)
- Argus++: Robust Real-time Activity Detection for Unconstrained Video Streams with Overlapping Cube Proposals [85.76513755331318]
Argus++ is a robust real-time activity detection system for analyzing unconstrained video streams.
The overall system is optimized for real-time processing on standalone consumer-level hardware.
arXiv Detail & Related papers (2022-01-14T03:35:22Z)
- Temporal Action Segmentation with High-level Complex Activity Labels [29.17792724210746]
We learn the action segments taking only the high-level activity labels as input.
We propose a novel action discovery framework that automatically discovers constituent actions in videos.
arXiv Detail & Related papers (2021-08-15T09:50:42Z)
- Exploring Temporal Context and Human Movement Dynamics for Online Action Detection in Videos [32.88517041655816]
Temporal context and human movement dynamics can be effectively employed for online action detection.
Our approach uses various state-of-the-art architectures and appropriately combines the extracted features in order to improve action detection.
arXiv Detail & Related papers (2021-06-26T08:34:19Z)
- Intra- and Inter-Action Understanding via Temporal Action Parsing [118.32912239230272]
We construct a new dataset of sport videos with manual annotations of sub-actions, and conduct a study of temporal action parsing on top of it.
Our study shows that a sport activity usually consists of multiple sub-actions and that the awareness of such temporal structures is beneficial to action recognition.
We also investigate a number of temporal parsing methods, and thereon devise an improved method that is capable of mining sub-actions from training data without knowing their labels.
arXiv Detail & Related papers (2020-05-20T17:45:18Z)
- Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos [72.50607929306058]
We propose a real-time online system to perform activity detection on untrimmed security videos.
The proposed method consists of three stages: tubelet extraction, activity classification and online tubelet merging.
We demonstrate the effectiveness of the proposed approach in terms of speed (100 fps) and achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-04-23T22:20:10Z)
- ZSTAD: Zero-Shot Temporal Activity Detection [107.63759089583382]
We propose a novel task setting called zero-shot temporal activity detection (ZSTAD), where activities that have never been seen in training can still be detected.
We design an end-to-end deep network based on R-C3D as the architecture for this solution.
Experiments on both the THUMOS14 and the Charades datasets show promising performance in terms of detecting unseen activities.
arXiv Detail & Related papers (2020-03-12T02:40:36Z)
- Joint Visual-Temporal Embedding for Unsupervised Learning of Actions in Untrimmed Sequences [25.299599341774204]
This paper proposes an approach for the unsupervised learning of actions in untrimmed video sequences based on a joint visual-temporal embedding space.
We show that the proposed approach is able to provide a meaningful visual and temporal embedding out of the visual cues present in contiguous video frames.
arXiv Detail & Related papers (2020-01-29T22:51:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.