ZSTAD: Zero-Shot Temporal Activity Detection
- URL: http://arxiv.org/abs/2003.05583v1
- Date: Thu, 12 Mar 2020 02:40:36 GMT
- Title: ZSTAD: Zero-Shot Temporal Activity Detection
- Authors: Lingling Zhang, Xiaojun Chang, Jun Liu, Minnan Luo, Sen Wang, Zongyuan Ge, Alexander Hauptmann
- Abstract summary: We propose a novel task setting called zero-shot temporal activity detection (ZSTAD), where activities that have never been seen in training can still be detected.
We design an end-to-end deep network based on R-C3D as the architecture for this solution.
Experiments on both the THUMOS14 and the Charades datasets show promising performance in terms of detecting unseen activities.
- Score: 107.63759089583382
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An integral part of video analysis and surveillance is temporal activity
detection, which means simultaneously recognizing and localizing activities in
long, untrimmed videos. Currently, the most effective methods of temporal
activity detection are based on deep learning, and they typically perform very
well when large-scale annotated videos are available for training. However,
these methods are limited in real applications because videos of certain
activity classes may be unavailable and data annotation is time-consuming. To solve this
challenging problem, we propose a novel task setting called zero-shot temporal
activity detection (ZSTAD), where activities that have never been seen in
training can still be detected. We design an end-to-end deep network based on
R-C3D as the architecture for this solution. The proposed network is optimized
with an innovative loss function that considers the embeddings of activity
labels and their super-classes while learning the common semantics of seen and
unseen activities. Experiments on both the THUMOS14 and the Charades datasets
show promising performance in terms of detecting unseen activities.
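The abstract says the ZSTAD loss considers embeddings of activity labels and their super-classes in order to learn semantics shared by seen and unseen activities. One common way to use label embeddings for zero-shot recognition, shown here only as a minimal sketch under stated assumptions, is to project each candidate segment's feature into the label-embedding space and score it against every class by cosine similarity, so an unseen class needs only its label embedding. The function names, the toy 3-D vectors, and the `mix_superclass` blend with weight `alpha` are hypothetical illustrations, not the paper's network or loss function.

```python
import numpy as np

def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def mix_superclass(label_emb: np.ndarray, super_emb: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """HYPOTHETICAL: blend each class's label embedding with its super-class embedding.

    label_emb and super_emb are both (C, D); row c of super_emb is the
    embedding of class c's super-class. alpha=1.0 ignores super-classes.
    """
    return alpha * l2_normalize(label_emb) + (1.0 - alpha) * l2_normalize(super_emb)

def zero_shot_scores(segment_emb: np.ndarray, class_emb: np.ndarray) -> np.ndarray:
    """Cosine similarity between N projected segments (N, D) and C classes (C, D) -> (N, C)."""
    return l2_normalize(segment_emb) @ l2_normalize(class_emb).T

def classify_segments(segment_emb, class_emb, class_names):
    """Assign each segment the class whose embedding it is most similar to."""
    scores = zero_shot_scores(segment_emb, class_emb)
    best = scores.argmax(axis=1)
    return [class_names[i] for i in best], scores.max(axis=1)

# Toy example: 3-D one-hot vectors stand in for real word embeddings (assumption).
class_names = ["LongJump", "HighJump", "Diving"]  # e.g. "Diving" treated as unseen
class_emb = np.eye(3)
segment_emb = np.array([[0.9, 0.1, 0.0],   # projected segment resembling LongJump
                        [0.0, 0.2, 0.8]])  # projected segment resembling Diving
labels, confidences = classify_segments(segment_emb, class_emb, class_names)
print(labels)  # ['LongJump', 'Diving']
```

Because scoring only needs a row in `class_emb`, covering an additional unseen class in this sketch means appending its label embedding; how ZSTAD actually trains the projection is described in the paper, not here.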
Related papers
- Boundary-Denoising for Video Activity Localization [57.9973253014712]
We study the video activity localization problem from a denoising perspective.
Specifically, we propose an encoder-decoder model named DenoiseLoc.
Experiments show that DenoiseLoc advances several video activity understanding tasks.
(2023-04-06T08:48:01Z)
- An Empirical Study of End-to-End Temporal Action Detection [82.64373812690127]
Temporal action detection (TAD) is an important yet challenging task in video understanding.
Rather than end-to-end learning, most existing methods adopt a head-only learning paradigm.
We validate the advantage of end-to-end learning over head-only learning and observe up to 11% performance improvement.
(2022-04-06T16:46:30Z)
- Argus++: Robust Real-time Activity Detection for Unconstrained Video Streams with Overlapping Cube Proposals [85.76513755331318]
Argus++ is a robust real-time activity detection system for analyzing unconstrained video streams.
The overall system is optimized for real-time processing on standalone consumer-level hardware.
(2022-01-14T03:35:22Z)
- Deep Learning-based Action Detection in Untrimmed Videos: A Survey [20.11911785578534]
Most real-world videos are lengthy and untrimmed with sparse segments of interest.
The task of temporal activity detection in untrimmed videos aims to localize the temporal boundary of actions.
This paper provides an overview of deep learning-based algorithms to tackle temporal action detection in untrimmed videos.
(2021-09-30T22:42:25Z)
- Adversarial Background-Aware Loss for Weakly-supervised Temporal Activity Localization [40.517438760096056]
Temporally localizing activities within untrimmed videos has been extensively studied in recent years.
Despite recent advances, existing methods for weakly-supervised temporal activity localization struggle to recognize when an activity is not occurring.
(2020-07-13T19:33:24Z)
- Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos [72.50607929306058]
We propose a real-time online system to perform activity detection on untrimmed security videos.
The proposed method consists of three stages: tubelet extraction, activity classification and online tubelet merging.
We demonstrate the effectiveness of the proposed approach in terms of speed (100 fps) and performance with state-of-the-art results.
(2020-04-23T22:20:10Z)
- Revisiting Few-shot Activity Detection with Class Similarity Control [107.79338380065286]
We present a framework for few-shot temporal activity detection based on proposal regression.
Our model is end-to-end trainable, takes into account the frame rate differences between few-shot activities and untrimmed test videos, and can benefit from additional few-shot examples.
(2020-03-31T22:02:38Z)
- 3D ResNet with Ranking Loss Function for Abnormal Activity Detection in Videos [6.692686655277163]
This study is motivated by recent state-of-the-art work on abnormal activity detection.
In the absence of temporal-annotations, such a model is prone to give a false alarm while detecting the abnormalities.
In this paper, we focus on the task of minimizing the false alarm rate while performing an abnormal activity detection task.
(2020-02-04T05:32:21Z)
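Temporal activity detection, as described in the abstract and in the survey entry above, outputs scored (start, end) segments. Overlap between two segments is usually measured by temporal IoU (tIoU), and proposal-based detectors such as R-C3D, on which ZSTAD builds, prune near-duplicate segments with non-maximum suppression. The sketch below is a generic greedy 1-D NMS, not code from ZSTAD or any listed paper; the function names and the default threshold of 0.5 are illustrative assumptions.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end), e.g. in seconds

def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two 1-D time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def temporal_nms(segments: List[Segment], scores: List[float],
                 iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit segments from highest to lowest score and keep one only
    if its tIoU with every already-kept segment is at most iou_threshold.
    Returns kept indices in descending score order."""
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(temporal_iou(segments[i], segments[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

detections = [(10.0, 20.0), (11.0, 21.0), (30.0, 35.0)]
confidences = [0.9, 0.8, 0.7]
# Segments 0 and 1 overlap with tIoU 9/11 (about 0.82), so the lower-scored one is dropped.
print(temporal_nms(detections, confidences))  # [0, 2]
```

Raising `iou_threshold` keeps more overlapping segments. Results on THUMOS14 are commonly reported as mAP at several tIoU thresholds, which reuses the same overlap measure.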