Gabriella: An Online System for Real-Time Activity Detection in
Untrimmed Security Videos
- URL: http://arxiv.org/abs/2004.11475v2
- Date: Tue, 19 May 2020 17:45:25 GMT
- Authors: Mamshad Nayeem Rizve, Ugur Demir, Praveen Tirupattur, Aayush Jung
Rana, Kevin Duarte, Ishan Dave, Yogesh Singh Rawat, Mubarak Shah
- Abstract summary: We propose a real-time online system to perform activity detection on untrimmed security videos.
The proposed method consists of three stages: tubelet extraction, activity classification and online tubelet merging.
We demonstrate the effectiveness of the proposed approach in terms of speed (~100 fps) and achieve state-of-the-art results.
- Score: 72.50607929306058
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Activity detection in security videos is a difficult problem due to multiple
factors such as large field of view, presence of multiple activities, varying
scales and viewpoints, and its untrimmed nature. The existing research in
activity detection is mainly focused on datasets, such as UCF-101, JHMDB,
THUMOS, and AVA, which partially address these issues. The requirement of
processing the security videos in real-time makes this even more challenging.
In this work we propose Gabriella, a real-time online system to perform
activity detection on untrimmed security videos. The proposed method consists
of three stages: tubelet extraction, activity classification, and online
tubelet merging. For tubelet extraction, we propose a localization network
which takes a video clip as input and spatio-temporally detects potential
foreground regions at multiple scales to generate action tubelets. We propose a
novel Patch-Dice loss to handle large variations in actor size. Our online
processing of videos at a clip level drastically reduces the computation time
in detecting activities. The detected tubelets are assigned activity class
scores by the classification network and merged together using our proposed
Tubelet-Merge Action-Split (TMAS) algorithm to form the final action
detections. The TMAS algorithm efficiently connects the tubelets in an online
fashion to generate action detections which are robust against varying length
activities. We perform our experiments on the VIRAT and MEVA (Multiview
Extended Video with Activities) datasets and demonstrate the effectiveness of
the proposed approach in terms of speed (~100 fps) and performance with
state-of-the-art results. The code and models will be made publicly available.
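The abstract names a Patch-Dice loss for handling large variations in actor size but does not spell out its formulation. As a hedged illustration only (the paper's exact definition may differ), a patch-wise Dice loss can be sketched as: split the foreground map into fixed-size spatial tiles, compute the Dice overlap per tile, and average, so a tile containing a small actor contributes as much as one containing a large actor:

```python
import numpy as np

def patch_dice_loss(pred, target, patch=16, eps=1e-6):
    """Hypothetical sketch of a patch-wise Dice loss (not the paper's
    exact formulation).

    pred, target: (H, W) foreground probability / binary maps.
    The maps are split into patch x patch tiles; a Dice loss is
    computed per tile and averaged, so small actors weigh as much
    as large ones.
    """
    h, w = pred.shape
    losses = []
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            p = pred[i:i + patch, j:j + patch]
            t = target[i:i + patch, j:j + patch]
            inter = (p * t).sum()
            denom = p.sum() + t.sum()
            if denom < eps:  # skip tiles that are empty in both maps
                continue
            losses.append(1.0 - (2.0 * inter + eps) / (denom + eps))
    return float(np.mean(losses)) if losses else 0.0
```

A perfect prediction yields a loss near 0 regardless of actor size, while a completely missed small actor still incurs a full per-tile penalty instead of being averaged away by the background.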
Related papers
- Cutup and Detect: Human Fall Detection on Cutup Untrimmed Videos Using a
Large Foundational Video Understanding Model [0.0]
This work explores the performance of a large video understanding foundation model on the downstream task of human fall detection on untrimmed video.
A method for temporal action localization that relies on a simple cutup of untrimmed videos is demonstrated.
The results are promising for real-time application, and the falls are detected on video level with a state-of-the-art 0.96 F1 score on the HQFSD dataset.
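The "cutup" step described above amounts to slicing an untrimmed video into fixed-length clips that a clip-level model can score independently. A minimal sketch, assuming non-overlapping windows by frame index (the paper's actual clip length and stride are not given here):

```python
def cutup(num_frames, clip_len=64, stride=64):
    """Hypothetical sketch of a simple cutup of an untrimmed video:
    return (start, end) frame-index windows covering all num_frames
    frames. The last clip is kept even if shorter, so no frames are
    dropped."""
    clips = []
    for start in range(0, num_frames, stride):
        end = min(start + clip_len, num_frames)
        clips.append((start, end))
        if end == num_frames:
            break
    return clips
```

For example, a 150-frame video with 64-frame clips yields windows (0, 64), (64, 128), and a final short window (128, 150); an overlapping variant would simply use a stride smaller than the clip length.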
arXiv Detail & Related papers (2024-01-29T16:37:00Z)
- Semi-supervised Active Learning for Video Action Detection [8.110693267550346]
We develop a novel semi-supervised active learning approach which utilizes both labeled and unlabeled data.
We evaluate the proposed approach on three different benchmark datasets: UCF101-24, JHMDB-21, and YouTube-VOS.
arXiv Detail & Related papers (2023-12-12T11:13:17Z)
- Argus++: Robust Real-time Activity Detection for Unconstrained Video Streams with Overlapping Cube Proposals [85.76513755331318]
Argus++ is a robust real-time activity detection system for analyzing unconstrained video streams.
The overall system is optimized for real-time processing on standalone consumer-level hardware.
arXiv Detail & Related papers (2022-01-14T03:35:22Z)
- Deep Learning-based Action Detection in Untrimmed Videos: A Survey [20.11911785578534]
Most real-world videos are lengthy and untrimmed with sparse segments of interest.
The task of temporal activity detection in untrimmed videos aims to localize the temporal boundary of actions.
This paper provides an overview of deep learning-based algorithms to tackle temporal action detection in untrimmed videos.
arXiv Detail & Related papers (2021-09-30T22:42:25Z)
- TinyVIRAT: Low-resolution Video Action Recognition [70.37277191524755]
In real-world surveillance environments, the actions in videos are captured at a wide range of resolutions.
We introduce a benchmark dataset, TinyVIRAT, which contains natural low-resolution activities.
We propose a novel method for recognizing tiny actions in videos which utilizes a progressive generative approach.
arXiv Detail & Related papers (2020-07-14T21:09:18Z)
- WOAD: Weakly Supervised Online Action Detection in Untrimmed Videos [124.72839555467944]
We propose a weakly supervised framework that can be trained using only video-class labels.
We show that our method largely outperforms weakly-supervised baselines.
When strongly supervised, our method obtains the state-of-the-art results in the tasks of both online per-frame action recognition and online detection of action start.
arXiv Detail & Related papers (2020-06-05T23:08:41Z)
- Revisiting Few-shot Activity Detection with Class Similarity Control [107.79338380065286]
We present a framework for few-shot temporal activity detection based on proposal regression.
Our model is end-to-end trainable, takes into account the frame rate differences between few-shot activities and untrimmed test videos, and can benefit from additional few-shot examples.
arXiv Detail & Related papers (2020-03-31T22:02:38Z)
- A Novel Online Action Detection Framework from Untrimmed Video Streams [19.895434487276578]
We propose a novel online action detection framework that considers actions as a set of temporally ordered subclasses.
We augment our data by varying the lengths of videos to allow the proposed method to learn about the high intra-class variation in human actions.
arXiv Detail & Related papers (2020-03-17T14:11:24Z)
- ZSTAD: Zero-Shot Temporal Activity Detection [107.63759089583382]
We propose a novel task setting called zero-shot temporal activity detection (ZSTAD), where activities that have never been seen in training can still be detected.
We design an end-to-end deep network based on R-C3D as the architecture for this solution.
Experiments on both the THUMOS14 and the Charades datasets show promising performance in terms of detecting unseen activities.
arXiv Detail & Related papers (2020-03-12T02:40:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.