WOAD: Weakly Supervised Online Action Detection in Untrimmed Videos
- URL: http://arxiv.org/abs/2006.03732v2
- Date: Tue, 18 May 2021 18:19:25 GMT
- Title: WOAD: Weakly Supervised Online Action Detection in Untrimmed Videos
- Authors: Mingfei Gao, Yingbo Zhou, Ran Xu, Richard Socher, Caiming Xiong
- Abstract summary: We propose a weakly supervised framework that can be trained using only video-class labels.
We show that our method largely outperforms weakly-supervised baselines.
When strongly supervised, our method obtains state-of-the-art results on both online per-frame action recognition and online detection of action start.
- Score: 124.72839555467944
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online action detection in untrimmed videos aims to identify an action as it
happens, which makes it very important for real-time applications. Previous
methods rely on tedious annotations of temporal action boundaries for training,
which hinders the scalability of online action detection systems. We propose
WOAD, a weakly supervised framework that can be trained using only video-class
labels. WOAD contains two jointly-trained modules, i.e., temporal proposal
generator (TPG) and online action recognizer (OAR). Supervised by video-class
labels, TPG works offline and aims to accurately mine pseudo frame-level
labels for OAR. With the supervisory signals from TPG, OAR learns to conduct
action detection in an online fashion. Experimental results on THUMOS'14,
ActivityNet1.2 and ActivityNet1.3 show that our weakly-supervised method
largely outperforms weakly-supervised baselines and achieves comparable
performance to previous strongly-supervised methods. Beyond that, WOAD can
flexibly leverage strong supervision when it is available. When strongly
supervised, our method obtains state-of-the-art results on the tasks of
both online per-frame action recognition and online detection of action start.
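To make the two-module design concrete, here is a minimal PyTorch sketch of how a jointly-trained TPG/OAR pair could be wired. Only the module names and the overall supervision flow come from the abstract; the feature dimensions, the attention-thresholding rule for mining pseudo labels, and the GRU-based recognizer are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of the two-module design described in the abstract.
# Module names (TPG, OAR) follow the paper; everything else is assumed.
import torch
import torch.nn as nn

class TPG(nn.Module):
    """Offline temporal proposal generator: trained with video-class
    labels and used to mine pseudo frame-level labels for OAR."""
    def __init__(self, feat_dim=1024, num_classes=20):
        super().__init__()
        self.attention = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, feats):                       # feats: (T, feat_dim)
        attn = self.attention(feats)                # (T, 1) frame relevance
        video_logits = self.classifier((attn * feats).mean(dim=0))
        return attn.squeeze(-1), video_logits

    @torch.no_grad()
    def mine_pseudo_labels(self, feats, video_label, thresh=0.5, bg_class=20):
        # High-attention frames inherit the video-level class; the rest
        # become background (index bg_class) -- an assumed rule.
        attn, _ = self.forward(feats)
        bg = torch.full_like(attn, bg_class, dtype=torch.long)
        return torch.where(attn > thresh, video_label, bg)

class OAR(nn.Module):
    """Online action recognizer: a causal model that sees only the
    frames up to the current time step."""
    def __init__(self, feat_dim=1024, num_classes=20):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, 512, batch_first=True)
        self.head = nn.Linear(512, num_classes + 1)  # +1 for background

    def forward(self, feats):                        # feats: (1, T, feat_dim)
        h, _ = self.rnn(feats)                       # causal: no future frames
        return self.head(h)                          # per-frame logits

# Joint training step: the video-class loss supervises TPG, and TPG's
# mined pseudo frame labels in turn supervise OAR.
tpg, oar = TPG(), OAR()
feats = torch.randn(30, 1024)                        # 30 frames of features
video_label = torch.tensor(3)                        # video-level label only
_, video_logits = tpg(feats)
loss_tpg = nn.functional.cross_entropy(video_logits[None], video_label[None])
pseudo = tpg.mine_pseudo_labels(feats, video_label)
frame_logits = oar(feats[None]).squeeze(0)           # (30, 21)
loss_oar = nn.functional.cross_entropy(frame_logits, pseudo)
(loss_tpg + loss_oar).backward()
```

The key property preserved here is that OAR consumes frames causally, so at test time it can emit a prediction at every incoming frame without seeing the future.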
Related papers
- ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos [35.371453530275666]
ActionSwitch is the first class-agnostic On-TAL framework capable of detecting overlapping actions.
By removing the reliance on class information, ActionSwitch applies to a wider range of situations.
arXiv Detail & Related papers (2024-07-17T20:07:05Z)
- Open-Vocabulary Spatio-Temporal Action Detection [59.91046192096296]
Open-vocabulary spatio-temporal action detection (OV-STAD) is an important fine-grained video understanding task.
OV-STAD requires training a model on a limited set of base classes with box and label supervision.
To better adapt the holistic VLM for the fine-grained action detection task, we carefully fine-tune it on the localized video region-text pairs.
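As a hedged illustration of fine-tuning on localized video region-text pairs, the sketch below aligns pooled region features with text embeddings via a symmetric contrastive loss; the encoders, dimensions, and temperature are assumptions rather than the paper's recipe.

```python
# Hedged sketch: contrastive alignment of localized video-region features
# with text embeddings of action descriptions. The paper's actual VLM,
# pooling, and loss details may differ.
import torch
import torch.nn.functional as F

def region_text_contrastive(region_feats, text_feats, temperature=0.07):
    """region_feats: (N, D) pooled features of localized video regions.
    text_feats: (N, D) embeddings of the matching action descriptions."""
    region_feats = F.normalize(region_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    logits = region_feats @ text_feats.t() / temperature  # (N, N) similarities
    targets = torch.arange(len(logits))                   # matched pairs on the diagonal
    # Symmetric InfoNCE: region-to-text and text-to-region.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = region_text_contrastive(torch.randn(8, 512), torch.randn(8, 512))
```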
arXiv Detail & Related papers (2024-05-17T14:52:47Z)
- Semi-supervised Active Learning for Video Action Detection [8.110693267550346]
We develop a novel semi-supervised active learning approach that utilizes both labeled and unlabeled data.
We evaluate the proposed approach on three benchmark datasets: UCF101-24, JHMDB-21, and Youtube-VOS.
arXiv Detail & Related papers (2023-12-12T11:13:17Z)
- Bridging Images and Videos: A Simple Learning Framework for Large Vocabulary Video Object Detection [110.08925274049409]
We present a simple but effective learning framework that takes full advantage of all available training data to learn detection and tracking.
We show that our framework consistently improves various large-vocabulary trackers, setting strong baseline results on the challenging TAO benchmarks.
arXiv Detail & Related papers (2022-12-20T10:33:03Z)
- Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization [94.37084866660238]
We present a Two-Stream Consensus Network (TSCN) to simultaneously address these challenges.
The proposed TSCN features an iterative refinement training method, where a frame-level pseudo ground truth is iteratively updated.
We propose a new attention normalization loss that encourages the predicted attention to act like a binary selection, promoting precise localization of action instance boundaries.
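One plausible implementation of such a binarizing regularizer widens the gap between the highest and lowest predicted attention values; the ratio hyperparameter and exact formulation below are assumptions, not necessarily TSCN's.

```python
# Hedged sketch of an attention normalization loss that pushes attention
# toward a binary 0/1 selection by maximizing the gap between the mean
# of the highest and the mean of the lowest attention scores.
import torch

def attention_norm_loss(attn: torch.Tensor, ratio: int = 8) -> torch.Tensor:
    """attn: (T,) frame-level attention values in [0, 1]."""
    k = max(attn.numel() // ratio, 1)
    top = attn.topk(k).values.mean()                # should approach 1
    bottom = (-attn).topk(k).values.neg().mean()    # should approach 0
    return bottom - top                             # minimized when binary

attn = torch.rand(64, requires_grad=True)
loss = attention_norm_loss(attn)
loss.backward()                                     # gradients push toward 0/1
```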
arXiv Detail & Related papers (2020-10-22T10:53:32Z)
- Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos [72.50607929306058]
We propose a real-time online system to perform activity detection on untrimmed security videos.
The proposed method consists of three stages: tubelet extraction, activity classification and online tubelet merging.
We demonstrate the effectiveness of the proposed approach in terms of both speed (100 fps) and performance, achieving state-of-the-art results.
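The online tubelet merging stage can be pictured with a simple greedy linker that extends an active tubelet whenever a new chunk's tubelet overlaps its tail; the IoU threshold and greedy matching below are assumptions, not Gabriella's actual algorithm.

```python
# Hedged sketch of online tubelet merging: greedily link a new chunk's
# tubelets to active ones when their boundary boxes overlap enough.

def iou(a, b):
    """a, b: boxes as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def merge_online(active, new_tubelets, thresh=0.5):
    """active / new_tubelets: lists of box lists (one box per frame).
    Extends an active tubelet when a new tubelet's first box overlaps
    its last box; otherwise starts a new track."""
    for tube in new_tubelets:
        best, best_iou = None, thresh
        for track in active:
            o = iou(track[-1], tube[0])
            if o > best_iou:
                best, best_iou = track, o
        if best is not None:
            best.extend(tube)           # link across the chunk boundary
        else:
            active.append(list(tube))   # start a new track
    return active

tracks = merge_online([[(0, 0, 10, 10)]], [[(1, 1, 11, 11), (2, 2, 12, 12)]])
```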
arXiv Detail & Related papers (2020-04-23T22:20:10Z)
- Two-Stream AMTnet for Action Detection [12.581710073789848]
We propose a new deep neural network architecture for online action detection, termed Two-Stream AMTnet, which adds an optical flow stream to the original appearance stream of AMTnet.
Two-Stream AMTnet exhibits superior action detection performance over state-of-the-art approaches on the standard action detection benchmarks.
arXiv Detail & Related papers (2020-04-03T12:16:45Z)
- A Novel Online Action Detection Framework from Untrimmed Video Streams [19.895434487276578]
We propose a novel online action detection framework that considers actions as a set of temporally ordered subclasses.
We augment our data by varying the lengths of videos to allow the proposed method to learn about the high intra-class variation in human actions.
arXiv Detail & Related papers (2020-03-17T14:11:24Z)
- Delving into 3D Action Anticipation from Streaming Videos [99.0155538452263]
Action anticipation aims to recognize an ongoing action from a partial observation.
We introduce several complementary evaluation metrics and present a basic model based on frame-wise action classification.
We also explore multi-task learning strategies by incorporating auxiliary information from two aspects: the full action representation and the class-agnostic action label.
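A hedged sketch of what such a multi-task setup could look like: a shared causal encoder with a per-frame action head plus auxiliary heads for the full-action representation and the class-agnostic action label; all dimensions, loss weights, and the stand-in targets are assumptions, not the paper's model.

```python
# Hedged sketch of multi-task action anticipation: frame-wise action
# classification plus two assumed auxiliary heads.
import torch
import torch.nn as nn

class AnticipationModel(nn.Module):
    def __init__(self, feat_dim=256, num_classes=60):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, 256, batch_first=True)
        self.action_head = nn.Linear(256, num_classes)  # main task
        self.repr_head = nn.Linear(256, feat_dim)       # full-action representation
        self.actionness_head = nn.Linear(256, 1)        # class-agnostic action label

    def forward(self, x):                               # x: (B, T, feat_dim)
        h, _ = self.encoder(x)
        return (self.action_head(h),                    # per-frame class logits
                self.repr_head(h[:, -1]),               # predicted full-action embedding
                self.actionness_head(h))                # per-frame actionness logits

model = AnticipationModel()
x = torch.randn(2, 16, 256)                             # partial observations
logits, full_repr, actionness = model(x)
target = torch.randint(60, (2, 16))
# Stand-in auxiliary targets (random here) illustrate the loss wiring only.
loss = (nn.functional.cross_entropy(logits.reshape(-1, 60), target.reshape(-1))
        + 0.1 * nn.functional.mse_loss(full_repr, torch.randn(2, 256))
        + 0.1 * nn.functional.binary_cross_entropy_with_logits(
              actionness, torch.ones(2, 16, 1)))
loss.backward()
```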
arXiv Detail & Related papers (2019-06-15T10:30:29Z)