Semi-Supervised Temporal Action Detection with Proposal-Free Masking
- URL: http://arxiv.org/abs/2207.07059v1
- Date: Thu, 14 Jul 2022 16:58:47 GMT
- Title: Semi-Supervised Temporal Action Detection with Proposal-Free Masking
- Authors: Sauradip Nag, Xiatian Zhu, Yi-Zhe Song and Tao Xiang
- Abstract summary: We propose a novel Semi-supervised Temporal action detection model based on PropOsal-free Temporal mask (SPOT).
SPOT outperforms state-of-the-art alternatives, often by a large margin.
- Score: 134.26292288193298
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing temporal action detection (TAD) methods rely on a large amount of
training data with segment-level annotations. Collecting and annotating such a
training set is thus highly expensive and unscalable. Semi-supervised TAD
(SS-TAD) alleviates this problem by leveraging unlabeled videos freely
available at scale. However, SS-TAD is also a much more challenging problem
than supervised TAD, and consequently much under-studied. Prior SS-TAD methods
directly combine an existing proposal-based TAD method and an SSL method. Due to
their sequential localization (e.g., proposal generation) and classification
design, they are prone to proposal error propagation. To overcome this
limitation, in this work we propose a novel Semi-supervised Temporal action
detection model based on PropOsal-free Temporal mask (SPOT) with a parallel
localization (mask generation) and classification architecture. Such a novel
design effectively eliminates the dependence between localization and
classification by cutting off the route for error propagation in-between. We
further introduce an interaction mechanism between classification and
localization for prediction refinement, and a new pretext task for
self-supervised model pre-training. Extensive experiments on two standard
benchmarks show that our SPOT outperforms state-of-the-art alternatives, often
by a large margin. The PyTorch implementation of SPOT is available at
https://github.com/sauradip/SPOT
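
The parallel, proposal-free idea described in the abstract can be illustrated with a small sketch. This is not the SPOT implementation: it only assumes that a model emits per-snippet class probabilities and a per-snippet foreground mask independently, and shows one plausible way to combine them into detections. The function names `mask_to_segments` and `detect_actions`, the 0.5 threshold, and the mean-score aggregation are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def mask_to_segments(mask: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return [start, end) snippet index pairs for runs where mask >= threshold."""
    segments = []
    start = None
    for i, value in enumerate(mask):
        if value >= threshold and start is None:
            start = i  # a foreground run begins
        elif value < threshold and start is not None:
            segments.append((start, i))  # the run ended at the previous snippet
            start = None
    if start is not None:
        segments.append((start, len(mask)))  # run extends to the end of the video
    return segments


def detect_actions(
    class_probs: List[List[float]],
    mask: List[float],
    threshold: float = 0.5,
) -> List[Tuple[int, int, int, float]]:
    """Combine independently predicted class scores and a temporal mask.

    class_probs[t][c] is the (assumed) probability of class c at snippet t.
    Returns (start, end, class_index, mean_score) per detected segment.
    """
    num_classes = len(class_probs[0])
    detections = []
    for start, end in mask_to_segments(mask, threshold):
        length = end - start
        # Average each class score over the segment (an illustrative choice).
        mean_scores = [
            sum(class_probs[t][c] for t in range(start, end)) / length
            for c in range(num_classes)
        ]
        best = max(range(num_classes), key=lambda c: mean_scores[c])
        detections.append((start, end, best, mean_scores[best]))
    return detections


if __name__ == "__main__":
    mask = [0.1, 0.8, 0.9, 0.2, 0.7, 0.6]
    class_probs = [
        [0.9, 0.1], [0.2, 0.8], [0.4, 0.6],
        [0.5, 0.5], [0.7, 0.3], [0.9, 0.1],
    ]
    for det in detect_actions(class_probs, mask):
        print(det)
```

Because the mask and the class scores are computed separately in this sketch, neither output is conditioned on a proposal stage, which is the property the abstract credits with cutting off error propagation.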
Related papers
- Temporal Action Localization with Enhanced Instant Discriminability [66.76095239972094]
Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video.
We propose a one-stage framework named TriDet to resolve imprecise predictions of action boundaries by existing methods.
Experimental results demonstrate the robustness of TriDet and its state-of-the-art performance on multiple TAD datasets.
arXiv Detail & Related papers (2023-09-11T16:17:50Z)
- Zero-Shot Temporal Action Detection via Vision-Language Prompting [134.26292288193298]
We propose a novel zero-Shot Temporal Action detection model via Vision-LanguagE prompting (STALE).
Our model significantly outperforms state-of-the-art alternatives.
Our model also yields superior results on supervised TAD over recent strong competitors.
arXiv Detail & Related papers (2022-07-17T13:59:46Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- SRF-Net: Selective Receptive Field Network for Anchor-Free Temporal Action Detection [32.159784061961886]
Temporal action detection (TAD) is a challenging task which aims to temporally localize and recognize the human action in untrimmed videos.
Current mainstream one-stage TAD approaches localize and classify action proposals relying on pre-defined anchors.
A novel TAD model termed Selective Receptive Field Network (SRF-Net) is developed.
arXiv Detail & Related papers (2021-06-29T11:29:16Z)
- BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation [85.13713217986738]
We present BSN++, a new framework which exploits complementary boundary regressor and relation modeling for temporal proposal generation.
Not surprisingly, the proposed BSN++ ranked 1st on the CVPR19 ActivityNet challenge leaderboard for the temporal action localization task.
arXiv Detail & Related papers (2020-09-15T07:08:59Z)
- Simple and Effective Prevention of Mode Collapse in Deep One-Class Classification [93.2334223970488]
We propose two regularizers to prevent hypersphere collapse in deep SVDD.
The first regularizer is based on injecting random noise via the standard cross-entropy loss.
The second regularizer penalizes the minibatch variance when it becomes too small.
arXiv Detail & Related papers (2020-01-24T03:44:47Z)