Zero-Shot Temporal Action Detection via Vision-Language Prompting
- URL: http://arxiv.org/abs/2207.08184v1
- Date: Sun, 17 Jul 2022 13:59:46 GMT
- Title: Zero-Shot Temporal Action Detection via Vision-Language Prompting
- Authors: Sauradip Nag, Xiatian Zhu, Yi-Zhe Song and Tao Xiang
- Abstract summary: We propose a novel zero-Shot Temporal Action detection model via Vision-LanguagE prompting (STALE).
Our model significantly outperforms state-of-the-art alternatives.
Our model also yields superior results on supervised TAD over recent strong competitors.
- Score: 134.26292288193298
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing temporal action detection (TAD) methods rely on large training data
including segment-level annotations, limited to recognizing previously seen
classes alone during inference. Collecting and annotating a large training set
for each class of interest is costly and hence unscalable. Zero-shot TAD
(ZS-TAD) resolves this obstacle by enabling a pre-trained model to recognize
any unseen action classes. Meanwhile, ZS-TAD is also much more challenging and
has been investigated far less. Inspired by the success of zero-shot image
classification aided by vision-language (ViL) models such as CLIP, we aim to
tackle the more complex TAD task. An intuitive method is to integrate an
off-the-shelf proposal detector with CLIP-style classification. However, due to
the sequential localization (e.g., proposal generation) and classification
design, it is prone to localization error propagation. To overcome this
problem, in this paper we propose a novel zero-Shot Temporal Action detection
model via Vision-LanguagE prompting (STALE). Such a novel design effectively
eliminates the dependence between localization and classification by breaking
the route for error propagation in-between. We further introduce an interaction
mechanism between classification and localization for improved optimization.
Extensive experiments on standard ZS-TAD video benchmarks show that our STALE
significantly outperforms state-of-the-art alternatives. Besides, our model
also yields superior results on supervised TAD over recent strong competitors.
The PyTorch implementation of STALE is available at
https://github.com/sauradip/STALE.
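To make the decoupled design concrete, below is a minimal PyTorch sketch of a proposal-free zero-shot TAD head that classifies every snippet against CLIP text-prompt embeddings while a parallel branch predicts a foreground mask, so a localization mistake does not gate the classifier. This is an illustrative approximation only, not the STALE code from the repository above; the module `ParallelZeroShotTAD`, the helper `encode_class_prompts`, the prompt template, and all dimensions are assumptions made for the example.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import clip  # OpenAI CLIP: https://github.com/openai/CLIP


class ParallelZeroShotTAD(nn.Module):
    """Minimal sketch of a parallel (proposal-free) zero-shot TAD head.

    Classification and localization branches read the same snippet features,
    so there is no sequential propose-then-classify error propagation.
    """

    def __init__(self, feat_dim=512, embed_dim=512):
        super().__init__()
        # Shared temporal modelling over snippet features.
        self.temporal = nn.Conv1d(feat_dim, embed_dim, kernel_size=3, padding=1)
        # Classification branch: project snippets into CLIP's text-embedding space.
        self.cls_proj = nn.Conv1d(embed_dim, embed_dim, kernel_size=1)
        # Localization branch: per-snippet foreground mask (proposal-free).
        self.loc_head = nn.Conv1d(embed_dim, 1, kernel_size=1)

    def forward(self, snippet_feats, text_embeds):
        # snippet_feats: (B, feat_dim, T); text_embeds: (C, embed_dim) from CLIP prompts.
        x = F.relu(self.temporal(snippet_feats))            # (B, D, T)
        vis = F.normalize(self.cls_proj(x), dim=1)          # (B, D, T)
        txt = F.normalize(text_embeds, dim=-1)              # (C, D)
        # Per-snippet class scores via cosine similarity with prompt embeddings.
        cls_logits = torch.einsum("bdt,cd->bct", vis, txt)  # (B, C, T)
        fg_mask = torch.sigmoid(self.loc_head(x))           # (B, 1, T)
        return cls_logits, fg_mask


@torch.no_grad()
def encode_class_prompts(class_names, device="cpu"):
    """Build zero-shot class embeddings from hand-crafted prompts with frozen CLIP."""
    model, _ = clip.load("ViT-B/32", device=device)
    tokens = clip.tokenize([f"a video of a person {c}" for c in class_names]).to(device)
    return model.encode_text(tokens).float()
```
In such a sketch, unseen categories at inference are handled simply by swapping in new prompt embeddings; snippets whose foreground mask exceeds a threshold can be grouped into contiguous segments and labelled with the class branch's scores.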
Related papers
- ZEETAD: Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action Detection [10.012716326383567]
Temporal action detection (TAD) involves the localization and classification of action instances within untrimmed videos.
We present ZEETAD, featuring two modules: dual-localization and zero-shot proposal classification.
We enhance discriminative capability on unseen classes by minimally updating the frozen CLIP encoder with lightweight adapters.
arXiv Detail & Related papers (2023-11-01T00:17:37Z)
- Post-Processing Temporal Action Detection [134.26292288193298]
Temporal Action Detection (TAD) methods typically take a pre-processing step in converting an input varying-length video into a fixed-length snippet representation sequence.
This pre-processing step temporally downsamples the video, reducing the inference resolution and hampering detection performance at the original temporal resolution.
We introduce a novel model-agnostic post-processing method that requires neither model redesign nor retraining.
arXiv Detail & Related papers (2022-11-27T19:50:37Z)
- Few-shot Object Detection with Refined Contrastive Learning [4.520231308678286]
We propose a novel few-shot object detection (FSOD) method with Refined Contrastive Learning (FSRC).
A pre-determination component is introduced to identify the Resemblance Group, the subset of novel classes that are easily confused with one another.
Refined contrastive learning (RCL) is then applied specifically to this group to increase the inter-class distances among its members.
arXiv Detail & Related papers (2022-11-24T09:34:20Z)
- Fast Hierarchical Learning for Few-Shot Object Detection [57.024072600597464]
Transfer learning approaches have recently achieved promising results on the few-shot detection task.
These approaches suffer from the "catastrophic forgetting" issue due to fine-tuning of the base detector.
We tackle the aforementioned issues in this work.
arXiv Detail & Related papers (2022-10-10T20:31:19Z)
- Active Learning with Effective Scoring Functions for Semi-Supervised Temporal Action Localization [15.031156121516211]
This paper focuses on a rarely investigated yet practical task named semi-supervised TAL.
We propose an effective active learning method, named AL-STAL.
Experimental results show that AL-STAL outperforms existing competitors and achieves satisfactory performance compared with fully-supervised learning.
arXiv Detail & Related papers (2022-08-31T13:39:38Z)
- Semi-Supervised Temporal Action Detection with Proposal-Free Masking [134.26292288193298]
We propose a novel Semi-supervised Temporal action detection model based on PropOsal-free Temporal mask (SPOT).
SPOT outperforms state-of-the-art alternatives, often by a large margin.
arXiv Detail & Related papers (2022-07-14T16:58:47Z)
- Fine-grained Temporal Contrastive Learning for Weakly-supervised Temporal Action Localization [87.47977407022492]
This paper argues that learning by contextually comparing sequence-to-sequence distinctions offers an essential inductive bias in weakly-supervised action localization.
Under a differentiable dynamic programming formulation, two complementary contrastive objectives are designed, including Fine-grained Sequence Distance (FSD) contrasting and Longest Common Subsequence (LCS) contrasting.
Our method achieves state-of-the-art performance on two popular benchmarks.
arXiv Detail & Related papers (2022-03-31T05:13:50Z)
- Few-shot Action Recognition with Prototype-centered Attentive Learning [88.10852114988829]
We propose a Prototype-centered Attentive Learning (PAL) model composed of two novel components.
First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective.
Second, PAL integrates an attentive hybrid learning mechanism that can minimize the negative impact of outliers.
arXiv Detail & Related papers (2021-01-20T11:48:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.