Active Learning with Effective Scoring Functions for Semi-Supervised
Temporal Action Localization
- URL: http://arxiv.org/abs/2208.14856v1
- Date: Wed, 31 Aug 2022 13:39:38 GMT
- Title: Active Learning with Effective Scoring Functions for Semi-Supervised
Temporal Action Localization
- Authors: Ding Li, Xuebing Yang, Yongqiang Tang, Chenyang Zhang and Wensheng
Zhang
- Abstract summary: This paper focuses on a rarely investigated yet practical task named semi-supervised TAL.
We propose an effective active learning method, named AL-STAL.
Experiment results show that AL-STAL outperforms existing competitors and achieves satisfactory performance compared with fully-supervised learning.
- Score: 15.031156121516211
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Temporal Action Localization (TAL) aims to predict both the action category and the temporal boundary, i.e., start and end time, of action instances in untrimmed videos. Most existing works adopt fully-supervised solutions, which are proven effective but require a large amount of labeled training data. To reduce the expensive cost of human labeling, this paper focuses on a rarely investigated yet practical task named semi-supervised TAL and proposes an effective active learning method, named AL-STAL. We leverage four steps, named \emph{Train, Query, Annotate, Append}, to actively select video samples with high informativeness and train the localization model. AL-STAL is equipped with two scoring functions that consider the uncertainty of the localization model, thus facilitating the ranking and selection of video samples. One takes the entropy of the predicted label distribution as a measure of uncertainty and is named Temporal Proposal Entropy (TPE). The other introduces a new metric based on the mutual information between adjacent action proposals to evaluate the informativeness of video samples and is named Temporal Context Inconsistency (TCI). To validate the effectiveness of the proposed method, we conduct extensive experiments on two benchmark datasets, THUMOS'14 and ActivityNet 1.3. Experiment results show that AL-STAL outperforms existing competitors and achieves satisfactory performance compared with fully-supervised learning.
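For intuition, below is a minimal Python sketch of one \emph{Train, Query, Annotate, Append} round with the two video-level scores described in the abstract. This is not the authors' implementation: the interfaces (`model.fit`, `model.predict_proposal_probs`, `oracle`) are hypothetical, TPE is rendered simply as the mean entropy of the per-proposal class distributions, and TCI, which the paper defines via mutual information between adjacent proposals, is only approximated here by a symmetric KL divergence between neighbouring proposals.

```python
# Hedged sketch of the AL-STAL active-learning cycle (not the authors' code).
import numpy as np


def tpe_score(proposal_probs: np.ndarray) -> float:
    """Temporal Proposal Entropy (assumed form): mean entropy of the predicted
    class distributions over a video's temporal proposals.
    proposal_probs: array of shape (num_proposals, num_classes), rows sum to 1."""
    eps = 1e-12
    entropy = -np.sum(proposal_probs * np.log(proposal_probs + eps), axis=1)
    return float(entropy.mean())


def tci_score(proposal_probs: np.ndarray) -> float:
    """Temporal Context Inconsistency (approximation): disagreement between
    temporally adjacent proposals, here a symmetric KL divergence used as a
    stand-in for the paper's mutual-information-based definition."""
    eps = 1e-12
    if len(proposal_probs) < 2:
        return 0.0
    p, q = proposal_probs[:-1], proposal_probs[1:]
    kl_pq = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)
    kl_qp = np.sum(q * (np.log(q + eps) - np.log(p + eps)), axis=1)
    return float((0.5 * (kl_pq + kl_qp)).mean())


def active_learning_round(model, labeled, unlabeled, budget, score_fn, oracle):
    """One Train -> Query -> Annotate -> Append cycle (hypothetical interfaces)."""
    # Train: fit the localization model on the currently labeled pool.
    model.fit(labeled)
    # Query: score every unlabeled video and pick the most informative ones.
    scores = {vid: score_fn(model.predict_proposal_probs(vid)) for vid in unlabeled}
    queried = sorted(scores, key=scores.get, reverse=True)[:budget]
    # Annotate and Append: obtain labels from the oracle (human annotator)
    # and move the queried videos into the labeled pool.
    for vid in queried:
        labeled[vid] = oracle(vid)
        unlabeled.remove(vid)
    return model, labeled, unlabeled
```

In each round, the highest-scoring (most uncertain or most context-inconsistent) videos are sent to the annotator and appended to the labeled pool, after which the localizer is retrained; `score_fn` can be either `tpe_score` or `tci_score`.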
Related papers
- Open-Vocabulary Spatio-Temporal Action Detection [59.91046192096296]
Open-vocabulary spatio-temporal action detection (OV-STAD) is an important fine-grained video understanding task.
OV-STAD requires training a model on a limited set of base classes with box and label supervision.
To better adapt the holistic VLM for the fine-grained action detection task, we carefully fine-tune it on the localized video region-text pairs.
arXiv Detail & Related papers (2024-05-17T14:52:47Z)
- Test-Time Zero-Shot Temporal Action Localization [58.84919541314969]
ZS-TAL seeks to identify and locate actions in untrimmed videos unseen during training.
Training-based ZS-TAL approaches assume the availability of labeled data for supervised learning.
We introduce a novel method that performs Test-Time adaptation for Temporal Action Localization (T3AL).
arXiv Detail & Related papers (2024-04-08T11:54:49Z)
- Proposal-based Temporal Action Localization with Point-level Supervision [29.98225940694062]
Point-level supervised temporal action localization (PTAL) aims at recognizing and localizing actions in untrimmed videos.
We propose a novel method that localizes actions by generating and evaluating action proposals of flexible duration.
Experiments show that our proposed method achieves competitive or superior performance to the state-of-the-art methods.
arXiv Detail & Related papers (2023-10-09T08:27:05Z)
- Proposal-Based Multiple Instance Learning for Weakly-Supervised Temporal Action Localization [98.66318678030491]
Weakly-supervised temporal action localization aims to localize and recognize actions in untrimmed videos with only video-level category labels during training.
We propose a novel Proposal-based Multiple Instance Learning (P-MIL) framework that directly classifies the candidate proposals in both the training and testing stages.
arXiv Detail & Related papers (2023-05-29T02:48:04Z)
- Zero-Shot Temporal Action Detection via Vision-Language Prompting [134.26292288193298]
We propose a novel zero-Shot Temporal Action detection model via Vision-LanguagE prompting (STALE).
Our model significantly outperforms state-of-the-art alternatives.
Our model also yields superior results on supervised TAD over recent strong competitors.
arXiv Detail & Related papers (2022-07-17T13:59:46Z)
- Temporal Action Detection with Global Segmentation Mask Learning [134.26292288193298]
Existing temporal action detection (TAD) methods rely on generating an overwhelmingly large number of proposals per video.
We propose a proposal-free Temporal Action detection model with Global mask (TAGS).
Our core idea is to learn a global segmentation mask of each action instance jointly at the full video length.
arXiv Detail & Related papers (2022-07-14T00:46:51Z)
- End-to-End Semi-Supervised Learning for Video Action Detection [23.042410033982193]
We propose a simple end-to-end approach that effectively utilizes the unlabeled data.
Video action detection requires both action class prediction and spatio-temporal consistency.
We demonstrate the effectiveness of the proposed approach on two different action detection benchmark datasets.
arXiv Detail & Related papers (2022-03-08T18:11:25Z)
- Learning Salient Boundary Feature for Anchor-free Temporal Action Localization [81.55295042558409]
Temporal action localization is an important yet challenging task in video understanding.
We propose the first purely anchor-free temporal localization method.
Our model includes (i) an end-to-end trainable basic predictor, (ii) a saliency-based refinement module, and (iii) several consistency constraints.
arXiv Detail & Related papers (2021-03-24T12:28:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.