Sub-action Prototype Learning for Point-level Weakly-supervised Temporal Action Localization
- URL: http://arxiv.org/abs/2309.09060v1
- Date: Sat, 16 Sep 2023 17:57:40 GMT
- Title: Sub-action Prototype Learning for Point-level Weakly-supervised Temporal Action Localization
- Authors: Yueyang Li, Yonghong Hou, Wanqing Li
- Abstract summary: Point-level weakly-supervised temporal action localization (PWTAL) aims to localize actions with only a single timestamp annotation for each action instance.
Existing methods tend to mine dense pseudo labels to alleviate the label sparsity, but overlook the potential sub-action temporal structures, resulting in inferior performance.
We propose a novel sub-action prototype learning framework (SPL-Loc) which comprises Sub-action Prototype Clustering (SPC) and Ordered Prototype Alignment (OPA).
- Score: 11.777205793663647
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Point-level weakly-supervised temporal action localization (PWTAL) aims to localize actions with only a single timestamp annotation for each action instance. Existing methods tend to mine dense pseudo labels to alleviate the label sparsity, but overlook the potential sub-action temporal structures, resulting in inferior performance. To tackle this problem, we propose a novel sub-action prototype learning framework (SPL-Loc) which comprises Sub-action Prototype Clustering (SPC) and Ordered Prototype Alignment (OPA). SPC adaptively extracts representative sub-action prototypes that are capable of perceiving the temporal scale and spatial content variation of action instances. OPA selects relevant prototypes to provide completeness cues for pseudo-label generation by applying a temporal alignment loss. As a result, pseudo labels are derived from the alignment results to improve action boundary prediction. Extensive experiments on three popular benchmarks demonstrate that the proposed SPL-Loc significantly outperforms existing state-of-the-art (SOTA) PWTAL methods.
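The following is a minimal, illustrative sketch of the two ideas the abstract names: clustering snippet features into sub-action prototypes, and expanding a single timestamp annotation into a pseudo segment by checking prototype similarity. The k-means clustering, the cosine threshold, and the function names are assumptions for illustration, not the authors' implementation.
```python
import numpy as np

def cluster_prototypes(feats: np.ndarray, k: int = 3, iters: int = 20) -> np.ndarray:
    """Naive k-means over snippet features -> (k, d) sub-action prototypes."""
    rng = np.random.default_rng(0)
    protos = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(iters):
        # Assign each snippet to its nearest prototype, then recompute means.
        dists = np.linalg.norm(feats[:, None] - protos[None], axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                protos[j] = feats[assign == j].mean(axis=0)
    return protos

def expand_point_label(feats, point_idx, protos, sim_thresh=0.7):
    """Grow a pseudo segment around the annotated snippet while neighbors
    stay cosine-similar to at least one sub-action prototype."""
    def matches(i):
        f = feats[i] / (np.linalg.norm(feats[i]) + 1e-8)
        p = protos / (np.linalg.norm(protos, axis=1, keepdims=True) + 1e-8)
        return (p @ f).max() >= sim_thresh
    start = end = point_idx
    while start > 0 and matches(start - 1):
        start -= 1
    while end < len(feats) - 1 and matches(end + 1):
        end += 1
    return start, end  # inclusive pseudo-segment boundaries

# Toy usage: 30 snippets of 16-d features, point annotation at snippet 12.
feats = np.random.default_rng(1).normal(size=(30, 16))
protos = cluster_prototypes(feats, k=3)
print(expand_point_label(feats, point_idx=12, protos=protos))
```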
Related papers
- Decoupled Prototype Learning for Reliable Test-Time Adaptation [50.779896759106784]
Test-time adaptation (TTA) is a task that continually adapts a pre-trained source model to the target domain during inference.
One popular approach involves fine-tuning the model with a cross-entropy loss on estimated pseudo-labels.
This study reveals that minimizing the classification error of each sample individually makes the cross-entropy loss vulnerable to label noise.
We propose a novel Decoupled Prototype Learning (DPL) method that features prototype-centric loss computation (sketched below).
arXiv Detail & Related papers (2024-01-15T03:33:39Z)
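As a hedged illustration of the prototype-centric idea in the entry above, the sketch below contrasts per-sample cross-entropy on noisy pseudo-labels with a loss computed against class prototypes (mean features per pseudo-class). The temperature and the mean-pooling choice are assumptions, not DPL's actual loss.
```python
import torch
import torch.nn.functional as F

def per_sample_ce(logits, pseudo_labels):
    # Baseline: each (possibly wrong) pseudo-label supervises its own sample,
    # so a single noisy label directly corrupts the gradient.
    return F.cross_entropy(logits, pseudo_labels)

def prototype_centric_loss(feats, pseudo_labels, num_classes, temperature=0.1):
    # One prototype per class = mean of that class's pseudo-labeled features;
    # a noisy label is diluted by its (mostly correct) classmates.
    # Assumes every class appears at least once in the batch.
    protos = torch.stack([
        feats[pseudo_labels == c].mean(dim=0) for c in range(num_classes)
    ])
    feats = F.normalize(feats, dim=1)
    protos = F.normalize(protos, dim=1)
    logits = feats @ protos.t() / temperature  # cosine-similarity logits
    return F.cross_entropy(logits, pseudo_labels)

# Toy usage: 8 samples, 4-d features, 2 classes.
feats = torch.randn(8, 4)
labels = torch.tensor([0, 0, 0, 1, 1, 1, 0, 1])
print(per_sample_ce(torch.randn(8, 2), labels).item())
print(prototype_centric_loss(feats, labels, num_classes=2).item())
```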
- POTLoc: Pseudo-Label Oriented Transformer for Point-Supervised Temporal Action Localization [26.506893363676678]
This paper proposes POTLoc, a Pseudo-label Oriented Transformer for weakly-supervised action localization.
POTLoc is designed to identify and track continuous action structures via a self-training strategy (sketched below).
It outperforms state-of-the-art point-supervised methods on the THUMOS'14 and ActivityNet-v1.2 datasets.
arXiv Detail & Related papers (2023-10-20T15:28:06Z)
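A generic self-training round of the kind the entry above mentions: confident snippet predictions become pseudo-labels alongside the trusted point annotations. The confidence threshold and the helper's interface are illustrative assumptions; POTLoc's transformer and label-refinement details are not reproduced.
```python
import torch
import torch.nn.functional as F

def self_training_round(model, feats, point_labels, conf_thresh=0.9):
    """point_labels: {snippet_idx: class_id} from the single timestamps.
    Returns a denser pseudo-label dict for the next training round."""
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(feats), dim=1)  # (T, num_classes)
        conf, pred = probs.max(dim=1)
    pseudo = dict(point_labels)  # always keep the trusted point annotations
    for t in range(len(feats)):
        if t not in pseudo and conf[t] >= conf_thresh:
            pseudo[t] = int(pred[t])  # adopt only confident predictions
    return pseudo

# Toy usage: a linear snippet classifier over 20 snippets of 16-d features.
model = torch.nn.Linear(16, 5)
feats = torch.randn(20, 16)
print(self_training_round(model, feats, point_labels={7: 2}))
```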
- Proposal-based Temporal Action Localization with Point-level Supervision [29.98225940694062]
Point-level supervised temporal action localization (PTAL) aims at recognizing and localizing actions in untrimmed videos.
We propose a novel method that localizes actions by generating and evaluating action proposals of flexible duration (sketched below).
Experiments show that our proposed method achieves performance competitive with or superior to state-of-the-art methods.
arXiv Detail & Related papers (2023-10-09T08:27:05Z)
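A sketch of point-conditioned proposal generation with flexible durations, per the entry above: enumerate candidate segments around each annotated timestamp and rank them by an outer-inner contrast score. The durations, stride, and scoring rule are illustrative assumptions rather than the paper's method.
```python
import numpy as np

def generate_proposals(actionness, point, durations=(8, 16, 32), stride=4):
    """Enumerate segments of several durations containing `point` and rank
    them by mean actionness inside minus mean actionness just outside."""
    T = len(actionness)
    proposals = []
    for dur in durations:
        for start in range(max(0, point - dur + 1), min(point, T - dur) + 1, stride):
            end = start + dur  # exclusive
            inner = actionness[start:end].mean()
            left = actionness[max(0, start - dur // 4):start]
            right = actionness[end:min(T, end + dur // 4)]
            outer_vals = np.concatenate([left, right])
            outer = outer_vals.mean() if len(outer_vals) else 0.0
            proposals.append((start, end, inner - outer))  # outer-inner contrast
    return sorted(proposals, key=lambda p: -p[2])

# Toy usage: an actionness bump around t=20 and a point annotation at t=20.
act = np.exp(-0.5 * ((np.arange(64) - 20) / 6.0) ** 2)
print(generate_proposals(act, point=20)[:3])
```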
- Multi-modal Prompting for Low-Shot Temporal Action Localization [95.19505874963751]
We consider the problem of temporal action localization under the low-shot (zero-shot & few-shot) scenario.
We adopt a Transformer-based two-stage action localization architecture with class-agnostic action proposals, followed by open-vocabulary classification (sketched below).
arXiv Detail & Related papers (2023-03-21T10:40:13Z)
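A minimal sketch of the open-vocabulary classification stage named above: class-agnostic proposal features are matched to free-form class-name embeddings by cosine similarity, so an unseen class needs only a text prompt. The stand-in random embeddings and the temperature are assumptions; the paper's actual vision-language encoders are not reproduced.
```python
import torch
import torch.nn.functional as F

def open_vocab_classify(proposal_feats, class_text_embeds, temperature=0.07):
    # Cosine similarity between proposal and class-name embeddings acts as
    # the classifier, so new classes only require a new text embedding.
    v = F.normalize(proposal_feats, dim=1)      # (num_proposals, d)
    t = F.normalize(class_text_embeds, dim=1)   # (num_classes, d)
    return F.softmax(v @ t.t() / temperature, dim=1)

# Toy usage: 5 proposals scored against 3 class-prompt embeddings.
probs = open_vocab_classify(torch.randn(5, 512), torch.randn(3, 512))
print(probs.argmax(dim=1))
```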
- Zero-Shot Temporal Action Detection via Vision-Language Prompting [134.26292288193298]
We propose a novel zero-Shot Temporal Action detection model via Vision-LanguagE prompting (STALE).
Our model significantly outperforms state-of-the-art alternatives.
Our model also yields superior results on supervised TAD compared with recent strong competitors.
arXiv Detail & Related papers (2022-07-17T13:59:46Z)
- Semi-Supervised Temporal Action Detection with Proposal-Free Masking [134.26292288193298]
We propose a novel Semi-supervised Temporal action detection model based on PropOsal-free Temporal mask (SPOT); mask-to-segment decoding is sketched below.
SPOT outperforms state-of-the-art alternatives, often by a large margin.
arXiv Detail & Related papers (2022-07-14T16:58:47Z)
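A sketch of proposal-free decoding of the kind the SPOT entry names: threshold a per-snippet foreground mask and read off contiguous runs as detections. The threshold and the toy mask are illustrative; SPOT's learned mask head and refinement are not reproduced.
```python
import numpy as np

def mask_to_segments(mask, thresh=0.5):
    """mask: (T,) foreground probabilities -> list of (start, end) segments."""
    fg = mask >= thresh
    segments, start = [], None
    for t, active in enumerate(fg):
        if active and start is None:
            start = t                       # segment opens
        elif not active and start is not None:
            segments.append((start, t))     # segment closes; end is exclusive
            start = None
    if start is not None:
        segments.append((start, len(mask)))
    return segments

print(mask_to_segments(np.array([0.1, 0.7, 0.9, 0.8, 0.2, 0.6, 0.9, 0.1])))
# -> [(1, 4), (5, 7)]
```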
- Point-Level Temporal Action Localization: Bridging Fully-supervised Proposals to Weakly-supervised Losses [84.2964408497058]
Point-level temporal action localization (PTAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance.
Existing methods adopt the frame-level prediction paradigm to learn from the sparse single-frame labels.
This paper attempts to explore the proposal-based prediction paradigm for point-level annotations.
arXiv Detail & Related papers (2020-12-15T12:11:48Z)
- Prototypical Contrast and Reverse Prediction: Unsupervised Skeleton Based Action Recognition [12.463955174384457]
We propose a novel framework named Prototypical Contrast and Reverse Prediction (PCRP).
PCRP performs reverse sequential prediction to learn low-level information and high-level patterns (sketched below).
It also devises action prototypes to implicitly encode semantic similarity shared among sequences.
arXiv Detail & Related papers (2020-11-14T08:04:23Z)
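A hedged sketch of reverse sequential prediction as a pretext task, per the entry above: encode a sequence, then decode it in reverse order, which forces the representation to capture temporal structure. The GRU encoder/decoder and dimensions are illustrative assumptions, not PCRP's architecture.
```python
import torch
import torch.nn as nn

class ReversePredictor(nn.Module):
    def __init__(self, feat_dim=75, hidden=128):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, feat_dim)

    def forward(self, seq):                  # seq: (B, T, feat_dim)
        _, h = self.encoder(seq)             # summary state of the sequence
        rev = torch.flip(seq, dims=[1])      # target: the reversed sequence
        # Teacher forcing: feed the reversed sequence shifted by one step.
        dec_in = torch.cat([torch.zeros_like(rev[:, :1]), rev[:, :-1]], dim=1)
        dec_out, _ = self.decoder(dec_in, h)
        return self.out(dec_out), rev

# Toy usage: 2 skeleton sequences of 16 frames, 75-d joint coordinates.
model = ReversePredictor()
pred, target = model(torch.randn(2, 16, 75))
loss = nn.functional.mse_loss(pred, target)  # reverse-prediction objective
print(loss.item())
```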
- Unsupervised Domain Adaptation for Spatio-Temporal Action Localization [69.12982544509427]
Spatio-temporal action localization is an important problem in computer vision.
We propose an end-to-end unsupervised domain adaptation algorithm.
We show that significant performance gains can be achieved when spatial and temporal features are adapted separately or jointly (sketched below).
arXiv Detail & Related papers (2020-10-19T04:25:10Z)
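A sketch of adversarial feature alignment of the kind the entry above describes, using DANN-style gradient reversal so domain classifiers can adapt the spatial and temporal streams separately; joint adaptation would simply concatenate the streams before one head. The layer sizes and lambda are illustrative assumptions; the paper's exact alignment modules are not reproduced.
```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None  # reverse gradients into the features

def domain_head(dim):
    return nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 2))

spatial_head, temporal_head = domain_head(256), domain_head(256)
ce = nn.CrossEntropyLoss()

def adaptation_loss(spat_feat, temp_feat, domain_label, lam=0.1):
    # Separate discriminators per stream; the gradient reversal pushes the
    # feature extractor toward domain-invariant spatial/temporal features.
    loss_s = ce(spatial_head(GradReverse.apply(spat_feat, lam)), domain_label)
    loss_t = ce(temporal_head(GradReverse.apply(temp_feat, lam)), domain_label)
    return loss_s + loss_t

# Toy usage: a batch of 4 clips, source domain = 0, target domain = 1.
print(adaptation_loss(torch.randn(4, 256), torch.randn(4, 256),
                      torch.tensor([0, 0, 1, 1])).item())
```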
This list is automatically generated from the titles and abstracts of the papers on this site.