Related papers: Sub-action Prototype Learning for Point-level Weakly-supervised Temporal Action Localization

URL: http://arxiv.org/abs/2309.09060v1
Date: Sat, 16 Sep 2023 17:57:40 GMT
Title: Sub-action Prototype Learning for Point-level Weakly-supervised Temporal Action Localization
Authors: Yueyang Li, Yonghong Hou, Wanqing Li
Abstract summary: Point-level weakly-supervised temporal action localization (PWTAL) aims to localize actions with only a single timestamp annotation for each action instance. Existing methods tend to mine dense pseudo labels to alleviate the label sparsity, but overlook the potential sub-action temporal structures, resulting in inferior performance. We propose a novel sub-action prototype learning framework (SPL-Loc) which comprises Sub-action Prototype Clustering (SPC) and Ordered Prototype Alignment (OPA).
Score: 11.777205793663647
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Point-level weakly-supervised temporal action localization (PWTAL) aims to localize actions with only a single timestamp annotation for each action instance. Existing methods tend to mine dense pseudo labels to alleviate the label sparsity, but overlook the potential sub-action temporal structures, resulting in inferior performance. To tackle this problem, we propose a novel sub-action prototype learning framework (SPL-Loc) which comprises Sub-action Prototype Clustering (SPC) and Ordered Prototype Alignment (OPA). SPC adaptively extracts representative sub-action prototypes that are capable of perceiving the temporal scale and spatial content variation of action instances. OPA selects relevant prototypes to provide completeness clues for pseudo label generation by applying a temporal alignment loss. As a result, pseudo labels are derived from alignment results to improve action boundary prediction. Extensive experiments on three popular benchmarks demonstrate that the proposed SPL-Loc significantly outperforms existing SOTA PWTAL methods.

Related papers

Demystifying Catastrophic Forgetting in Two-Stage Incremental Object Detector [42.40881712297689]
Catastrophic forgetting is predominantly localized to the RoI Head. NSGP-RePRE mitigates forgetting via replay of two types of prototypes. NSGP-RePRE achieves state-of-the-art performance on the Pascal VOC and MS COCO datasets.
arXiv Detail & Related papers (2025-02-08T12:10:02Z)
Action-Agnostic Point-Level Supervision for Temporal Action Detection [55.86569092972912]
We propose action-agnostic point-level supervision for temporal action detection with a lightly annotated dataset. In the proposed scheme, a small portion of video frames is sampled in an unsupervised manner and presented to human annotators, who then label the frames with action categories. Unlike point-level supervision, which requires annotators to search for every action instance in an untrimmed video, frames to annotate are selected without human intervention.
arXiv Detail & Related papers (2024-12-30T18:59:55Z)
Decoupled Prototype Learning for Reliable Test-Time Adaptation [50.779896759106784]
Test-time adaptation (TTA) is a task that continually adapts a pre-trained source model to the target domain during inference. One popular approach involves fine-tuning the model with a cross-entropy loss according to estimated pseudo-labels. This study reveals that minimizing the classification error of each sample makes the cross-entropy loss vulnerable to label noise. We propose a novel Decoupled Prototype Learning (DPL) method that features prototype-centric loss computation.
arXiv Detail & Related papers (2024-01-15T03:33:39Z)
POTLoc: Pseudo-Label Oriented Transformer for Point-Supervised Temporal Action Localization [26.506893363676678]
This paper proposes POTLoc, a Pseudo-label Oriented Transformer for weakly-supervised action localization. POTLoc is designed to identify and track continuous action structures via a self-training strategy. It outperforms the state-of-the-art point-supervised methods on the THUMOS'14 and ActivityNet-v1.2 datasets.
arXiv Detail & Related papers (2023-10-20T15:28:06Z)
Proposal-based Temporal Action Localization with Point-level Supervision [29.98225940694062]
Point-level supervised temporal action localization (PTAL) aims at recognizing and localizing actions in untrimmed videos. We propose a novel method that localizes actions by generating and evaluating action proposals of flexible duration. Experiments show that our proposed method achieves competitive or superior performance to the state-of-the-art methods.
arXiv Detail & Related papers (2023-10-09T08:27:05Z)
Multi-modal Prompting for Low-Shot Temporal Action Localization [95.19505874963751]
We consider the problem of temporal action localization under the low-shot (zero-shot & few-shot) scenario. We adopt a Transformer-based two-stage action localization architecture with class-agnostic action proposals, followed by open-vocabulary classification.
arXiv Detail & Related papers (2023-03-21T10:40:13Z)
Zero-Shot Temporal Action Detection via Vision-Language Prompting [134.26292288193298]
We propose a novel zero-Shot Temporal Action detection model via Vision-LanguagE prompting (STALE). Our model significantly outperforms state-of-the-art alternatives. Our model also yields superior results on supervised TAD over recent strong competitors.
arXiv Detail & Related papers (2022-07-17T13:59:46Z)
Semi-Supervised Temporal Action Detection with Proposal-Free Masking [134.26292288193298]
We propose a novel Semi-supervised Temporal action detection model based on PropOsal-free Temporal mask (SPOT). SPOT outperforms state-of-the-art alternatives, often by a large margin.
arXiv Detail & Related papers (2022-07-14T16:58:47Z)
Point-Level Temporal Action Localization: Bridging Fully-supervised Proposals to Weakly-supervised Losses [84.2964408497058]
Point-level temporal action localization (PTAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance. Existing methods adopt the frame-level prediction paradigm to learn from the sparse single-frame labels. This paper attempts to explore the proposal-based prediction paradigm for point-level annotations.
arXiv Detail & Related papers (2020-12-15T12:11:48Z)
Prototypical Contrast and Reverse Prediction: Unsupervised Skeleton Based Action Recognition [12.463955174384457]
We propose a novel framework named Prototypical Contrast and Reverse Prediction (PCRP). PCRP creates reverse sequential prediction to learn low-level information and high-level patterns. It also devises action prototypes to implicitly encode semantic similarity shared among sequences.
arXiv Detail & Related papers (2020-11-14T08:04:23Z)
Unsupervised Domain Adaptation for Spatio-Temporal Action Localization [69.12982544509427]
Spatio-temporal action localization is an important problem in computer vision. We propose an end-to-end unsupervised domain adaptation algorithm. We show that significant performance gains can be achieved when spatial and temporal features are adapted separately or jointly.
arXiv Detail & Related papers (2020-10-19T04:25:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.