Learning Action Completeness from Points for Weakly-supervised Temporal
Action Localization
- URL: http://arxiv.org/abs/2108.05029v1
- Date: Wed, 11 Aug 2021 04:54:39 GMT
- Title: Learning Action Completeness from Points for Weakly-supervised Temporal
Action Localization
- Authors: Pilhyeon Lee, Hyeran Byun
- Abstract summary: We tackle the problem of localizing temporal intervals of actions with only a single frame label for each action instance for training.
In this paper, we propose a novel framework, where dense pseudo-labels are generated to provide completeness guidance for the model.
- Score: 15.603643098270409
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We tackle the problem of localizing temporal intervals of actions with only a
single frame label for each action instance for training. Owing to label
sparsity, existing work fails to learn action completeness, resulting in
fragmentary action predictions. In this paper, we propose a novel framework,
where dense pseudo-labels are generated to provide completeness guidance for
the model. Concretely, we first select pseudo background points to supplement
point-level action labels. Then, by taking the points as seeds, we search for
the optimal sequence that is likely to contain complete action instances while
agreeing with the seeds. To learn completeness from the obtained sequence, we
introduce two novel losses that contrast action instances with background ones
in terms of action score and feature similarity, respectively. Experimental
results demonstrate that our completeness guidance indeed helps the model to
locate complete action instances, leading to large performance gains especially
under high IoU thresholds. Moreover, we demonstrate the superiority of our
method over existing state-of-the-art methods on four benchmarks: THUMOS'14,
GTEA, BEOID, and ActivityNet. Notably, our method even performs comparably to
recent fully-supervised methods, at a 6 times lower annotation cost. Our
code is available at https://github.com/Pilhyeon.
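The abstract's two completeness losses contrast action instances with background ones by action score and by feature similarity. A minimal, hypothetical sketch of that idea is below; the function names, the hinge margin, and the mean-pooling over instances are illustrative assumptions, not the paper's exact formulation.

```python
import math

def score_contrast_loss(action_scores, background_scores, margin=1.0):
    """Hinge loss pushing the mean action score above the mean background
    score by at least `margin` (an assumed formulation)."""
    gap = (sum(action_scores) / len(action_scores)
           - sum(background_scores) / len(background_scores))
    return max(0.0, margin - gap)

def feature_contrast_loss(action_feat, background_feat):
    """Penalize positive cosine similarity between an action feature and a
    background feature, so the two are pushed apart in embedding space."""
    dot = sum(a * b for a, b in zip(action_feat, background_feat))
    norm_a = math.sqrt(sum(a * a for a in action_feat))
    norm_b = math.sqrt(sum(b * b for b in background_feat))
    return max(0.0, dot / (norm_a * norm_b + 1e-8))
```

In practice both terms would be computed over instances mined from the dense pseudo-labels and added to the base localization objective; the sketch only shows the contrastive direction of each loss.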
Related papers
- FinePseudo: Improving Pseudo-Labelling through Temporal-Alignablity for Semi-Supervised Fine-Grained Action Recognition [57.17966905865054]
Real-life applications of action recognition often require a fine-grained understanding of subtle movements.
Existing semi-supervised action recognition has mainly focused on coarse-grained action recognition.
We propose an Alignability-Verification-based Metric learning technique to effectively discriminate between fine-grained action pairs.
arXiv Detail & Related papers (2024-09-02T20:08:06Z)
- Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling [51.38330727868982]
Bidirectional Decoding (BID) is a test-time inference algorithm that bridges action chunking with closed-loop operations.
We show that BID boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
arXiv Detail & Related papers (2024-08-30T15:39:34Z)
- One-bit Supervision for Image Classification: Problem, Solution, and Beyond [114.95815360508395]
This paper presents one-bit supervision, a novel setting of learning with fewer labels, for image classification.
We propose a multi-stage training paradigm and incorporate negative label suppression into an off-the-shelf semi-supervised learning algorithm.
On multiple benchmarks, the learning efficiency of the proposed approach surpasses that of full-bit semi-supervised learning.
arXiv Detail & Related papers (2023-11-26T07:39:00Z)
- Proposal-based Temporal Action Localization with Point-level Supervision [29.98225940694062]
Point-level supervised temporal action localization (PTAL) aims at recognizing and localizing actions in untrimmed videos.
We propose a novel method that localizes actions by generating and evaluating action proposals of flexible duration.
Experiments show that our proposed method achieves competitive or superior performance to the state-of-the-art methods.
arXiv Detail & Related papers (2023-10-09T08:27:05Z)
- Action Sensitivity Learning for Temporal Action Localization [35.65086250175736]
We propose an Action Sensitivity Learning framework (ASL) to tackle the task of temporal action localization.
We first introduce a lightweight Action Sensitivity Evaluator to learn the action sensitivity at the class level and instance level, respectively.
Based on the action sensitivity of each frame, we design an Action Sensitive Contrastive Loss to enhance features, where the action-aware frames are sampled as positive pairs to push away the action-irrelevant frames.
arXiv Detail & Related papers (2023-05-25T04:19:14Z)
- Improving Weakly Supervised Temporal Action Localization by Bridging Train-Test Gap in Pseudo Labels [38.35756338815097]
Pseudo-label-based methods, which serve as an effective solution, have been widely studied recently.
Existing methods generate pseudo labels during training and make predictions during testing under different pipelines or settings.
We propose to generate high-quality pseudo labels from the predicted action boundaries.
arXiv Detail & Related papers (2023-04-17T03:47:41Z)
- Active Learning with Effective Scoring Functions for Semi-Supervised Temporal Action Localization [15.031156121516211]
This paper focuses on a rarely investigated yet practical task named semi-supervised TAL.
We propose an effective active learning method, named AL-STAL.
Experimental results show that AL-STAL outperforms existing competitors and achieves satisfactory performance compared with fully-supervised learning.
arXiv Detail & Related papers (2022-08-31T13:39:38Z)
- Weakly-Supervised Temporal Action Detection for Fine-Grained Videos with Hierarchical Atomic Actions [13.665489987620724]
We tackle the problem of weakly-supervised fine-grained temporal action detection in videos for the first time.
We propose to model actions as the combinations of reusable atomic actions which are automatically discovered from data.
Our approach constructs a visual representation hierarchy of four levels: clip level, atomic action level, fine action class level and coarse action class level, with supervision at each level.
arXiv Detail & Related papers (2022-07-24T20:32:24Z)
- Learning Salient Boundary Feature for Anchor-free Temporal Action Localization [81.55295042558409]
Temporal action localization is an important yet challenging task in video understanding.
We propose the first purely anchor-free temporal localization method.
Our model includes (i) an end-to-end trainable basic predictor, (ii) a saliency-based refinement module, and (iii) several consistency constraints.
arXiv Detail & Related papers (2021-03-24T12:28:32Z)
- Point-Level Temporal Action Localization: Bridging Fully-supervised Proposals to Weakly-supervised Losses [84.2964408497058]
Point-level temporal action localization (PTAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance.
Existing methods adopt the frame-level prediction paradigm to learn from the sparse single-frame labels.
This paper attempts to explore the proposal-based prediction paradigm for point-level annotations.
arXiv Detail & Related papers (2020-12-15T12:11:48Z)
- FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding [118.32912239230272]
FineGym is a new action recognition dataset built on top of gymnastic videos.
It provides temporal annotations at both action and sub-action levels with a three-level semantic hierarchy.
This new level of granularity presents significant challenges for action recognition.
arXiv Detail & Related papers (2020-04-14T17:55:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.