Point-Level Temporal Action Localization: Bridging Fully-supervised Proposals to Weakly-supervised Losses
- URL: http://arxiv.org/abs/2012.08236v1
- Date: Tue, 15 Dec 2020 12:11:48 GMT
- Title: Point-Level Temporal Action Localization: Bridging Fully-supervised Proposals to Weakly-supervised Losses
- Authors: Chen Ju, Peisen Zhao, Ya Zhang, Yanfeng Wang, Qi Tian
- Abstract summary: Point-level temporal action localization (PTAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance.
Existing methods adopt the frame-level prediction paradigm to learn from the sparse single-frame labels.
This paper attempts to explore the proposal-based prediction paradigm for point-level annotations.
- Score: 84.2964408497058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Point-level temporal action localization (PTAL) aims to localize actions in
untrimmed videos with only one timestamp annotation for each action instance.
Existing methods adopt the frame-level prediction paradigm to learn from the
sparse single-frame labels. However, such a framework inevitably suffers from a
large solution space. This paper attempts to explore the proposal-based
prediction paradigm for point-level annotations, which offers a more constrained
solution space and consistent predictions among neighboring
frames. The point-level annotations are first used as keypoint supervision
to train a keypoint detector. At the location prediction stage, a simple but
effective mapper module, which enables back-propagation of training errors, is
then introduced to bridge the fully-supervised framework with weak supervision.
To the best of our knowledge, this is the first work to leverage the
fully-supervised paradigm for the point-level setting. Experiments on THUMOS14,
BEOID, and GTEA verify the effectiveness of our proposed method both
quantitatively and qualitatively, and demonstrate that our method outperforms
state-of-the-art methods.
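The abstract says that point-level annotations serve as keypoint supervision for a keypoint detector, but it does not give the target formulation. The Python sketch below is a hedged illustration only. It assumes a CenterNet-style target: each annotated frame index is splatted into a 1-D Gaussian heatmap over the video's frames (or snippets), and predicted keypoints are decoded as local maxima above a threshold. The function names `point_heatmap` and `decode_peaks`, the width `sigma`, and the `threshold` value are hypothetical choices. They do not come from the paper.

```python
import math
from typing import List


def point_heatmap(num_frames: int, points: List[int], sigma: float = 2.0) -> List[float]:
    """Hypothetical keypoint target built from point-level timestamps.

    Assumption: each annotated frame index becomes an unnormalized Gaussian
    (peak value 1.0), and overlapping Gaussians are merged with max, as in
    CenterNet-style keypoint detectors. The paper's exact target may differ.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    heat = [0.0] * num_frames
    for p in points:
        if not 0 <= p < num_frames:
            raise ValueError(f"point {p} outside [0, {num_frames})")
        for t in range(num_frames):
            g = math.exp(-((t - p) ** 2) / (2 * sigma ** 2))
            heat[t] = max(heat[t], g)  # max, not sum, so peaks stay at 1.0
    return heat


def decode_peaks(heat: List[float], threshold: float = 0.5) -> List[int]:
    """Return frame indices that are local maxima with score >= threshold."""
    peaks = []
    n = len(heat)
    for t, v in enumerate(heat):
        left = heat[t - 1] if t > 0 else float("-inf")
        right = heat[t + 1] if t < n - 1 else float("-inf")
        # ">= left" and "> right" keep only the first frame of a flat plateau
        if v >= threshold and v >= left and v > right:
            peaks.append(t)
    return peaks
```

For example, `point_heatmap(10, [2, 7], sigma=1.0)` reaches 1.0 at frames 2 and 7, and `decode_peaks` applied to that heatmap returns `[2, 7]`. Merging overlapping Gaussians with `max` rather than summing them keeps every peak at 1.0 even when two annotated actions lie close together.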
Related papers
- POTLoc: Pseudo-Label Oriented Transformer for Point-Supervised Temporal Action Localization [26.506893363676678]
This paper proposes POTLoc, a Pseudo-label Oriented Transformer for weakly-supervised action localization.
POTLoc is designed to identify and track continuous action structures via a self-training strategy.
It outperforms the state-of-the-art point-supervised methods on the THUMOS'14 and ActivityNet-v1.2 datasets.
arXiv Detail & Related papers (2023-10-20T15:28:06Z)
- Proposal-based Temporal Action Localization with Point-level Supervision [29.98225940694062]
Point-level supervised temporal action localization (PTAL) aims at recognizing and localizing actions in untrimmed videos.
We propose a novel method that localizes actions by generating and evaluating action proposals of flexible duration.
Experiments show that our proposed method achieves performance competitive with or superior to state-of-the-art methods.
arXiv Detail & Related papers (2023-10-09T08:27:05Z)
- Proposal-Based Multiple Instance Learning for Weakly-Supervised Temporal Action Localization [98.66318678030491]
Weakly-supervised temporal action localization aims to localize and recognize actions in untrimmed videos with only video-level category labels during training.
We propose a novel Proposal-based Multiple Instance Learning (P-MIL) framework that directly classifies the candidate proposals in both the training and testing stages.
arXiv Detail & Related papers (2023-05-29T02:48:04Z)
- Weakly-supervised Action Localization via Hierarchical Mining [76.00021423700497]
Weakly-supervised action localization aims to temporally localize and classify action instances in videos using only video-level category labels.
We propose a hierarchical mining strategy that operates at both the video level and the snippet level, i.e., hierarchical supervision and hierarchical consistency mining.
We show that, by hierarchically mining supervision and consistency, HiM-Net outperforms existing methods on the THUMOS14 and ActivityNet1.3 datasets by large margins.
arXiv Detail & Related papers (2022-06-22T12:19:09Z)
- Point-Teaching: Weakly Semi-Supervised Object Detection with Point Annotations [81.02347863372364]
We present Point-Teaching, a weakly semi-supervised object detection framework.
Specifically, we propose a Hungarian-based point matching method to generate pseudo labels for point annotated images.
We propose a simple-yet-effective data augmentation, termed point-guided copy-paste, to reduce the impact of the unmatched points.
arXiv Detail & Related papers (2022-06-01T07:04:38Z)
- Learning Salient Boundary Feature for Anchor-free Temporal Action Localization [81.55295042558409]
Temporal action localization is an important yet challenging task in video understanding.
We propose the first purely anchor-free temporal localization method.
Our model includes (i) an end-to-end trainable basic predictor, (ii) a saliency-based refinement module, and (iii) several consistency constraints.
arXiv Detail & Related papers (2021-03-24T12:28:32Z)
- Panoster: End-to-end Panoptic Segmentation of LiDAR Point Clouds [81.12016263972298]
We present Panoster, a novel proposal-free panoptic segmentation method for LiDAR point clouds.
Unlike previous approaches, Panoster proposes a simplified framework incorporating a learning-based clustering solution to identify instances.
At inference time, this acts as a class-agnostic segmentation, allowing Panoster to be fast, while outperforming prior methods in terms of accuracy.
arXiv Detail & Related papers (2020-10-28T18:10:20Z)
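The Point-Teaching summary mentions a Hungarian-based point matching step that generates pseudo labels for point-annotated images, and it also mentions unmatched points. The sketch below is not that paper's implementation. It assumes that annotated points are paired one-to-one with predicted boxes. It also assumes a matching cost equal to the distance from the point to the box center, with a large hypothetical `UNMATCHABLE` penalty when the point lies outside the box. The optimal one-to-one assignment is found here by exhaustive search over permutations. That search reaches the same optimum the Hungarian algorithm would, but it only scales to a handful of points. For realistic sizes, `scipy.optimize.linear_sum_assignment` solves the same assignment problem efficiently.

```python
from itertools import permutations
from typing import List, Optional, Tuple

Point = Tuple[float, float]               # (x, y)
Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)

UNMATCHABLE = 1e6  # hypothetical penalty: point outside box, or dummy slot


def match_cost(point: Point, box: Box) -> float:
    """Assumed cost: distance to the box center if the point is inside the box."""
    x, y = point
    x1, y1, x2, y2 = box
    if not (x1 <= x <= x2 and y1 <= y <= y2):
        return UNMATCHABLE
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    return ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5


def match_points(points: List[Point], boxes: List[Box]) -> List[Optional[int]]:
    """For each point, return the index of its matched box, or None if unmatched.

    Brute-force optimal assignment (same optimum as the Hungarian algorithm);
    only practical for very small inputs because it enumerates size! permutations.
    """
    n, m = len(points), len(boxes)
    size = max(n, m)
    # Pad to a square matrix; dummy rows/columns mean "no partner".
    cost = [
        [match_cost(points[i], boxes[j]) if i < n and j < m else UNMATCHABLE
         for j in range(size)]
        for i in range(size)
    ]
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(size)):
        total = sum(cost[i][perm[i]] for i in range(size))
        if total < best_total:
            best_perm, best_total = perm, total
    result: List[Optional[int]] = []
    for i in range(n):
        j = best_perm[i]
        # A dummy column or an out-of-box pairing counts as unmatched.
        result.append(j if j < m and cost[i][j] < UNMATCHABLE else None)
    return result
```

For example, with points `(1, 1)`, `(10, 10)`, `(50, 50)` and boxes `(0, 0, 4, 4)`, `(8, 8, 12, 12)`, the first two points are matched to boxes 0 and 1. The third point lies outside both boxes and comes back as `None`, which corresponds to the kind of unmatched point the summary says needs separate handling.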