Boundary-Aware Proposal Generation Method for Temporal Action
Localization
- URL: http://arxiv.org/abs/2309.13810v1
- Date: Mon, 25 Sep 2023 01:41:09 GMT
- Title: Boundary-Aware Proposal Generation Method for Temporal Action
Localization
- Authors: Hao Zhang, Chunyan Feng, Jiahui Yang, Zheng Li, Caili Guo
- Abstract summary: TAL aims to find the categories and temporal boundaries of actions in an untrimmed video.
Most TAL methods rely heavily on action recognition models that are sensitive to action labels rather than temporal boundaries.
We propose a Boundary-Aware Proposal Generation (BAPG) method with contrastive learning.
- Score: 23.79359799496947
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of Temporal Action Localization (TAL) is to find the categories and
temporal boundaries of actions in an untrimmed video. Most TAL methods rely
heavily on action recognition models that are sensitive to action labels rather
than temporal boundaries. More importantly, few works consider the background
frames that are similar to action frames in pixels but dissimilar in semantics,
which also leads to inaccurate temporal boundaries. To address the challenge
above, we propose a Boundary-Aware Proposal Generation (BAPG) method with
contrastive learning. Specifically, we define the above background frames as
hard negative samples. Contrastive learning with hard negative mining is
introduced to improve the discrimination of BAPG. BAPG is independent of the
existing TAL network architecture, so it can be applied plug-and-play to
mainstream TAL models. Extensive experimental results on THUMOS14 and
ActivityNet-1.3 demonstrate that BAPG can significantly improve the performance
of TAL.
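The abstract describes BAPG only at a high level. As a rough, hypothetical sketch of what contrastive learning with hard negative mining over boundary-adjacent background frames can look like, the snippet below computes an InfoNCE-style loss in which background frames that resemble action frames are up-weighted in the denominator. The feature shapes, temperature, and hard_weight parameter are illustrative assumptions, not details taken from the paper.

```python
# Minimal, hypothetical sketch (not the authors' released code) of contrastive
# learning with hard negative mining, where boundary-adjacent background frames
# that look like action frames are treated as hard negatives.
import torch
import torch.nn.functional as F

def hard_negative_contrastive_loss(anchor, positives, hard_negatives, easy_negatives,
                                   temperature=0.07, hard_weight=2.0):
    """anchor: (D,); positives: (P, D); hard_negatives: (H, D); easy_negatives: (E, D).
    Shapes, temperature, and hard_weight are assumptions for illustration."""
    anchor = F.normalize(anchor, dim=-1)
    positives = F.normalize(positives, dim=-1)
    hard_negatives = F.normalize(hard_negatives, dim=-1)
    easy_negatives = F.normalize(easy_negatives, dim=-1)

    pos_sim = torch.exp(positives @ anchor / temperature)        # similarity to action frames
    hard_sim = torch.exp(hard_negatives @ anchor / temperature)  # background frames near boundaries
    easy_sim = torch.exp(easy_negatives @ anchor / temperature)  # clearly-background frames

    # Up-weighting the hard negatives pushes the model to separate action frames
    # from visually similar background frames, which is what sharpens proposal boundaries.
    denominator = pos_sim.sum() + hard_weight * hard_sim.sum() + easy_sim.sum()
    return -torch.log(pos_sim.sum() / denominator)

# Toy usage with random frame features of dimension 256.
frames = torch.randn(16, 256)
loss = hard_negative_contrastive_loss(frames[0], frames[1:4], frames[4:8], frames[8:])
```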
Related papers
- Temporal Action Localization with Enhanced Instant Discriminability [66.76095239972094]
Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video.
We propose a one-stage framework, named TriDet, to resolve the imprecise action boundary predictions of existing methods.
Experimental results demonstrate the robustness of TriDet and its state-of-the-art performance on multiple TAD datasets.
arXiv Detail & Related papers (2023-09-11T16:17:50Z)
- Weakly-Supervised Temporal Action Localization with Bidirectional Semantic Consistency Constraint [83.36913240873236]
Weakly Supervised Temporal Action localization (WTAL) aims to classify and localize temporal boundaries of actions for the video.
We propose a simple yet efficient method, named bidirectional semantic consistency constraint (Bi-SCC), to discriminate the positive actions from co-scene actions.
Experimental results show that our approach outperforms the state-of-the-art methods on THUMOS14 and ActivityNet.
arXiv Detail & Related papers (2023-04-25T07:20:33Z)
- Video Activity Localisation with Uncertainties in Temporal Boundary [74.7263952414899]
Methods for video activity localisation over time assume implicitly that activity temporal boundaries are determined and precise.
In unscripted natural videos, different activities transition smoothly, so it is intrinsically ambiguous to label precisely when an activity starts and ends.
We introduce Elastic Moment Bounding (EMB) to accommodate flexible and adaptive activity temporal boundaries.
arXiv Detail & Related papers (2022-06-26T16:45:56Z)
- Background-Click Supervision for Temporal Action Localization [82.4203995101082]
Weakly supervised temporal action localization aims at learning the instance-level action pattern from the video-level labels, where a significant challenge is action-context confusion.
One recent work builds an action-click supervision framework.
It requires similar annotation costs but can steadily improve the localization performance when compared to the conventional weakly supervised methods.
In this paper, by revealing that the performance bottleneck of the existing approaches mainly comes from the background errors, we find that a stronger action localizer can be trained with labels on the background video frames rather than those on the action frames.
arXiv Detail & Related papers (2021-11-24T12:02:52Z)
- Towards High-Quality Temporal Action Detection with Sparse Proposals [14.923321325749196]
Temporal Action Detection aims to localize the temporal segments containing human action instances and predict the action categories.
We introduce Sparse Proposals to interact with the hierarchical features.
Experiments demonstrate the effectiveness of our method, especially under high tIoU thresholds (a sketch of the tIoU metric follows this list).
arXiv Detail & Related papers (2021-09-18T06:15:19Z)
- Learning Salient Boundary Feature for Anchor-free Temporal Action Localization [81.55295042558409]
Temporal action localization is an important yet challenging task in video understanding.
We propose the first purely anchor-free temporal localization method.
Our model includes (i) an end-to-end trainable basic predictor, (ii) a saliency-based refinement module, and (iii) several consistency constraints.
arXiv Detail & Related papers (2021-03-24T12:28:32Z)
- Boundary-sensitive Pre-training for Temporal Localization in Videos [124.40788524169668]
We investigate model pre-training for temporal localization by introducing a novel boundary-sensitive pretext (BSP) task.
With the synthesized boundaries, BSP can be simply conducted via classifying the boundary types.
Extensive experiments show that the proposed BSP is superior and complementary to the existing action classification based pre-training counterpart.
arXiv Detail & Related papers (2020-11-21T17:46:24Z)
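The Sparse Proposals entry above reports gains at high tIoU thresholds. tIoU (temporal Intersection-over-Union) measures how well a predicted segment overlaps a ground-truth segment; the following is a minimal sketch of the standard computation, with the function name and example values chosen only for illustration.

```python
def temporal_iou(pred, gt):
    """pred and gt are (start, end) pairs in seconds, with start <= end."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0

# A prediction counts as correct at threshold 0.7 only if its tIoU with a
# ground-truth segment of the same class is at least 0.7.
print(temporal_iou((10.0, 20.0), (12.0, 22.0)))  # 0.666..., so it misses the 0.7 threshold
```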
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.