Temporal Action Proposal Generation with Background Constraint
- URL: http://arxiv.org/abs/2112.07984v1
- Date: Wed, 15 Dec 2021 09:20:49 GMT
- Title: Temporal Action Proposal Generation with Background Constraint
- Authors: Haosen Yang, Wenhao Wu, Lining Wang, Sheng Jin, Boyang Xia, Hongxun
Yao, Hujie Huang
- Abstract summary: Temporal action proposal generation (TAPG) is a challenging task that aims to locate action instances in untrimmed videos with temporal boundaries.
To evaluate the confidence of proposals, existing works typically predict an action score for each proposal, supervised by the temporal Intersection-over-Union (tIoU) between the proposal and the ground truth.
In this paper, we propose a general auxiliary Background Constraint idea to further suppress low-quality proposals.
- Score: 25.783837570359267
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Temporal action proposal generation (TAPG) is a challenging task that aims to
locate action instances in untrimmed videos with temporal boundaries. To
evaluate the confidence of proposals, existing works typically predict an
action score for each proposal, supervised by the temporal
Intersection-over-Union (tIoU) between the proposal and the ground truth. In this
paper, we propose a general auxiliary Background Constraint idea that further
suppresses low-quality proposals by using the background prediction score to
restrict proposal confidence. In this way, the Background Constraint concept
can be easily plugged into existing TAPG methods (e.g., BMN, GTAD). From this
perspective, we propose the Background Constraint Network (BCNet) to further
exploit the rich information of action and background. Specifically, we
introduce an Action-Background Interaction module for reliable confidence
evaluation, which models the inconsistency between action and background with
attention mechanisms at the frame and clip levels. Extensive experiments are
conducted on two popular benchmarks, i.e., ActivityNet-1.3 and THUMOS14. The
results demonstrate that our method outperforms state-of-the-art methods.
Equipped with an existing action classifier, our method also achieves
remarkable performance on the temporal action localization task.
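For intuition, below is a minimal, hypothetical sketch of how such a background constraint could be attached to an existing proposal scorer: an auxiliary head predicts a per-proposal background score that down-weights the tIoU-supervised action score. The class and attribute names (BackgroundConstrainedScorer, action_head, bg_head) and the multiplicative suppression rule are illustrative assumptions, not the exact formulation used in BCNet.

```python
# Hypothetical sketch of the Background Constraint idea described in the abstract.
# The exact architecture and scoring rule in BCNet may differ.
import torch
import torch.nn as nn


class BackgroundConstrainedScorer(nn.Module):
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Action/confidence head, supervised by tIoU with the ground truth
        # (as in BMN/GTAD-style proposal scorers).
        self.action_head = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1)
        )
        # Auxiliary head predicting how "background-like" a proposal is.
        self.bg_head = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, proposal_feats: torch.Tensor) -> torch.Tensor:
        # proposal_feats: (num_proposals, feat_dim) pooled features per candidate.
        action_score = torch.sigmoid(self.action_head(proposal_feats)).squeeze(-1)
        bg_score = torch.sigmoid(self.bg_head(proposal_feats)).squeeze(-1)
        # One plausible way to "restrict the confidence by the background score":
        # suppress proposals whose predicted background probability is high.
        return action_score * (1.0 - bg_score)


# Usage example with random features for 100 candidate proposals:
# scores = BackgroundConstrainedScorer()(torch.randn(100, 256))
```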
Related papers
- Adaptive Proposal Generation Network for Temporal Sentence Localization in Videos [58.83440885457272]
We address the problem of temporal sentence localization in videos (TSLV).
Traditional methods follow a top-down framework that localizes the target segment with pre-defined segment proposals.
We propose an Adaptive Proposal Generation Network (APGN) to maintain segment-level interaction while improving efficiency.
arXiv Detail & Related papers (2021-09-14T02:02:36Z)
- Temporal Context Aggregation Network for Temporal Action Proposal Refinement [93.03730692520999]
Temporal action proposal generation is a challenging yet important task in the video understanding field.
Current methods still suffer from inaccurate temporal boundaries and inferior confidence used for retrieval.
We propose TCANet to generate high-quality action proposals through "local and global" temporal context aggregation.
arXiv Detail & Related papers (2021-03-24T12:34:49Z)
- Point-Level Temporal Action Localization: Bridging Fully-supervised Proposals to Weakly-supervised Losses [84.2964408497058]
Point-level temporal action localization (PTAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance.
Existing methods adopt the frame-level prediction paradigm to learn from the sparse single-frame labels.
This paper attempts to explore the proposal-based prediction paradigm for point-level annotations.
arXiv Detail & Related papers (2020-12-15T12:11:48Z)
- Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization [94.37084866660238]
We present a Two-Stream Consensus Network (TSCN) to simultaneously address these challenges.
The proposed TSCN features an iterative refinement training method, where a frame-level pseudo ground truth is iteratively updated.
We propose a new attention normalization loss to encourage the predicted attention to act like a binary selection, and promote the precise localization of action instance boundaries.
arXiv Detail & Related papers (2020-10-22T10:53:32Z)
- BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation [85.13713217986738]
We present BSN++, a new framework that exploits a complementary boundary regressor and relation modeling for temporal proposal generation.
Not surprisingly, the proposed BSN++ ranked 1st place on the CVPR19 ActivityNet challenge leaderboard for the temporal action localization task.
arXiv Detail & Related papers (2020-09-15T07:08:59Z)
- Complementary Boundary Generator with Scale-Invariant Relation Modeling for Temporal Action Localization: Submission to ActivityNet Challenge 2020 [66.4527310659592]
This report presents an overview of our solution used in the submission to ActivityNet Challenge 2020 Task 1.
We decouple the temporal action localization task into two stages (i.e., proposal generation and classification) and enrich the proposal diversity.
Our proposed scheme achieves state-of-the-art performance on the temporal action localization task with 42.26 average mAP on the challenge testing set.
arXiv Detail & Related papers (2020-07-20T04:35:40Z)
- Accurate Temporal Action Proposal Generation with Relation-Aware Pyramid Network [29.7640925776191]
We propose a Relation-aware pyramid Network (RapNet) to generate highly accurate temporal action proposals.
In RapNet, a novel relation-aware module is introduced to exploit bi-directional long-range relations between local features for context distilling.
arXiv Detail & Related papers (2020-03-09T13:47:36Z)