Temporal Action Proposal Generation with Background Constraint
- URL: http://arxiv.org/abs/2112.07984v1
- Date: Wed, 15 Dec 2021 09:20:49 GMT
- Title: Temporal Action Proposal Generation with Background Constraint
- Authors: Haosen Yang, Wenhao Wu, Lining Wang, Sheng Jin, Boyang Xia, Hongxun
Yao, Hujie Huang
- Abstract summary: Temporal action proposal generation (TAPG) is a challenging task that aims to locate action instances in untrimmed videos with temporal boundaries.
To evaluate the confidence of proposals, existing works typically predict an action score for each proposal, supervised by the temporal Intersection-over-Union (tIoU) between the proposal and the ground truth.
In this paper, we propose a general auxiliary Background Constraint idea to further suppress low-quality proposals.
- Score: 25.783837570359267
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Temporal action proposal generation (TAPG) is a challenging task that aims to
locate action instances in untrimmed videos with temporal boundaries. To
evaluate the confidence of proposals, existing works typically predict an
action score for each proposal, supervised by the temporal
Intersection-over-Union (tIoU) between the proposal and the ground truth. In this
paper, we propose a general auxiliary Background Constraint idea that further
suppresses low-quality proposals by using the background prediction score to
restrict proposal confidence. In this way, the Background Constraint concept
can be easily plugged into existing TAPG methods (e.g., BMN, GTAD). From this
perspective, we propose the Background Constraint Network (BCNet) to further
exploit the rich information of action and background. Specifically, we
introduce an Action-Background Interaction module for reliable confidence
evaluation, which models the inconsistency between action and background with
attention mechanisms at the frame and clip levels. Extensive experiments are
conducted on two popular benchmarks, i.e., ActivityNet-1.3 and THUMOS14. The
results demonstrate that our method outperforms state-of-the-art methods.
Equipped with an existing action classifier, our method also achieves
remarkable performance on the temporal action localization task.
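For intuition, below is a minimal, hypothetical sketch of how such a background constraint could be attached to an existing proposal scorer: an auxiliary head predicts a per-proposal background score that down-weights the tIoU-supervised action score. The class and attribute names (BackgroundConstrainedScorer, action_head, bg_head) and the multiplicative suppression rule are illustrative assumptions, not the exact formulation used in BCNet.

```python
# Hypothetical sketch of the Background Constraint idea described in the abstract.
# The exact architecture and scoring rule in BCNet may differ.
import torch
import torch.nn as nn


class BackgroundConstrainedScorer(nn.Module):
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Action/confidence head, supervised by tIoU with the ground truth
        # (as in BMN/GTAD-style proposal scorers).
        self.action_head = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1)
        )
        # Auxiliary head predicting how "background-like" a proposal is.
        self.bg_head = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, proposal_feats: torch.Tensor) -> torch.Tensor:
        # proposal_feats: (num_proposals, feat_dim) pooled features per candidate.
        action_score = torch.sigmoid(self.action_head(proposal_feats)).squeeze(-1)
        bg_score = torch.sigmoid(self.bg_head(proposal_feats)).squeeze(-1)
        # One plausible way to "restrict the confidence by the background score":
        # suppress proposals whose predicted background probability is high.
        return action_score * (1.0 - bg_score)


# Usage example with random features for 100 candidate proposals:
# scores = BackgroundConstrainedScorer()(torch.randn(100, 256))
```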
Related papers
- Adaptive Proposal Generation Network for Temporal Sentence Localization in Videos [58.83440885457272]
We address the problem of temporal sentence localization in videos (TSLV).
Traditional methods follow a top-down framework that localizes the target segment with pre-defined segment proposals.
We propose an Adaptive Proposal Generation Network (APGN) to maintain segment-level interaction while improving efficiency.
arXiv Detail & Related papers (2021-09-14T02:02:36Z)
- Temporal Context Aggregation Network for Temporal Action Proposal Refinement [93.03730692520999]
Temporal action proposal generation is a challenging yet important task in the video understanding field.
Current methods still suffer from inaccurate temporal boundaries and inferior confidence used for retrieval.
We propose TCANet to generate high-quality action proposals through "local and global" temporal context aggregation.
arXiv Detail & Related papers (2021-03-24T12:34:49Z)
- Point-Level Temporal Action Localization: Bridging Fully-supervised Proposals to Weakly-supervised Losses [84.2964408497058]
Point-level temporal action localization (PTAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance.
Existing methods adopt the frame-level prediction paradigm to learn from the sparse single-frame labels.
This paper attempts to explore the proposal-based prediction paradigm for point-level annotations.
arXiv Detail & Related papers (2020-12-15T12:11:48Z)
- Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization [94.37084866660238]
We present a Two-Stream Consensus Network (TSCN) to simultaneously address these challenges.
The proposed TSCN features an iterative refinement training method, where a frame-level pseudo ground truth is iteratively updated.
We propose a new attention normalization loss to encourage the predicted attention to act like a binary selection, and promote the precise localization of action instance boundaries.
arXiv Detail & Related papers (2020-10-22T10:53:32Z)
- BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation [85.13713217986738]
We present BSN++, a new framework that exploits a complementary boundary regressor and relation modeling for temporal proposal generation.
Not surprisingly, the proposed BSN++ ranked 1st place on the CVPR19 ActivityNet challenge leaderboard for the temporal action localization task.
arXiv Detail & Related papers (2020-09-15T07:08:59Z)
- Complementary Boundary Generator with Scale-Invariant Relation Modeling for Temporal Action Localization: Submission to ActivityNet Challenge 2020 [66.4527310659592]
This report presents an overview of our solution used in the submission to ActivityNet Challenge 2020 Task 1.
We decouple the temporal action localization task into two stages (i.e., proposal generation and classification) and enrich the proposal diversity.
Our proposed scheme achieves state-of-the-art performance on the temporal action localization task with 42.26 average mAP on the challenge testing set.
arXiv Detail & Related papers (2020-07-20T04:35:40Z)
- Accurate Temporal Action Proposal Generation with Relation-Aware Pyramid Network [29.7640925776191]
We propose a Relation-aware pyramid Network (RapNet) to generate highly accurate temporal action proposals.
In RapNet, a novel relation-aware module is introduced to exploit bi-directional long-range relations between local features for context distilling.
arXiv Detail & Related papers (2020-03-09T13:47:36Z)