Adaptive Proposal Generation Network for Temporal Sentence Localization in Videos
- URL: http://arxiv.org/abs/2109.06398v1
- Date: Tue, 14 Sep 2021 02:02:36 GMT
- Title: Adaptive Proposal Generation Network for Temporal Sentence Localization in Videos
- Authors: Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou
- Abstract summary: We address the problem of temporal sentence localization in videos (TSLV).
Traditional methods follow a top-down framework which localizes the target segment with pre-defined segment proposals.
We propose an Adaptive Proposal Generation Network (APGN) to maintain segment-level interaction while improving efficiency.
- Score: 58.83440885457272
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the problem of temporal sentence localization in videos (TSLV).
Traditional methods follow a top-down framework which localizes the target
segment with pre-defined segment proposals. Although they achieve decent
performance, the proposals are handcrafted and redundant. Recently, the
bottom-up framework has attracted increasing attention due to its superior
efficiency: it directly predicts, for each frame, the probability of being a
boundary. However, the bottom-up model performs worse than its top-down
counterpart because it fails to exploit segment-level interaction. In this
paper, we propose an Adaptive Proposal Generation Network (APGN) that
maintains segment-level interaction while improving efficiency. Specifically,
we first perform foreground-background classification over the video and
regress boundary offsets on the foreground frames to adaptively generate
proposals. In this way, the handcrafted proposal design is discarded and
redundant proposals are reduced. A proposal consolidation module is then
developed to enhance the semantics of the generated proposals. Finally, we
locate the target moments with these generated proposals following the
top-down framework. Extensive experiments on three challenging benchmarks
show that APGN significantly outperforms previous state-of-the-art methods.
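The proposal-generation stage described in the abstract (classify each frame as foreground or background, then regress segment boundaries only from foreground frames) can be sketched roughly as follows. This is an illustrative simplification, not the paper's implementation: the score threshold, the per-frame (left, right) offset format, and the final deduplication step standing in for the learned proposal consolidation module are all assumptions.

```python
def generate_proposals(fg_scores, offsets, threshold=0.5):
    """Adaptively generate segment proposals from per-frame predictions.

    fg_scores: list of foreground probabilities, one per frame index.
    offsets:   list of (left, right) boundary offsets regressed per frame.
    Only foreground frames spawn proposals, so no handcrafted anchor
    grid is needed and redundant proposals are reduced.
    """
    proposals = []
    for t, (score, (left, right)) in enumerate(zip(fg_scores, offsets)):
        if score >= threshold:  # foreground-background classification
            start = max(0, t - left)   # regressed left boundary
            end = t + right            # regressed right boundary
            proposals.append((start, end))
    # Crude consolidation: drop exact duplicates. The paper's module
    # instead enhances proposal semantics with learned interactions.
    return sorted(set(proposals))
```

With three frames where frames 0 and 2 score as foreground, `generate_proposals([0.9, 0.2, 0.8], [(1, 2), (0, 0), (1, 1)])` yields two candidate segments, one per foreground frame.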
Related papers
- Dense Hybrid Proposal Modulation for Lane Detection [72.49084826234363]
We present a dense hybrid proposal modulation (DHPM) method for lane detection.
We densely modulate all proposals to generate topologically and spatially high-quality lane predictions.
Our DHPM achieves very competitive performances on four popular datasets.
arXiv Detail & Related papers (2023-04-28T14:31:11Z)
- ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP Cues [49.88590455664064]
ProposalCLIP is able to predict proposals for a large variety of object categories without annotations.
ProposalCLIP also shows benefits for downstream tasks, such as unsupervised object detection.
arXiv Detail & Related papers (2022-01-18T01:51:35Z)
- Temporal Action Proposal Generation with Background Constraint [25.783837570359267]
Temporal action proposal generation (TAPG) is a challenging task that aims to locate action instances in untrimmed videos with temporal boundaries.
To evaluate the confidence of proposals, existing works typically predict an action score for each proposal, supervised by the temporal Intersection-over-Union (tIoU) between the proposal and the ground truth.
In this paper, we innovatively propose a general auxiliary Background Constraint idea to further suppress low-quality proposals.
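The tIoU supervision signal mentioned above is the standard temporal analogue of IoU: intersection over union of two 1-D intervals. A minimal sketch, assuming segments are given as (start, end) pairs in seconds:

```python
def tiou(proposal, ground_truth):
    """Temporal Intersection-over-Union between two (start, end) segments."""
    inter = max(0.0, min(proposal[1], ground_truth[1])
                - max(proposal[0], ground_truth[0]))
    union = ((proposal[1] - proposal[0])
             + (ground_truth[1] - ground_truth[0]) - inter)
    return inter / union if union > 0 else 0.0
```

For example, segments (2, 6) and (4, 8) overlap for 2 seconds out of a 6-second union, giving a tIoU of 1/3; disjoint segments score 0.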
arXiv Detail & Related papers (2021-12-15T09:20:49Z)
- Natural Language Video Localization with Learnable Moment Proposals [40.91060659795612]
We propose a novel model termed LPNet (Learnable Proposal Network for NLVL) with a fixed set of learnable moment proposals.
In this paper, we demonstrate the effectiveness of LPNet over existing state-of-the-art methods.
arXiv Detail & Related papers (2021-09-22T12:18:58Z)
- Temporal Context Aggregation Network for Temporal Action Proposal Refinement [93.03730692520999]
Temporal action proposal generation is a challenging yet important task in the video understanding field.
Current methods still suffer from inaccurate temporal boundaries and inferior confidence used for retrieval.
We propose TCANet to generate high-quality action proposals through "local and global" temporal context aggregation.
arXiv Detail & Related papers (2021-03-24T12:34:49Z)
- BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation [85.13713217986738]
We present BSN++, a new framework which exploits complementary boundary regressor and relation modeling for temporal proposal generation.
Not surprisingly, the proposed BSN++ ranked first place on the CVPR19 ActivityNet challenge leaderboard for the temporal action localization task.
arXiv Detail & Related papers (2020-09-15T07:08:59Z)
- Complementary Boundary Generator with Scale-Invariant Relation Modeling for Temporal Action Localization: Submission to ActivityNet Challenge 2020 [66.4527310659592]
This report presents an overview of our solution used in the submission to ActivityNet Challenge 2020 Task 1.
We decouple the temporal action localization task into two stages (i.e., proposal generation and classification) and enrich the proposal diversity.
Our proposed scheme achieves state-of-the-art performance on the temporal action localization task with 42.26 average mAP on the challenge testing set.
arXiv Detail & Related papers (2020-07-20T04:35:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.