Complementary Boundary Generator with Scale-Invariant Relation Modeling
for Temporal Action Localization: Submission to ActivityNet Challenge 2020
- URL: http://arxiv.org/abs/2007.09883v2
- Date: Wed, 26 Aug 2020 01:51:02 GMT
- Title: Complementary Boundary Generator with Scale-Invariant Relation Modeling
for Temporal Action Localization: Submission to ActivityNet Challenge 2020
- Authors: Haisheng Su, Jinyuan Feng, Hao Shao, Zhenyu Jiang, Manyuan Zhang, Wei
Wu, Yu Liu, Hongsheng Li, Junjie Yan
- Abstract summary: This report presents an overview of our solution used in the submission to ActivityNet Challenge 2020 Task 1.
We decouple the temporal action localization task into two stages (i.e. proposal generation and classification) and enrich the proposal diversity.
Our proposed scheme achieves state-of-the-art performance on the temporal action localization task with 42.26 average mAP on the challenge testing set.
- Score: 66.4527310659592
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This technical report presents an overview of our solution used in the
submission to ActivityNet Challenge 2020 Task 1 (temporal action
localization/detection). Temporal action localization requires not only
precisely locating the temporal boundaries of action instances, but also
accurately classifying the untrimmed videos into specific categories. In this
paper, we decouple the temporal action localization task into two stages (i.e.
proposal generation and classification) and enrich the proposal diversity
through exhaustively exploring the influences of multiple components from
different but complementary perspectives. Specifically, in order to generate
high-quality proposals, we consider several factors including the video feature
encoder, the proposal generator, the proposal-proposal relations, the scale
imbalance, and ensemble strategy. Finally, in order to obtain accurate
detections, we need to further train an optimal video classifier to recognize
the generated proposals. Our proposed scheme achieves state-of-the-art
performance on the temporal action localization task with 42.26
average mAP on the challenge testing set.
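The two-stage decoupling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Proposal`, `Detection`, `temporal_iou`, and `classify_proposals`, the `top_k` parameter, and the score-fusion rule (proposal confidence multiplied by video-level class probability) are all assumptions made for illustration. Stage one is assumed to yield class-agnostic temporal proposals, and stage two assigns them labels from a video-level classifier.

```python
from typing import Dict, List, NamedTuple


class Proposal(NamedTuple):
    """Class-agnostic temporal segment from the proposal stage (hypothetical type)."""
    start: float
    end: float
    score: float


class Detection(NamedTuple):
    """Labeled temporal segment produced after classification (hypothetical type)."""
    start: float
    end: float
    label: str
    score: float


def temporal_iou(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Intersection over union of two 1-D temporal segments."""
    inter = max(0.0, min(a_end, b_end) - max(a_start, b_start))
    union = (a_end - a_start) + (b_end - b_start) - inter
    return inter / union if union > 0 else 0.0


def classify_proposals(
    proposals: List[Proposal],
    class_probs: Dict[str, float],
    top_k: int = 2,
) -> List[Detection]:
    """Pair every proposal with the top-k video-level classes.

    Assumption: the detection score is the product of the proposal
    confidence and the class probability; the report does not state
    the fusion rule, so this is only one plausible choice.
    """
    top = sorted(class_probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    detections = [
        Detection(p.start, p.end, label, p.score * prob)
        for p in proposals
        for label, prob in top
    ]
    # Highest-scoring detections first.
    return sorted(detections, key=lambda d: d.score, reverse=True)
```

In this sketch, `temporal_iou` is the overlap measure that a localization metric such as mAP would threshold when matching detections to ground-truth segments, while `classify_proposals` shows how proposal generation and classification can remain independent components.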
Related papers
- Multi-modal Prompting for Low-Shot Temporal Action Localization [95.19505874963751]
We consider the problem of temporal action localization in low-shot (zero-shot and few-shot) scenarios.
We adopt a Transformer-based two-stage action localization architecture with class-agnostic action proposals, followed by open-vocabulary classification.
arXiv Detail & Related papers (2023-03-21T10:40:13Z)
- Context-aware Proposal Network for Temporal Action Detection [47.72048484299649]
This report presents our first-place winning solution for the temporal action detection task in the CVPR-2022 ActivityNet Challenge.
The task aims to localize temporal boundaries of action instances with specific classes in long untrimmed videos.
We argue that the generated proposals contain rich contextual information, which may benefit detection confidence prediction.
arXiv Detail & Related papers (2021-09-14T02:02:36Z)
- Adaptive Proposal Generation Network for Temporal Sentence Localization
in Videos [58.83440885457272]
We address the problem of temporal sentence localization in videos (TSLV).
Traditional methods follow a top-down framework which localizes the target segment with pre-defined segment proposals.
We propose an Adaptive Proposal Generation Network (APGN) to maintain the segment-level interaction while improving efficiency.
arXiv Detail & Related papers (2021-07-27T06:18:21Z)
- Transferable Knowledge-Based Multi-Granularity Aggregation Network for
Temporal Action Localization: Submission to ActivityNet Challenge 2021 [33.840281113206444]
This report presents an overview of our solution used in the submission to the 2021 HACS Temporal Action Localization Challenge.
We use Temporal Context Aggregation Network (TCANet) to generate high-quality action proposals.
We also adopt an additional module to transfer the knowledge from trimmed videos to untrimmed videos.
Our proposed scheme achieves 39.91 and 29.78 average mAP on the challenge testing set of the supervised and weakly-supervised temporal action localization tracks, respectively.
arXiv Detail & Related papers (2021-03-24T12:34:49Z)
- Temporal Context Aggregation Network for Temporal Action Proposal
Refinement [93.03730692520999]
Temporal action proposal generation is a challenging yet important task in the video understanding field.
Current methods still suffer from inaccurate temporal boundaries and inferior confidence used for retrieval.
We propose TCANet to generate high-quality action proposals through "local and global" temporal context aggregation.
arXiv Detail & Related papers (2020-09-15T07:08:59Z)
- BSN++: Complementary Boundary Regressor with Scale-Balanced Relation
Modeling for Temporal Action Proposal Generation [85.13713217986738]
We present BSN++, a new framework which exploits a complementary boundary regressor and relation modeling for temporal proposal generation.
Not surprisingly, the proposed BSN++ ranked 1st on the CVPR19 ActivityNet challenge leaderboard for the temporal action localization task.
arXiv Detail & Related papers (2020-06-13T00:33:00Z)
- Temporal Fusion Network for Temporal Action Localization: Submission to
ActivityNet Challenge 2020 (Task E) [45.3218136336925]
This report analyzes a temporal action localization method we used in the HACS competition, which is hosted at ActivityNet Challenge 2020.
The goal of our task is to locate the start time and end time of the action in the untrimmed video, and to predict the action category.
By fusing the results of multiple models, our method achieves an mAP of 40.55% on the validation set and 40.53% on the test set, ranking 1st in this challenge.
arXiv Detail & Related papers (2020-03-09T13:47:36Z)
- Accurate Temporal Action Proposal Generation with Relation-Aware Pyramid
Network [29.7640925776191]
We propose a Relation-aware Pyramid Network (RapNet) to generate highly accurate temporal action proposals.
In RapNet, a novel relation-aware module is introduced to exploit bi-directional long-range relations between local features for context distilling.
arXiv Detail & Related papers (2020-03-09T13:47:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.