Transferable Knowledge-Based Multi-Granularity Aggregation Network for
Temporal Action Localization: Submission to ActivityNet Challenge 2021
- URL: http://arxiv.org/abs/2107.12618v1
- Date: Tue, 27 Jul 2021 06:18:21 GMT
- Title: Transferable Knowledge-Based Multi-Granularity Aggregation Network for
Temporal Action Localization: Submission to ActivityNet Challenge 2021
- Authors: Haisheng Su, Peiqin Zhuang, Yukun Li, Dongliang Wang, Weihao Gan, Wei
Wu, Yu Qiao
- Abstract summary: This report presents an overview of our solution used in the submission to the 2021 HACS Temporal Action Localization Challenge.
We use Temporal Context Aggregation Network (TCANet) to generate high-quality action proposals.
We also adopt an additional module to transfer the knowledge from trimmed videos to untrimmed videos.
Our proposed scheme achieves 39.91 and 29.78 average mAP on the challenge testing sets of the supervised and weakly-supervised temporal action localization tracks, respectively.
- Score: 33.840281113206444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This technical report presents an overview of our solution used in the submission to the 2021 HACS Temporal Action Localization Challenge on both the Supervised Learning Track and the Weakly-Supervised Learning Track. Temporal Action Localization (TAL) requires not only precisely locating the temporal boundaries of action instances, but also accurately classifying untrimmed videos into specific categories. Weakly-Supervised TAL (WSTAL), in contrast, must locate action instances using only video-level class labels. In this paper, to train a supervised temporal action localizer, we adopt the Temporal Context Aggregation Network (TCANet) to generate high-quality action proposals through "local and global" temporal context aggregation and complementary as well as progressive boundary refinement. For WSTAL, a novel framework is proposed to handle the poor quality of the class activation sequence (CAS) generated by a simple classification network, which can only focus on local discriminative parts rather than locating the entire interval of target actions. Further, inspired by transfer learning, we adopt an additional module to transfer knowledge from trimmed videos (the HACS Clips dataset) to untrimmed videos (the HACS Segments dataset), aiming to promote classification performance on untrimmed videos. Finally, we employ a boundary regression module embedded with an Outer-Inner-Contrastive (OIC) loss to automatically predict boundaries based on the enhanced CAS. Our proposed scheme achieves 39.91 and 29.78 average mAP on the challenge testing sets of the supervised and weakly-supervised temporal action localization tracks, respectively.
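A useful way to read the OIC objective: for a candidate segment on a 1-D class activation sequence, the loss is the mean activation of an inflated outer ring minus the mean activation inside the segment, so minimizing it favors boundaries whose interior is strongly activated and whose immediate surroundings are not. Below is a minimal PyTorch sketch of that objective; the 0.25 inflation ratio, function name, and tensor layout are illustrative assumptions, not the authors' released code.

```python
import torch

def oic_loss(cas, start, end, inflation=0.25):
    # cas: 1-D tensor of class activations over T snippets (one class).
    # start, end: snippet indices of the candidate segment, 0 <= start < end <= T.
    # inflation: fraction of the segment length used to inflate the outer
    #            region on each side (the 0.25 here is an assumed default).
    T = cas.shape[0]
    pad = max(1, int(inflation * (end - start)))
    o_start, o_end = max(0, start - pad), min(T, end + pad)
    inner = cas[start:end].mean()
    # Outer ring: the inflated region with the inner segment removed.
    outer_vals = torch.cat([cas[o_start:start], cas[end:o_end]])
    outer = outer_vals.mean() if outer_vals.numel() > 0 else cas.new_zeros(())
    # Low loss = high activation inside, low activation just outside.
    return outer - inner
```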
Related papers
- Open-Vocabulary Spatio-Temporal Action Detection [59.91046192096296]
Open-vocabulary spatio-temporal action detection (OV-STAD) is an important fine-grained video understanding task.
OV-STAD requires training a model on a limited set of base classes with box and label supervision.
To better adapt the holistic VLM for the fine-grained action detection task, we carefully fine-tune it on the localized video region-text pairs.
arXiv Detail & Related papers (2024-05-17T14:52:47Z)
- Cross-Video Contextual Knowledge Exploration and Exploitation for Ambiguity Reduction in Weakly Supervised Temporal Action Localization [23.94629999419033]
Weakly supervised temporal action localization (WSTAL) aims to localize actions in untrimmed videos using video-level labels.
Our work addresses this from a novel perspective, by exploring and exploiting the cross-video contextual knowledge within the dataset.
Our method outperforms the state-of-the-art methods, and can be easily plugged into other WSTAL methods.
arXiv Detail & Related papers (2023-08-24T07:19:59Z)
- SRF-Net: Selective Receptive Field Network for Anchor-Free Temporal Action Detection [32.159784061961886]
Temporal action detection (TAD) is a challenging task which aims to temporally localize and recognize the human action in untrimmed videos.
Current mainstream one-stage TAD approaches localize and classify action proposals relying on pre-defined anchors.
A novel anchor-free TAD model termed the Selective Receptive Field Network (SRF-Net) is developed.
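For contrast with the anchor-based pipelines mentioned above, an anchor-free head regresses, at every temporal position, the distances to an action's start and end plus class scores, so no pre-defined anchor scales are needed. A generic sketch of that pattern follows (the standard anchor-free recipe, not SRF-Net's exact architecture; the dimensions are assumptions):

```python
import torch
import torch.nn as nn

class AnchorFreeTADHead(nn.Module):
    # Generic anchor-free temporal detection head (illustrative sketch).
    def __init__(self, in_dim=256, num_classes=200):
        super().__init__()
        self.cls = nn.Conv1d(in_dim, num_classes, kernel_size=3, padding=1)
        # Two offsets per position: distance to segment start and to end.
        self.reg = nn.Conv1d(in_dim, 2, kernel_size=3, padding=1)

    def forward(self, feats):
        # feats: (B, C, T) snippet features from a video backbone.
        cls_logits = self.cls(feats)               # (B, num_classes, T)
        offsets = self.reg(feats).exp()            # (B, 2, T), kept positive
        t = torch.arange(feats.shape[-1], device=feats.device).float()
        starts = t - offsets[:, 0]                 # per-position segment start
        ends = t + offsets[:, 1]                   # per-position segment end
        return cls_logits, torch.stack([starts, ends], dim=1)
```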
arXiv Detail & Related papers (2021-06-29T11:29:16Z)
- Weakly-Supervised Temporal Action Localization Through Local-Global Background Modeling [30.104982661371164]
We present our 2021 HACS Challenge Weakly-Supervised Learning Track solution, which is based on BaSNet, to address the above problem.
Specifically, we first adopt pre-trained CSN, SlowFast, TDN, and ViViT as feature extractors to obtain feature sequences.
Then our proposed Local-Global Background Modeling Network (LGBM-Net) is trained to localize instances by using only video-level labels.
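Training such a localizer from video-level labels alone usually reduces to multiple-instance learning: snippet-level class scores are pooled over time (e.g., by top-k averaging) into a video-level prediction that an ordinary classification loss can supervise. A minimal sketch of that common recipe, much simpler than LGBM-Net's actual background-modeling design:

```python
import torch.nn.functional as F

def video_level_mil_loss(snippet_logits, video_labels, k_ratio=8):
    # snippet_logits: (B, T, C) per-snippet class logits.
    # video_labels: (B, C) multi-hot video-level labels.
    # k_ratio: assumed pooling ratio; T // k_ratio snippets vote per class.
    k = max(1, snippet_logits.shape[1] // k_ratio)
    # Top-k average over time gives one logit per class per video.
    video_logits = snippet_logits.topk(k, dim=1).values.mean(dim=1)
    return F.binary_cross_entropy_with_logits(video_logits, video_labels)
```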
arXiv Detail & Related papers (2021-06-20T02:58:45Z)
- Weakly Supervised Temporal Action Localization Through Learning Explicit Subspaces for Action and Context [151.23835595907596]
WS-TAL methods learn to localize the temporal starts and ends of action instances in a video under only video-level supervision.
We introduce a framework that learns two feature subspaces respectively for actions and their context.
The proposed approach outperforms state-of-the-art WS-TAL methods on three benchmarks.
arXiv Detail & Related papers (2021-03-30T08:26:53Z)
- Temporal Context Aggregation Network for Temporal Action Proposal Refinement [93.03730692520999]
Temporal action proposal generation is a challenging yet important task in the video understanding field.
Current methods still suffer from inaccurate temporal boundaries and inferior confidence used for retrieval.
We propose TCANet to generate high-quality action proposals through "local and global" temporal context aggregation.
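The "local and global" aggregation can be pictured as refining each snippet feature with a short-range temporal convolution (local context) and a self-attention pass over the whole sequence (global context), then fusing the two. The sketch below shows that generic pattern; it does not reproduce TCANet's actual modules or hyper-parameters:

```python
import torch
import torch.nn as nn

class LocalGlobalAggregator(nn.Module):
    # Illustrative local+global temporal context fusion (assumed design).
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.local = nn.Conv1d(dim, dim, kernel_size=3, padding=1)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, x):
        # x: (B, T, C) snippet features.
        local = self.local(x.transpose(1, 2)).transpose(1, 2)  # short-range
        global_ctx, _ = self.attn(x, x, x)                     # whole-sequence
        return self.fuse(torch.cat([local, global_ctx], dim=-1))
```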
arXiv Detail & Related papers (2021-03-24T12:34:49Z)
- Boundary-sensitive Pre-training for Temporal Localization in Videos [124.40788524169668]
We investigate model pre-training for temporal localization by introducing a novel boundary-sensitive pretext (BSP) task.
With the synthesized boundaries, BSP can be simply conducted via classifying the boundary types.
Extensive experiments show that the proposed BSP is superior and complementary to the existing action classification based pre-training counterpart.
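The pretext data for BSP can be synthesized from trimmed classification clips alone: splice two clips at a random point to manufacture a boundary, then train a classifier to predict the boundary type of the window around the splice. The following is a hedged sketch of that recipe; the single boundary type and window size are placeholders for the paper's richer taxonomy:

```python
import random
import torch

def synthesize_boundary(clip_a, clip_b, window=32):
    # clip_a, clip_b: (T, C, H, W) frame tensors, each at least `window`
    # frames long, drawn from two different trimmed videos.
    # Returns a window centred on the synthetic boundary and a type label
    # (0 = action-to-action here; a real taxonomy would distinguish more).
    cut = random.randint(window // 2, clip_a.shape[0] - 1)
    spliced = torch.cat([clip_a[:cut], clip_b], dim=0)
    start = cut - window // 2
    return spliced[start:start + window], 0
```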
arXiv Detail & Related papers (2020-11-21T17:46:24Z)
- Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization [94.37084866660238]
We present a Two-Stream Consensus Network (TSCN) to simultaneously address these challenges.
The proposed TSCN features an iterative refinement training method, where a frame-level pseudo ground truth is iteratively updated.
We propose a new attention normalization loss to encourage the predicted attention to act like a binary selection, and promote the precise localization of action instance boundaries.
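One common way to make attention act like a binary selection is to maximize the gap between the mean of the largest and the mean of the smallest attention values; this captures the spirit of TSCN's attention normalization loss, although the selection ratio below is an assumption:

```python
def attention_normalization_loss(att, ratio=8):
    # att: torch.Tensor of shape (T,), attention weights in [0, 1].
    k = max(1, att.shape[0] // ratio)
    top_mean = att.topk(k).values.mean()                    # pushed toward 1
    bottom_mean = att.topk(k, largest=False).values.mean()  # pushed toward 0
    # Minimizing this drives attention toward a binary fore/background split.
    return bottom_mean - top_mean
```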
arXiv Detail & Related papers (2020-10-22T10:53:32Z)
- Complementary Boundary Generator with Scale-Invariant Relation Modeling for Temporal Action Localization: Submission to ActivityNet Challenge 2020 [66.4527310659592]
This report presents an overview of our solution used in the submission to ActivityNet Challenge 2020 Task 1.
We decouple the temporal action localization task into two stages (i.e. proposal generation and classification) and enrich the proposal diversity.
Our proposed scheme achieves state-of-the-art performance on the temporal action localization task with 42.26 average mAP on the challenge testing set.
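Once localization is decoupled this way, the final detection score is typically the product of a class-agnostic proposal confidence and a video-level class probability. A minimal sketch of that fusion step (the field names and top-k choice are illustrative assumptions):

```python
def fuse_detections(proposals, class_probs, top_k=2):
    # proposals: list of (start, end, confidence) from the proposal stage.
    # class_probs: (C,) torch.Tensor of video-level class probabilities.
    probs, labels = class_probs.topk(top_k)
    detections = []
    for start, end, conf in proposals:
        for p, c in zip(probs.tolist(), labels.tolist()):
            # Detection score = proposal confidence x class probability.
            detections.append((start, end, c, conf * p))
    return sorted(detections, key=lambda d: d[3], reverse=True)
```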
arXiv Detail & Related papers (2020-07-20T04:35:40Z)
- Temporal Fusion Network for Temporal Action Localization: Submission to ActivityNet Challenge 2020 (Task E) [45.3218136336925]
This report analyzes the temporal action localization method we used in the HACS competition hosted in the ActivityNet Challenge 2020.
The goal of the task is to locate the start and end times of actions in untrimmed videos and to predict the action category.
By fusing the results of multiple models, our method obtains 40.55% on the validation set and 40.53% on the test set in terms of mAP, and achieves Rank 1 in this challenge.
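Fusing results from multiple models is often as simple as a (weighted) average of their per-class score sequences before the final thresholding and NMS; a hedged sketch of that step:

```python
import torch

def fuse_model_scores(score_maps, weights=None):
    # score_maps: list of (T, C) tensors from different models (same T, C).
    # weights: optional per-model weights; defaults to a uniform average.
    if weights is None:
        weights = [1.0 / len(score_maps)] * len(score_maps)
    fused = torch.zeros_like(score_maps[0])
    for w, s in zip(weights, score_maps):
        fused += w * s
    return fused
```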
arXiv Detail & Related papers (2020-06-13T00:33:00Z)