Faster Learning of Temporal Action Proposal via Sparse Multilevel
Boundary Generator
- URL: http://arxiv.org/abs/2303.03166v1
- Date: Mon, 6 Mar 2023 14:26:56 GMT
- Title: Faster Learning of Temporal Action Proposal via Sparse Multilevel
Boundary Generator
- Authors: Qing Song, Yang Zhou, Mengjie Hu, Chun Liu
- Abstract summary: Temporal action localization in videos presents significant challenges in the field of computer vision.
We propose a novel framework, Sparse Multilevel Boundary Generator (SMBG), which enhances the boundary-sensitive method with boundary classification and action completeness regression.
- Our method is evaluated on two popular benchmarks, ActivityNet-1.3 and THUMOS14, and achieves state-of-the-art performance with faster inference (2.47x the speed of BSN++, 2.12x the speed of DBG).
- Score: 9.038216757761955
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal action localization in videos presents significant challenges in the
field of computer vision. While the boundary-sensitive method has been widely
adopted, its limitations include incomplete use of intermediate and global
information, as well as an inefficient proposal feature generator. To address
these challenges, we propose a novel framework, Sparse Multilevel Boundary
Generator (SMBG), which enhances the boundary-sensitive method with boundary
classification and action completeness regression. SMBG features a multi-level
boundary module that enables faster processing by gathering boundary
information at different lengths. Additionally, we introduce a sparse
extraction confidence head that distinguishes information inside and outside
the action, further optimizing the proposal feature generator. To improve the
synergy between multiple branches and balance positive and negative samples, we
propose a global guidance loss. Our method is evaluated on two popular
benchmarks, ActivityNet-1.3 and THUMOS14, and is shown to achieve
state-of-the-art performance with faster inference (2.47x the speed of BSN++ and
2.12x the speed of DBG). These results demonstrate that SMBG provides a simpler
and more efficient solution for generating temporal action proposals. Our proposed
framework has the potential to advance the field of computer vision and enhance
the accuracy and speed of temporal action localization in video analysis. The
code and models are made available at
https://github.com/zhouyang-001/SMBG-for-temporal-action-proposal.
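The abstract describes the multi-level boundary module and the global guidance loss only at a high level. Below is a minimal, hypothetical PyTorch-style sketch of those two ideas: pooling boundary evidence over several temporal window lengths, and re-weighting positive versus negative boundary samples in the loss. The module name, layer choices, window sizes, and inverse-frequency weighting are assumptions for illustration, not the authors' implementation; refer to the linked repository for the actual code.

```python
# Hypothetical sketch (not the authors' code) of two ideas from the abstract:
# (1) gathering boundary information at several temporal lengths, and
# (2) balancing sparse positive boundary snippets against background snippets.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiLevelBoundaryHead(nn.Module):
    """Process snippet features with temporal convolutions of several window
    lengths, then predict per-snippet start/end boundary probabilities."""

    def __init__(self, feat_dim: int = 256, window_sizes=(1, 3, 5)):
        super().__init__()
        # One lightweight temporal conv per window length (assumed sizes).
        self.branches = nn.ModuleList(
            nn.Conv1d(feat_dim, feat_dim, kernel_size=w, padding=w // 2)
            for w in window_sizes
        )
        # Fuse the multi-level features and predict (start, end) scores.
        self.classifier = nn.Conv1d(feat_dim * len(window_sizes), 2, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, feat_dim, T) snippet-level video features.
        multi_level = [F.relu(branch(x)) for branch in self.branches]
        fused = torch.cat(multi_level, dim=1)
        return torch.sigmoid(self.classifier(fused))  # (batch, 2, T)


def balanced_boundary_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy with the positive class up-weighted so that sparse
    boundary snippets are not overwhelmed by background snippets."""
    pos = (target > 0.5).float()
    num_pos = pos.sum().clamp(min=1.0)
    num_neg = (1.0 - pos).sum().clamp(min=1.0)
    pos_weight = num_neg / num_pos  # simple inverse-frequency weighting
    loss = F.binary_cross_entropy(pred, target, reduction="none")
    weights = pos * pos_weight + (1.0 - pos)
    return (weights * loss).mean()


if __name__ == "__main__":
    feats = torch.randn(2, 256, 100)                 # 2 videos, 100 snippets each
    labels = (torch.rand(2, 2, 100) > 0.9).float()   # sparse start/end labels
    head = MultiLevelBoundaryHead()
    print(balanced_boundary_loss(head(feats), labels).item())
```

This sketch only illustrates the multi-scale pooling and sample-balancing concepts; the published SMBG code will differ in its fusion scheme, sparse confidence head, and loss terms.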
Related papers
- Estimation of Reliable Proposal Quality for Temporal Action Detection [71.5989469643732]
We propose a new method that gives insights into moment and region perspectives simultaneously to align the two tasks by acquiring reliable proposal quality.
For the moment perspective, a Boundary Evaluate Module (BEM) is designed that focuses on local appearance and motion evolvement to estimate boundary quality.
For the region perspective, we introduce Region Evaluate Module (REM) which uses a new and efficient sampling method for proposal feature representation.
arXiv Detail & Related papers (2022-04-25T14:33:49Z)
- Towards High-Quality Temporal Action Detection with Sparse Proposals [14.923321325749196]
Temporal Action Detection aims to localize the temporal segments containing human action instances and predict the action categories.
We introduce Sparse Proposals to interact with the hierarchical features.
Experiments demonstrate the effectiveness of our method, especially under high tIoU thresholds (see the tIoU sketch after this list).
arXiv Detail & Related papers (2021-09-18T06:15:19Z)
- Augmented Transformer with Adaptive Graph for Temporal Action Proposal Generation [79.98992138865042]
We present an augmented transformer with adaptive graph network (ATAG) to exploit both long-range and local temporal contexts for TAPG.
Specifically, we enhance the vanilla transformer by equipping it with a snippet actionness loss and a front block, dubbed augmented transformer.
An adaptive graph convolutional network (GCN) is proposed to build local temporal context by mining the position information and difference between adjacent features.
arXiv Detail & Related papers (2021-03-30T02:01:03Z)
- Temporal Context Aggregation Network for Temporal Action Proposal Refinement [93.03730692520999]
Temporal action proposal generation is a challenging yet important task in the video understanding field.
Current methods still suffer from inaccurate temporal boundaries and inferior confidence used for retrieval.
We propose TCANet to generate high-quality action proposals through "local and global" temporal context aggregation.
arXiv Detail & Related papers (2021-03-24T12:34:49Z)
- Learning Salient Boundary Feature for Anchor-free Temporal Action Localization [81.55295042558409]
Temporal action localization is an important yet challenging task in video understanding.
We propose the first purely anchor-free temporal localization method.
Our model includes (i) an end-to-end trainable basic predictor, (ii) a saliency-based refinement module, and (iii) several consistency constraints.
arXiv Detail & Related papers (2021-03-24T12:28:32Z)
- Relaxed Transformer Decoders for Direct Action Proposal Generation [30.516462193231888]
This paper presents a simple and end-to-end learnable framework (RTD-Net) for direct action proposal generation.
To tackle the essential visual difference between time and space, we make three important improvements over the original transformer detection framework (DETR).
Experiments on THUMOS14 and ActivityNet-1.3 benchmarks demonstrate the effectiveness of RTD-Net.
arXiv Detail & Related papers (2021-02-03T06:29:28Z)
- BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation [85.13713217986738]
We present BSN++, a new framework which exploits complementary boundary regressor and relation modeling for temporal proposal generation.
Not surprisingly, the proposed BSN++ ranked 1st place on the CVPR19 ActivityNet challenge leaderboard for the temporal action localization task.
arXiv Detail & Related papers (2020-09-15T07:08:59Z)
- Complementary Boundary Generator with Scale-Invariant Relation Modeling for Temporal Action Localization: Submission to ActivityNet Challenge 2020 [66.4527310659592]
This report presents an overview of our solution used in the submission to ActivityNet Challenge 2020 Task 1.
We decouple the temporal action localization task into two stages (i.e. proposal generation and classification) and enrich the proposal diversity.
Our proposed scheme achieves state-of-the-art performance on the temporal action localization task with 42.26 average mAP on the challenge testing set.
arXiv Detail & Related papers (2020-07-20T04:35:40Z)
- Accurate Temporal Action Proposal Generation with Relation-Aware Pyramid Network [29.7640925776191]
We propose a Relation-aware pyramid Network (RapNet) to generate highly accurate temporal action proposals.
In RapNet, a novel relation-aware module is introduced to exploit bi-directional long-range relations between local features for context distilling.
arXiv Detail & Related papers (2020-03-09T13:47:36Z)
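Several papers above, like the main abstract's benchmarks, evaluate proposals by temporal Intersection over Union (tIoU) between a predicted segment and a ground-truth segment; performing well "under high tIoU thresholds" means proposals must overlap ground truth very tightly. The following is a short illustrative computation of the standard tIoU definition, not tied to any single paper's code:

```python
def temporal_iou(pred: tuple, gt: tuple) -> float:
    """Temporal IoU between two segments given as (start, end) in seconds."""
    inter_start = max(pred[0], gt[0])
    inter_end = min(pred[1], gt[1])
    intersection = max(0.0, inter_end - inter_start)
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


# Example: a proposal [12.0, 30.0] s against a ground-truth action [15.0, 32.0] s
# overlaps by 15 s out of a 20 s union, i.e. tIoU = 0.75, which passes a 0.7
# threshold but fails a 0.8 one.
print(temporal_iou((12.0, 30.0), (15.0, 32.0)))  # 0.75
```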
This list is automatically generated from the titles and abstracts of the papers on this site.