Temporal Context Aggregation Network for Temporal Action Proposal
Refinement
- URL: http://arxiv.org/abs/2103.13141v1
- Date: Wed, 24 Mar 2021 12:34:49 GMT
- Title: Temporal Context Aggregation Network for Temporal Action Proposal
Refinement
- Authors: Zhiwu Qing, Haisheng Su, Weihao Gan, Dongliang Wang, Wei Wu, Xiang
Wang, Yu Qiao, Junjie Yan, Changxin Gao, Nong Sang
- Abstract summary: Temporal action proposal generation is a challenging yet important task in the video understanding field.
Current methods still suffer from inaccurate temporal boundaries and inferior confidence used for retrieval.
We propose TCANet to generate high-quality action proposals through "local and global" temporal context aggregation.
- Score: 93.03730692520999
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal action proposal generation aims to estimate temporal intervals of
actions in untrimmed videos, which is a challenging yet important task in the
video understanding field. The proposals generated by current methods still
suffer from inaccurate temporal boundaries and inferior confidence used for
retrieval owing to the lack of efficient temporal modeling and effective
boundary context utilization. In this paper, we propose Temporal Context
Aggregation Network (TCANet) to generate high-quality action proposals through
"local and global" temporal context aggregation and complementary as well as
progressive boundary refinement. Specifically, we first design a Local-Global
Temporal Encoder (LGTE), which adopts the channel grouping strategy to
efficiently encode both "local and global" temporal inter-dependencies.
Furthermore, both the boundary and internal context of proposals are adopted
for frame-level and segment-level boundary regressions, respectively. Temporal
Boundary Regressor (TBR) is designed to combine these two regression
granularities in an end-to-end fashion, which achieves the precise boundaries
and reliable confidence of proposals through progressive refinement. Extensive
experiments are conducted on three challenging datasets: HACS,
ActivityNet-v1.3, and THUMOS-14, where TCANet can generate proposals with high
precision and recall. By combining with the existing action classifier, TCANet
can obtain remarkable temporal action detection performance compared with other
methods. Not surprisingly, the proposed TCANet won 1st place on the
CVPR 2020 HACS challenge leaderboard for the temporal action localization task.
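The channel grouping idea behind LGTE can be illustrated with a small sketch. This is not the authors' implementation: the abstract only states that channels are grouped to encode local and global temporal dependencies. The sketch assumes that even-numbered groups use self-attention restricted to a temporal window and odd-numbered groups use self-attention over the whole sequence. The names `attend`, `local_global_encode`, `num_groups`, and `window` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, window=None):
    """Dot-product self-attention over time for one channel group.

    features: list of T vectors (one per temporal position).
    window=None attends to every position (global); otherwise each
    position attends only to positions within +/- window (local).
    """
    steps = len(features)
    dim = len(features[0])
    out = []
    for t in range(steps):
        if window is None:
            idx = list(range(steps))
        else:
            idx = list(range(max(0, t - window), min(steps, t + window + 1)))
        scores = [
            sum(a * b for a, b in zip(features[t], features[s])) / math.sqrt(dim)
            for s in idx
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * features[s][d] for w, s in zip(weights, idx)) for d in range(dim)]
        )
    return out


def local_global_encode(features, num_groups=2, window=1):
    """Assumed grouping: even groups attend locally, odd groups globally.

    The per-group outputs are concatenated back along the channel axis,
    so the output has the same shape as the input.
    """
    channels = len(features[0])
    if channels % num_groups != 0:
        raise ValueError("channels must be divisible by num_groups")
    size = channels // num_groups
    encoded = [[] for _ in features]
    for g in range(num_groups):
        group = [frame[g * size:(g + 1) * size] for frame in features]
        mixed = attend(group, window if g % 2 == 0 else None)
        for t, vec in enumerate(mixed):
            encoded[t].extend(vec)
    return encoded
```

Calling `local_global_encode(feats, num_groups=2, window=1)` on a list of per-snippet feature vectors returns a sequence of the same shape. Because each group attends only over its own slice of channels, the attention cost per group is computed on fewer channels than full-width attention, which is one plausible reading of "efficiently" in the abstract.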
Related papers
- Faster Learning of Temporal Action Proposal via Sparse Multilevel
Boundary Generator [9.038216757761955]
Temporal action localization in videos presents significant challenges in the field of computer vision.
We propose a novel framework, Sparse Multilevel Boundary Generator (SMBG), which enhances the boundary-sensitive method with boundary classification and action completeness regression.
Our method is evaluated on two popular benchmarks, ActivityNet-1.3 and THUMOS14, and is shown to achieve state-of-the-art performance with faster inference (2.47x BSN++, 2.12x DBG).
arXiv Detail & Related papers (2023-03-06T14:26:56Z)
- DCAN: Improving Temporal Action Detection via Dual Context Aggregation [29.46851768470807]
Temporal action detection aims to locate the boundaries of action in the video.
The current method based on boundary matching enumerates and calculates all possible boundary matchings to generate proposals.
We propose Dual Context Aggregation Network (DCAN) to aggregate context on two levels, namely, boundary level and proposal level.
arXiv Detail & Related papers (2021-12-07T10:14:26Z)
- Augmented Transformer with Adaptive Graph for Temporal Action Proposal
Generation [79.98992138865042]
We present an augmented transformer with adaptive graph network (ATAG) to exploit both long-range and local temporal contexts for TAPG.
Specifically, we enhance the vanilla transformer by equipping it with a snippet actionness loss and a front block, dubbed augmented transformer.
An adaptive graph convolutional network (GCN) is proposed to build local temporal context by mining the position information and difference between adjacent features.
arXiv Detail & Related papers (2021-03-30T02:01:03Z)
- Learning Salient Boundary Feature for Anchor-free Temporal Action
Localization [81.55295042558409]
Temporal action localization is an important yet challenging task in video understanding.
We propose the first purely anchor-free temporal localization method.
Our model includes (i) an end-to-end trainable basic predictor, (ii) a saliency-based refinement module, and (iii) several consistency constraints.
arXiv Detail & Related papers (2021-03-24T12:28:32Z)
- Relaxed Transformer Decoders for Direct Action Proposal Generation [30.516462193231888]
This paper presents a simple and end-to-end learnable framework (RTD-Net) for direct action proposal generation.
To tackle the essential visual difference between time and space, we make three important improvements over the original transformer detection framework (DETR).
Experiments on THUMOS14 and ActivityNet-1.3 benchmarks demonstrate the effectiveness of RTD-Net.
arXiv Detail & Related papers (2021-02-03T06:29:28Z)
- Two-Stream Consensus Network for Weakly-Supervised Temporal Action
Localization [94.37084866660238]
We present a Two-Stream Consensus Network (TSCN) to simultaneously address these challenges.
The proposed TSCN features an iterative refinement training method, where a frame-level pseudo ground truth is iteratively updated.
We propose a new attention normalization loss to encourage the predicted attention to act like a binary selection, and promote the precise localization of action instance boundaries.
arXiv Detail & Related papers (2020-10-22T10:53:32Z)
- BSN++: Complementary Boundary Regressor with Scale-Balanced Relation
Modeling for Temporal Action Proposal Generation [85.13713217986738]
We present BSN++, a new framework which exploits complementary boundary regressor and relation modeling for temporal proposal generation.
Not surprisingly, the proposed BSN++ ranked 1st on the CVPR19 ActivityNet challenge leaderboard for the temporal action localization task.
arXiv Detail & Related papers (2020-09-15T07:08:59Z)
- Complementary Boundary Generator with Scale-Invariant Relation Modeling
for Temporal Action Localization: Submission to ActivityNet Challenge 2020 [66.4527310659592]
This report presents an overview of our solution used in the submission to ActivityNet Challenge 2020 Task 1.
We decouple the temporal action localization task into two stages (i.e. proposal generation and classification) and enrich the proposal diversity.
Our proposed scheme achieves state-of-the-art performance on the temporal action localization task with 42.26 average mAP on the challenge testing set.
arXiv Detail & Related papers (2020-07-20T04:35:40Z)
- Accurate Temporal Action Proposal Generation with Relation-Aware Pyramid
Network [29.7640925776191]
We propose a Relation-aware pyramid Network (RapNet) to generate highly accurate temporal action proposals.
In RapNet, a novel relation-aware module is introduced to exploit bi-directional long-range relations between local features for context distilling.
arXiv Detail & Related papers (2020-03-09T13:47:36Z)