Faster-TAD: Towards Temporal Action Detection with Proposal Generation and Classification in a Unified Network
- URL: http://arxiv.org/abs/2204.02674v1
- Date: Wed, 6 Apr 2022 08:55:35 GMT
- Title: Faster-TAD: Towards Temporal Action Detection with Proposal Generation and Classification in a Unified Network
- Authors: Shimin Chen, Chen Chen, Wei Li, Xunqiang Tao, Yandong Guo
- Abstract summary: Temporal action detection (TAD) aims to detect the semantic labels and boundaries of action instances in untrimmed videos.
We propose a unified network for TAD, termed Faster-TAD, by re-purposing a Faster-RCNN like architecture.
- Score: 13.03191060554677
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal action detection (TAD) aims to detect the semantic labels and
boundaries of action instances in untrimmed videos. Current mainstream
approaches are multi-step solutions, which fall short in efficiency and
flexibility. In this paper, we propose a unified network for TAD, termed
Faster-TAD, by re-purposing a Faster-RCNN like architecture. To tackle the
unique difficulty in TAD, we make important improvements over the original
framework. We propose a new Context-Adaptive Proposal Module and an innovative
Fake-Proposal Generation Block. What's more, we use atomic action features to
improve the performance. Faster-TAD simplifies the pipeline of TAD and achieves
remarkable performance on multiple benchmarks, e.g., ActivityNet-1.3 (40.01%
mAP), HACS Segments (38.39% mAP), and SoccerNet-Action Spotting (54.09% mAP). It
outperforms existing single-network detectors by a large margin.
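The mAP figures above are computed by matching predicted segments to ground-truth instances at temporal IoU (tIoU) thresholds. As an illustrative sketch only (not code from this paper), tIoU between two 1-D segments can be computed as:

```python
def temporal_iou(seg_a, seg_b):
    """Temporal IoU between two segments given as (start, end) in seconds."""
    start_a, end_a = seg_a
    start_b, end_b = seg_b
    # Overlap length; clamped at zero for disjoint segments.
    inter = max(0.0, min(end_a, end_b) - max(start_a, start_b))
    union = (end_a - start_a) + (end_b - start_b) - inter
    return inter / union if union > 0 else 0.0

# Example: segments (2, 8) and (4, 10) overlap for 4s out of an 8s union.
print(temporal_iou((2.0, 8.0), (4.0, 10.0)))  # 0.5
```

Average precision is then computed per class at one or more tIoU thresholds (e.g., 0.5:0.05:0.95 on ActivityNet-1.3) and averaged into the reported mAP.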
Related papers
- Temporal Action Detection with Global Segmentation Mask Learning [134.26292288193298]
Existing temporal action detection (TAD) methods rely on generating an overwhelmingly large number of proposals per video.
We propose a proposal-free Temporal Action detection model with Global mask (TAGS)
Our core idea is to learn a global segmentation mask of each action instance jointly at the full video length.
arXiv Detail & Related papers (2022-07-14T00:46:51Z)
- ETAD: A Unified Framework for Efficient Temporal Action Detection [70.21104995731085]
Untrimmed video understanding such as temporal action detection (TAD) often suffers from the pain of huge demand for computing resources.
We build a unified framework for efficient end-to-end temporal action detection (ETAD)
ETAD achieves state-of-the-art performance on both THUMOS-14 and ActivityNet-1.3.
arXiv Detail & Related papers (2022-05-14T21:16:21Z)
- Towards High-Quality Temporal Action Detection with Sparse Proposals [14.923321325749196]
Temporal Action Detection aims to localize the temporal segments containing human action instances and predict the action categories.
We introduce Sparse Proposals to interact with the hierarchical features.
Experiments demonstrate the effectiveness of our method, especially under high tIoU thresholds.
arXiv Detail & Related papers (2021-09-18T06:15:19Z)
- Efficient Person Search: An Anchor-Free Approach [86.45858994806471]
Person search aims to simultaneously localize and identify a query person from realistic, uncropped images.
To achieve this goal, state-of-the-art models typically add a re-id branch upon two-stage detectors like Faster R-CNN.
In this work, we present an anchor-free approach to efficiently tackling this challenging task, by introducing the following dedicated designs.
arXiv Detail & Related papers (2021-09-01T07:01:33Z)
- End-to-end Temporal Action Detection with Transformer [86.80289146697788]
Temporal action detection (TAD) aims to determine the semantic label and the boundaries of every action instance in an untrimmed video.
Here, we construct an end-to-end framework for TAD upon Transformer, termed TadTR.
Our method achieves state-of-the-art performance on HACS Segments and THUMOS14 and competitive performance on ActivityNet-1.3.
arXiv Detail & Related papers (2021-06-18T17:58:34Z)
- Learning Salient Boundary Feature for Anchor-free Temporal Action Localization [81.55295042558409]
Temporal action localization is an important yet challenging task in video understanding.
We propose the first purely anchor-free temporal localization method.
Our model includes (i) an end-to-end trainable basic predictor, (ii) a saliency-based refinement module, and (iii) several consistency constraints.
arXiv Detail & Related papers (2021-03-24T12:28:32Z)
- Decoupled and Memory-Reinforced Networks: Towards Effective Feature Learning for One-Step Person Search [65.51181219410763]
One-step methods have been developed to handle pedestrian detection and identification sub-tasks using a single network.
There are two major challenges in the current one-step approaches.
We propose a decoupled and memory-reinforced network (DMRNet) to overcome these problems.
arXiv Detail & Related papers (2021-02-22T06:19:45Z)
- Relaxed Transformer Decoders for Direct Action Proposal Generation [30.516462193231888]
This paper presents a simple and end-to-end learnable framework (RTD-Net) for direct action proposal generation.
To tackle the essential visual difference between time and space, we make three important improvements over the original transformer detection framework (DETR)
Experiments on THUMOS14 and ActivityNet-1.3 benchmarks demonstrate the effectiveness of RTD-Net.
arXiv Detail & Related papers (2021-02-03T06:29:28Z)
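Several of the papers above revolve around generating, sparsifying, or filtering temporal action proposals. As an illustrative sketch only (a standard post-processing step in proposal-based TAD, not any listed paper's method), greedy non-maximum suppression over scored temporal segments looks like:

```python
def temporal_iou(seg_a, seg_b):
    """Temporal IoU between two (start, end) segments."""
    inter = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - inter
    return inter / union if union > 0 else 0.0

def temporal_nms(proposals, iou_threshold=0.5):
    """Greedy NMS over scored proposals given as (start, end, score) tuples.

    Keeps the highest-scoring proposal, discards any remaining proposal
    whose tIoU with a kept one reaches the threshold, and repeats.
    """
    kept = []
    for start, end, score in sorted(proposals, key=lambda p: p[2], reverse=True):
        if all(temporal_iou((start, end), (ks, ke)) < iou_threshold
               for ks, ke, _ in kept):
            kept.append((start, end, score))
    return kept

# The near-duplicate second proposal is suppressed; the distant one survives.
print(temporal_nms([(0.0, 10.0, 0.9), (1.0, 11.0, 0.8), (20.0, 30.0, 0.7)]))
```

Proposal-free approaches such as TAGS above sidestep this step entirely by predicting segmentation masks instead of ranked proposal sets.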
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.