SRF-Net: Selective Receptive Field Network for Anchor-Free Temporal
Action Detection
- URL: http://arxiv.org/abs/2106.15258v1
- Date: Tue, 29 Jun 2021 11:29:16 GMT
- Title: SRF-Net: Selective Receptive Field Network for Anchor-Free Temporal
Action Detection
- Authors: Ranyu Ning, Can Zhang, Yuexian Zou
- Abstract summary: Temporal action detection (TAD) is a challenging task that aims to temporally localize and recognize human actions in untrimmed videos.
Current mainstream one-stage TAD approaches localize and classify action proposals relying on pre-defined anchors.
A novel TAD model termed Selective Receptive Field Network (SRF-Net) is developed.
- Score: 32.159784061961886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal action detection (TAD) is a challenging task that aims to
temporally localize and recognize human actions in untrimmed videos. Current
mainstream one-stage TAD approaches localize and classify action proposals
using pre-defined anchors, whose locations and scales are set by designers.
Such anchor-based methods have limited generalization capability and suffer
performance degradation when videos contain rich action variation. In this
study, we explore removing the requirement for pre-defined anchors in TAD
methods. A novel TAD model termed Selective Receptive Field Network (SRF-Net)
is developed, in which the location offsets and classification scores at each
temporal location are estimated directly from the feature map, and which is
trained in an end-to-end manner. A building block called Selective Receptive
Field Convolution (SRFC) is designed to adaptively adjust its receptive field
size according to multiple scales of input information at each temporal
location in the feature map. Extensive experiments on the THUMOS14 dataset
show results superior to those of state-of-the-art TAD approaches.
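
The abstract names two mechanisms: per-location selection among receptive field sizes, and anchor-free prediction of boundary offsets and class scores at each temporal location. The sketch below illustrates both ideas in plain Python under loud assumptions. The abstract does not specify the internals of SRFC, so the dilated branches, the softmax gate, and every name here (`dilated_conv1d`, `selective_fuse`, `decode_segments`, `gate_weights`, `stride`, `threshold`) are hypothetical and are not the authors' implementation.

```python
import math


def dilated_conv1d(x, kernel, dilation):
    """Same-length 1D convolution with zero padding; larger dilation = wider receptive field."""
    half = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + (k - half) * dilation
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def softmax(values):
    """Numerically stable softmax over a short list of logits."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def selective_fuse(x, kernel, dilations, gate_weights):
    """Blend branches with different receptive fields, with separate weights per location.

    Assumption: the gate is a single scalar weight per branch applied to that
    branch's own response. A real model would learn a richer gating function.
    """
    branches = [dilated_conv1d(x, kernel, d) for d in dilations]
    fused = []
    for t in range(len(x)):
        logits = [g * b[t] for g, b in zip(gate_weights, branches)]
        attn = softmax(logits)  # attention over branches at location t
        fused.append(sum(a * b[t] for a, b in zip(attn, branches)))
    return fused


def decode_segments(offsets, scores, stride=1.0, threshold=0.5):
    """Anchor-free decoding: location t plus (d_start, d_end) gives (start, end, score)."""
    segments = []
    for t, ((d_start, d_end), score) in enumerate(zip(offsets, scores)):
        if score >= threshold:
            center = t * stride  # map feature-map index back to the time axis
            segments.append((center - d_start, center + d_end, score))
    return segments
```

With zero gate weights the branches are averaged equally; non-zero weights shift the blend toward one receptive field at each location. No anchor templates appear anywhere in `decode_segments`: each kept location yields one segment directly from its predicted offsets.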
Related papers
- Temporal Action Localization with Enhanced Instant Discriminability [66.76095239972094]
Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video.
We propose a one-stage framework named TriDet to resolve imprecise predictions of action boundaries by existing methods.
Experimental results demonstrate the robustness of TriDet and its state-of-the-art performance on multiple TAD datasets.
arXiv Detail & Related papers (2023-09-11T16:17:50Z)
- Zero-Shot Temporal Action Detection via Vision-Language Prompting [134.26292288193298]
We propose a novel zero-Shot Temporal Action detection model via Vision-LanguagE prompting (STALE).
Our model significantly outperforms state-of-the-art alternatives.
Our model also yields superior results on supervised TAD over recent strong competitors.
arXiv Detail & Related papers (2022-07-17T13:59:46Z)
- Semi-Supervised Temporal Action Detection with Proposal-Free Masking [134.26292288193298]
We propose a novel Semi-supervised Temporal action detection model based on PropOsal-free Temporal mask (SPOT).
SPOT outperforms state-of-the-art alternatives, often by a large margin.
arXiv Detail & Related papers (2022-07-14T16:58:47Z)
- Transferable Knowledge-Based Multi-Granularity Aggregation Network for Temporal Action Localization: Submission to ActivityNet Challenge 2021 [33.840281113206444]
This report presents an overview of the solution used in our submission to the 2021 HACS Temporal Action Localization Challenge.
We use Temporal Context Aggregation Network (TCANet) to generate high-quality action proposals.
We also adopt an additional module to transfer the knowledge from trimmed videos to untrimmed videos.
Our proposed scheme achieves 39.91 and 29.78 average mAP on the challenge testing sets of the supervised and weakly-supervised temporal action localization tracks, respectively.
arXiv Detail & Related papers (2021-07-27T06:18:21Z)
- Boundary-sensitive Pre-training for Temporal Localization in Videos [124.40788524169668]
We investigate model pre-training for temporal localization by introducing a novel boundary-sensitive pretext (BSP) task.
With the synthesized boundaries, BSP can be simply conducted via classifying the boundary types.
Extensive experiments show that the proposed BSP is superior and complementary to the existing action classification based pre-training counterpart.
arXiv Detail & Related papers (2020-11-21T17:46:24Z)
- Unsupervised Domain Adaptation for Spatio-Temporal Action Localization [69.12982544509427]
Spatio-temporal action localization is an important problem in computer vision.
We propose an end-to-end unsupervised domain adaptation algorithm.
We show that significant performance gain can be achieved when spatial and temporal features are adapted separately or jointly.
arXiv Detail & Related papers (2020-10-19T04:25:10Z)
- Scope Head for Accurate Localization in Object Detection [135.9979405835606]
We propose a novel detector coined as ScopeNet, which models anchors of each location as a mutually dependent relationship.
With our concise and effective design, the proposed ScopeNet achieves state-of-the-art results on COCO.
arXiv Detail & Related papers (2020-05-11T04:00:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.