Learning Salient Boundary Feature for Anchor-free Temporal Action
Localization
- URL: http://arxiv.org/abs/2103.13137v1
- Date: Wed, 24 Mar 2021 12:28:32 GMT
- Title: Learning Salient Boundary Feature for Anchor-free Temporal Action
Localization
- Authors: Chuming Lin, Chengming Xu, Donghao Luo, Yabiao Wang, Ying Tai,
Chengjie Wang, Jilin Li, Feiyue Huang, Yanwei Fu
- Abstract summary: Temporal action localization is an important yet challenging task in video understanding.
We propose the first purely anchor-free temporal localization method.
Our model includes (i) an end-to-end trainable basic predictor, (ii) a saliency-based refinement module, and (iii) several consistency constraints.
- Score: 81.55295042558409
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Temporal action localization is an important yet challenging task in video
understanding. Typically, such a task aims at inferring both the action
category and localization of the start and end frame for each action instance
in a long, untrimmed video. While most current models achieve good results by
using pre-defined anchors and numerous actionness scores, such methods can be
burdened by both the large number of outputs and the heavy tuning of anchor
locations and sizes. Instead, anchor-free methods are lighter, getting rid of
redundant hyper-parameters, but have received little attention. In
this paper, we propose the first purely anchor-free temporal localization
method, which is both efficient and effective. Our model includes (i) an
end-to-end trainable basic predictor, (ii) a saliency-based refinement module
to gather more valuable boundary features for each proposal with a novel
boundary pooling, and (iii) several consistency constraints to make sure our
model can find the accurate boundary given arbitrary proposals. Extensive
experiments show that our method beats all anchor-based and actionness-guided
methods by a remarkable margin on THUMOS14, achieving state-of-the-art
results, and achieves comparable ones on ActivityNet v1.3. Code is available at
https://github.com/TencentYoutuResearch/ActionDetection-AFSD.
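The anchor-free idea the abstract describes can be illustrated with a minimal sketch: each temporal position regresses its distances to the action's start and end, so proposals are decoded without pre-defined anchors, and features near a predicted boundary are pooled to refine it. The function names, the max-pooling stand-in for the paper's saliency-based boundary pooling, and all values below are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of an anchor-free predictor: every temporal
# position t regresses its distances (d_start, d_end) to the action
# boundaries, so a proposal is recovered as (t - d_start, t + d_end)
# without any pre-defined anchors.

def decode_proposals(offsets):
    """offsets: list of (d_start, d_end) pairs, one per temporal position."""
    proposals = []
    for t, (d_start, d_end) in enumerate(offsets):
        proposals.append((t - d_start, t + d_end))
    return proposals

def boundary_pool(features, boundary, radius=1):
    """Max-pool 1-D features in a small window around a boundary index,
    a rough stand-in for saliency-based boundary pooling."""
    lo = max(0, int(boundary) - radius)
    hi = min(len(features), int(boundary) + radius + 1)
    return max(features[lo:hi])

# Toy sequence of 5 positions inside one action spanning [0, 4]:
offsets = [(0.0, 4.0), (1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (4.0, 0.0)]
print(decode_proposals(offsets))  # every position decodes to (0.0, 4.0)
```

Because every position inside an action can decode the same segment, a consistency constraint (as the abstract mentions) can require that arbitrary proposals converge to the same boundaries.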
Related papers
- FMI-TAL: Few-shot Multiple Instances Temporal Action Localization by Probability Distribution Learning and Interval Cluster Refinement [2.261014973523156]
We propose a novel solution involving a spatial-channel relation transformer with probability learning and cluster refinement.
This method can accurately identify the start and end boundaries of actions in the query video.
Our model achieves competitive performance through meticulous experimentation utilizing the benchmark datasets ActivityNet1.3 and THUMOS14.
arXiv Detail & Related papers (2024-08-25T08:17:25Z)
- Proposal-based Temporal Action Localization with Point-level Supervision [29.98225940694062]
Point-level supervised temporal action localization (PTAL) aims at recognizing and localizing actions in untrimmed videos.
We propose a novel method that localizes actions by generating and evaluating action proposals of flexible duration.
Experiments show that our proposed method achieves competitive or superior performance to the state-of-the-art methods.
arXiv Detail & Related papers (2023-10-09T08:27:05Z)
- Temporal Action Localization with Enhanced Instant Discriminability [66.76095239972094]
Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video.
We propose a one-stage framework named TriDet to resolve imprecise predictions of action boundaries by existing methods.
Experimental results demonstrate the robustness of TriDet and its state-of-the-art performance on multiple TAD datasets.
arXiv Detail & Related papers (2023-09-11T16:17:50Z)
- DIR-AS: Decoupling Individual Identification and Temporal Reasoning for Action Segmentation [84.78383981697377]
Fully supervised action segmentation works on frame-wise action recognition with dense annotations and often suffers from the over-segmentation issue.
We develop a novel local-global attention mechanism with temporal pyramid dilation and temporal pyramid pooling for efficient multi-scale attention.
We achieve state-of-the-art accuracy, e.g., 82.8% (+2.6%) on GTEA and 74.7% (+1.2%) on Breakfast, which demonstrates the effectiveness of our proposed method.
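The temporal pyramid pooling this summary mentions can be sketched simply: the same 1-D feature sequence is average-pooled at several scales and the pooled summaries are concatenated into a fixed multi-scale context vector. The function name and pyramid levels below are assumptions for illustration, not the DIR-AS code.

```python
# Toy temporal pyramid pooling: average-pool a 1-D sequence at several
# levels (whole sequence, halves, quarters, ...) and concatenate the means.

def temporal_pyramid_pool(seq, levels=(1, 2, 4)):
    out = []
    for level in levels:
        size = len(seq) / level          # chunk length at this level
        for i in range(level):
            chunk = seq[int(i * size):int((i + 1) * size)]
            out.append(sum(chunk) / len(chunk))
    return out

feats = [1.0, 2.0, 3.0, 4.0]
print(temporal_pyramid_pool(feats))
# level 1 -> [2.5]; level 2 -> [1.5, 3.5]; level 4 -> [1.0, 2.0, 3.0, 4.0]
```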
arXiv Detail & Related papers (2023-04-04T20:27:18Z)
- Diffusion Action Segmentation [63.061058214427085]
We propose a novel framework via denoising diffusion models, which shares the same inherent spirit of such iterative refinement.
In this framework, action predictions are iteratively generated from random noise with input video features as conditions.
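The iterative-refinement idea behind this diffusion framework can be sketched with a toy denoiser: a noisy frame-wise prediction is repeatedly nudged toward a conditioning signal derived from the video features. The update rule and step count below are illustrative assumptions, not the paper's model.

```python
# Toy iterative refinement: each step pulls the prediction a fraction of
# the way toward the conditioning target, mimicking the spirit (not the
# math) of diffusion-based denoising.

def denoise_step(pred, condition, rate=0.5):
    """One refinement step: move each frame's score toward the condition."""
    return [p + rate * (c - p) for p, c in zip(pred, condition)]

def iterative_refine(noise, condition, steps=10):
    pred = noise
    for _ in range(steps):
        pred = denoise_step(pred, condition)
    return pred

condition = [0.0, 1.0, 1.0, 0.0]   # target frame-wise action labels
noise = [0.9, 0.1, 0.4, 0.7]       # random initial prediction
refined = iterative_refine(noise, condition)
print([round(x, 3) for x in refined])  # close to the condition
```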
arXiv Detail & Related papers (2023-03-31T10:53:24Z)
- Towards High-Quality Temporal Action Detection with Sparse Proposals [14.923321325749196]
Temporal Action Detection aims to localize the temporal segments containing human action instances and predict the action categories.
We introduce Sparse Proposals to interact with the hierarchical features.
Experiments demonstrate the effectiveness of our method, especially under high tIoU thresholds.
arXiv Detail & Related papers (2021-09-18T06:15:19Z)
- EAN: Event Adaptive Network for Enhanced Action Recognition [66.81780707955852]
We propose a unified action recognition framework to investigate the dynamic nature of video content.
First, when extracting local cues, we generate the spatial-temporal kernels of dynamic-scale to adaptively fit the diverse events.
Second, to accurately aggregate these cues into a global video representation, we propose to mine the interactions only among a few selected foreground objects by a Transformer.
arXiv Detail & Related papers (2021-07-22T15:57:18Z)
- A Hybrid Attention Mechanism for Weakly-Supervised Temporal Action Localization [12.353250130848044]
We present a novel framework named HAM-Net with a hybrid attention mechanism which includes temporal soft, semi-soft and hard attentions.
Our proposed approach outperforms recent state-of-the-art methods by at least 2.2% mAP at IoU threshold 0.5 on the THUMOS14 dataset.
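The three attention variants this summary names can be illustrated with a toy sketch: soft attention keeps continuous weights, semi-soft zeroes out low scores while keeping the rest continuous, and hard attention binarizes. The threshold and exact formulations here are assumptions for illustration, not HAM-Net's definitions.

```python
# Toy soft / semi-soft / hard attention over per-snippet scores.

def soft_attention(scores):
    """Continuous weights: pass scores through unchanged."""
    return scores[:]

def semi_soft_attention(scores, thresh=0.5):
    """Suppress low scores, keep high ones continuous."""
    return [s if s >= thresh else 0.0 for s in scores]

def hard_attention(scores, thresh=0.5):
    """Binarize: a snippet is either fully attended or dropped."""
    return [1.0 if s >= thresh else 0.0 for s in scores]

scores = [0.9, 0.3, 0.6, 0.1]
print(semi_soft_attention(scores))  # [0.9, 0.0, 0.6, 0.0]
print(hard_attention(scores))       # [1.0, 0.0, 1.0, 0.0]
```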
arXiv Detail & Related papers (2021-01-03T03:08:18Z)
- Revisiting Anchor Mechanisms for Temporal Action Localization [126.96340233561418]
This paper proposes a novel anchor-free action localization module that assists action localization by temporal points.
By combining the proposed anchor-free module with a conventional anchor-based module, we propose a novel action localization framework, called A2Net.
The cooperation between anchor-free and anchor-based modules achieves superior performance to the state-of-the-art on THUMOS14.
arXiv Detail & Related papers (2020-08-22T13:39:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.