Revisiting Anchor Mechanisms for Temporal Action Localization
- URL: http://arxiv.org/abs/2008.09837v1
- Date: Sat, 22 Aug 2020 13:39:29 GMT
- Title: Revisiting Anchor Mechanisms for Temporal Action Localization
- Authors: Le Yang, Houwen Peng, Dingwen Zhang, Jianlong Fu, Junwei Han
- Abstract summary: This paper proposes a novel anchor-free action localization module that assists action localization by temporal points.
By combining the proposed anchor-free module with a conventional anchor-based module, we propose a novel action localization framework, called A2Net.
The cooperation between anchor-free and anchor-based modules achieves superior performance to the state-of-the-art on THUMOS14.
- Score: 126.96340233561418
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most of the current action localization methods follow an anchor-based
pipeline: depicting action instances by pre-defined anchors, learning to select
the anchors closest to the ground truth, and predicting the confidence of
anchors with refinements. Pre-defined anchors set priors on the location and
duration of action instances, which facilitates localizing common action
instances but limits the flexibility to handle instances with drastically
varying durations, especially extremely short or extremely long ones. To
address this problem, this paper proposes a novel anchor-free action
localization module that assists action localization by temporal points.
Specifically, this module represents an action instance as a point with its
distances to the starting boundary and ending boundary, alleviating the
restrictions that pre-defined anchors impose on action location and duration.
The proposed anchor-free module is capable of predicting action instances
whose duration is either extremely short or extremely long. By combining the
proposed anchor-free module with a conventional anchor-based module, we propose
a novel action localization framework, called A2Net. The cooperation between
anchor-free and anchor-based modules achieves superior performance to the
state-of-the-art on THUMOS14 (45.5% vs. 42.8%). Furthermore, comprehensive
experiments demonstrate the complementarity between the anchor-free and the
anchor-based module, making A2Net simple but effective.
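The abstract contrasts two ways of describing an action instance: snapping it to a pre-defined anchor and refining it, versus encoding it as a temporal point plus its distances to the start and end boundaries. The following Python sketch makes that contrast concrete under loudly stated assumptions. Segments are (start, end) pairs in seconds. Anchors sit every 4 s with hypothetical preset durations of 4, 8, and 16 s. The refinement targets (center shift normalized by anchor length, log ratio of durations) are a common convention, not necessarily A2Net's. The anchor-free branch is fed oracle distances rather than learned predictions. All function names are hypothetical. It illustrates the two representations only, not the trained A2Net model.

```python
import math

# Assumption: segments are (start, end) pairs in seconds; this is an
# illustrative format, not the paper's actual data layout.


def temporal_iou(a, b):
    """Temporal IoU between segments a = (s1, e1) and b = (s2, e2)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def make_anchors(centers, durations):
    """Anchor-based prior: one segment per (center, preset duration) pair."""
    return [(c - d / 2, c + d / 2) for c in centers for d in durations]


def best_anchor(anchors, gt):
    """Select the pre-defined anchor with the highest tIoU to the ground truth."""
    return max(anchors, key=lambda a: temporal_iou(a, gt))


def anchor_offsets(anchor, gt):
    """Hypothetical refinement targets for an anchor: center shift normalized
    by anchor length, and log ratio of ground-truth to anchor duration."""
    len_a, len_g = anchor[1] - anchor[0], gt[1] - gt[0]
    ctr_a, ctr_g = (anchor[0] + anchor[1]) / 2, (gt[0] + gt[1]) / 2
    return (ctr_g - ctr_a) / len_a, math.log(len_g / len_a)


def encode_point(t, gt):
    """Anchor-free target: distances from point t to the start and end boundaries."""
    start, end = gt
    if not start <= t <= end:
        raise ValueError("point must lie inside the action instance")
    return t - start, end - t


def decode_point(t, d_start, d_end):
    """Recover a segment from a point and its two boundary distances."""
    return t - d_start, t + d_end


def compare(gt, anchors):
    """Return (best unrefined anchor tIoU, its log-duration target, point-based tIoU)."""
    anchor = best_anchor(anchors, gt)
    t = (gt[0] + gt[1]) / 2  # assumption: use the instance midpoint as the point
    segment = decode_point(t, *encode_point(t, gt))  # oracle distances, not predictions
    return temporal_iou(anchor, gt), anchor_offsets(anchor, gt)[1], temporal_iou(segment, gt)


anchors = make_anchors(centers=range(0, 101, 4), durations=[4.0, 8.0, 16.0])
for gt in [(10.0, 10.6), (5.0, 95.0)]:  # an extremely short and an extremely long instance
    anchor_iou, log_scale, point_iou = compare(gt, anchors)
    print(f"gt={gt}: best anchor tIoU={anchor_iou:.3f}, "
          f"log-scale target={log_scale:+.2f}, point tIoU={point_iou:.3f}")
```

With these hypothetical settings, the best unrefined anchor reaches a tIoU of about 0.15 for the 0.6 s instance and about 0.18 for the 90 s instance, and refining them needs log-duration corrections of roughly -1.90 and +1.73. The point representation expresses both durations exactly because its two distances carry no preset scale. That is the flexibility the abstract attributes to the anchor-free module, although a trained regressor would only approximate these distances.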
Related papers
- Boundary Discretization and Reliable Classification Network for Temporal Action Detection [39.17204328036531]
Temporal action detection aims to recognize the action category and determine each action instance's starting and ending time in untrimmed videos.
Mixed methods have achieved remarkable performance by seamlessly merging anchor-based and anchor-free approaches.
We propose a novel Boundary Discretization and Reliable Classification Network (BDRC-Net) that addresses the issues above by introducing boundary discretization and reliable classification modules.
arXiv (2023-10-10T08:14:24Z)
- DIR-AS: Decoupling Individual Identification and Temporal Reasoning for Action Segmentation [84.78383981697377]
Fully supervised action segmentation works on frame-wise action recognition with dense annotations and often suffers from the over-segmentation issue.
We develop a novel local-global attention mechanism with temporal pyramid dilation and temporal pyramid pooling for efficient multi-scale attention.
We achieve state-of-the-art accuracy, e.g., 82.8% (+2.6%) on GTEA and 74.7% (+1.2%) on Breakfast, which demonstrates the effectiveness of our proposed method.
arXiv (2023-04-04T20:27:18Z)
- Video Activity Localisation with Uncertainties in Temporal Boundary [74.7263952414899]
Methods for video activity localisation over time assume implicitly that activity temporal boundaries are determined and precise.
In unscripted natural videos, different activities transition smoothly, so it is intrinsically ambiguous to label precisely when an activity starts and ends.
We introduce Elastic Moment Bounding (EMB) to accommodate flexible and adaptive activity temporal boundaries.
arXiv (2022-06-26T16:45:56Z)
- ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal Action Localization [36.90693762365237]
Weakly-supervised temporal action localization aims to recognize and localize action segments in untrimmed videos given only video-level action labels for training.
We propose ASM-Loc, a novel WTAL framework that enables explicit, action-aware segment modeling beyond standard MIL-based methods.
Our framework entails three segment-centric components: (i) dynamic segment sampling for compensating the contribution of short actions; (ii) intra- and inter-segment attention for modeling action dynamics and capturing temporal dependencies; (iii) pseudo instance-level supervision for improving action boundary prediction.
arXiv (2022-03-29T01:59:26Z)
- Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization [74.34699679568818]
Weakly supervised temporal action localization (WS-TAL) is a challenging task that aims to localize action instances in a given video using only video-level category supervision.
We propose a cross-modal consensus network (CO2-Net) to tackle this problem.
arXiv (2021-07-27T04:21:01Z)
- Learning Salient Boundary Feature for Anchor-free Temporal Action Localization [81.55295042558409]
Temporal action localization is an important yet challenging task in video understanding.
We propose the first purely anchor-free temporal localization method.
Our model includes (i) an end-to-end trainable basic predictor, (ii) a saliency-based refinement module, and (iii) several consistency constraints.
arXiv (2021-03-24T12:28:32Z)
- CFAD: Coarse-to-Fine Action Detector for Spatiotemporal Action Localization [42.95186231216036]
We propose Coarse-to-Fine Action Detector (CFAD) for efficient action localization.
CFAD first estimates coarse spatio-temporal action tubes from video streams and then refines their locations based on key timestamps.
arXiv (2020-08-19T08:47:50Z)
- Scope Head for Accurate Localization in Object Detection [135.9979405835606]
We propose a novel detector named ScopeNet, which models the anchors at each location as mutually dependent.
With our concise and effective design, the proposed ScopeNet achieves state-of-the-art results on COCO.
arXiv (2020-05-11T04:00:09Z)
- Accurate Temporal Action Proposal Generation with Relation-Aware Pyramid Network [29.7640925776191]
We propose a Relation-aware pyramid Network (RapNet) to generate highly accurate temporal action proposals.
In RapNet, a novel relation-aware module is introduced to exploit bi-directional long-range relations between local features for context distillation.
arXiv (2020-03-09T13:47:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.