DIR-AS: Decoupling Individual Identification and Temporal Reasoning for Action Segmentation
- URL: http://arxiv.org/abs/2304.02110v1
- Date: Tue, 4 Apr 2023 20:27:18 GMT
- Title: DIR-AS: Decoupling Individual Identification and Temporal Reasoning for Action Segmentation
- Authors: Peiyao Wang, Haibin Ling
- Abstract summary: Fully supervised action segmentation performs frame-wise action recognition with dense annotations and often suffers from the over-segmentation issue.
We develop a novel local-global attention mechanism with temporal pyramid dilation and temporal pyramid pooling for efficient multi-scale attention.
We achieve state-of-the-art accuracy, e.g., 82.8% (+2.6%) on GTEA and 74.7% (+1.2%) on Breakfast, which demonstrates the effectiveness of our proposed method.
- Score: 84.78383981697377
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fully supervised action segmentation performs frame-wise action recognition
with dense annotations and often suffers from the over-segmentation issue.
Existing works have proposed a variety of solutions such as boundary-aware
networks, multi-stage refinement, and temporal smoothness losses. However, most
of them rely on frame-wise supervision, which cannot effectively address
evaluation metrics of different granularities. In this paper, to obtain a
desirably large receptive field, we first develop a novel local-global
attention mechanism with temporal pyramid dilation and temporal pyramid pooling
for efficient multi-scale attention. Then we decouple two inherent goals in
action segmentation, i.e., (1) individual identification solved by frame-wise
supervision, and (2) temporal reasoning tackled by action set prediction.
Afterward, an action alignment module fuses these predictions of different
granularities, leading to more accurate and smoother action segmentation. We
achieve state-of-the-art accuracy, e.g., 82.8% (+2.6%) on GTEA and 74.7% (+1.2%)
on Breakfast, which demonstrates the effectiveness of the proposed method;
extensive ablation studies are also provided. The code will be made available
later.
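The over-segmentation problem and the idea of "evaluation metrics with different granularities" can be illustrated with two standard metrics from this literature: frame-wise accuracy and the segmental edit score. The sketch below is a hypothetical illustration, not the authors' code. It mislabels a single frame in a two-action clip, so frame accuracy stays high while the edit score halves. It normalizes the edit distance by the longer segment sequence, which is a common convention. It also reads "action set" as the set of distinct classes in a clip. That reading is an assumption, and the paper may define its set-level target differently.

```python
from itertools import groupby
from typing import Hashable, List, Sequence, Set


def to_segments(frame_labels: Sequence[Hashable]) -> List[Hashable]:
    """Collapse consecutive identical frame labels into an ordered segment sequence."""
    return [label for label, _ in groupby(frame_labels)]


def action_set(frame_labels: Sequence[Hashable]) -> Set[Hashable]:
    """Hypothetical set-level target: the distinct action classes present in the clip."""
    return set(frame_labels)


def frame_accuracy(pred: Sequence[Hashable], gt: Sequence[Hashable]) -> float:
    """Percentage of frames whose predicted label matches the ground truth."""
    assert len(pred) == len(gt), "prediction and ground truth must align frame by frame"
    return 100.0 * sum(p == g for p, g in zip(pred, gt)) / len(gt)


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Edit distance between two label sequences (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def edit_score(pred: Sequence[Hashable], gt: Sequence[Hashable]) -> float:
    """Segmental edit score: 100 * (1 - distance / length of the longer segment sequence)."""
    p, g = to_segments(pred), to_segments(gt)
    return 100.0 * (1 - levenshtein(p, g) / max(len(p), len(g)))


if __name__ == "__main__":
    gt = ["pour"] * 4 + ["stir"] * 4
    # One mislabelled frame splits the "pour" segment: an over-segmentation error.
    pred = ["pour", "pour", "stir", "pour", "stir", "stir", "stir", "stir"]
    print(frame_accuracy(pred, gt))            # 87.5
    print(edit_score(pred, gt))                # 50.0
    print(action_set(pred) == action_set(gt))  # True
```

The frame-level metric penalizes the stray frame only once. The segment-level metric instead counts two extra segments. The set-level view is unaffected, which is one intuition for supervising at more than one granularity.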
Related papers
- Proposal-based Temporal Action Localization with Point-level Supervision [29.98225940694062]
Point-level supervised temporal action localization (PTAL) aims at recognizing and localizing actions in untrimmed videos.
We propose a novel method that localizes actions by generating and evaluating action proposals of flexible duration.
Experiments show that our proposed method achieves competitive or superior performance to the state-of-the-art methods.
Date: 2023-10-09T08:27:05Z
- Few-Shot Fine-Grained Action Recognition via Bidirectional Attention and Contrastive Meta-Learning [51.03781020616402]
Fine-grained action recognition is attracting increasing attention due to the emerging demand of specific action understanding in real-world applications.
We propose a few-shot fine-grained action recognition problem, aiming to recognize novel fine-grained actions with only a few samples given for each class.
Although progress has been made in coarse-grained actions, existing few-shot recognition methods encounter two issues handling fine-grained actions.
Date: 2021-08-15T02:21:01Z
- Learning Salient Boundary Feature for Anchor-free Temporal Action Localization [81.55295042558409]
Temporal action localization is an important yet challenging task in video understanding.
We propose the first purely anchor-free temporal localization method.
Our model includes (i) an end-to-end trainable basic predictor, (ii) a saliency-based refinement module, and (iii) several consistency constraints.
Date: 2021-03-24T12:28:32Z
- Finding Action Tubes with a Sparse-to-Dense Framework [62.60742627484788]
We propose a framework that generates action tube proposals from video streams with a single forward pass in a sparse-to-dense manner.
We evaluate the efficacy of our model on the UCF101-24, JHMDB-21 and UCFSports benchmark datasets.
Date: 2020-08-30T15:38:44Z
- Alleviating Over-segmentation Errors by Detecting Action Boundaries [14.089070456051488]
We propose an effective framework for the temporal action segmentation task, namely an Action Segment Refinement Framework (ASRF).
Our framework outperforms state-of-the-art methods on three challenging datasets.
Date: 2020-07-14T07:20:14Z
- MS-TCN++: Multi-Stage Temporal Convolutional Network for Action Segmentation [87.16030562892537]
We propose a multi-stage architecture for the temporal action segmentation task.
The first stage generates an initial prediction that is refined by the next ones.
Our models achieve state-of-the-art results on three datasets.
Date: 2020-06-16T14:50:47Z
- Bottom-Up Temporal Action Localization with Mutual Regularization [107.39785866001868]
State-of-the-art solutions for temporal action localization (TAL) involve evaluating the frame-level probabilities of three action-indicating phases.
We introduce two regularization terms to mutually regularize the learning procedure.
Experiments are performed on two popular TAL datasets, THUMOS14 and ActivityNet1.3.
Date: 2020-02-18T03:59:13Z
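The ASRF entry above attributes its gains to detecting action boundaries. The sketch below shows one simple way boundary predictions can suppress over-segmentation, under stated assumptions. Boundaries are taken to arrive as frame indices from some detector, and the function name and its inputs are hypothetical. Each span between consecutive boundaries is then relabeled with its majority frame-wise class. This is an illustrative simplification, not ASRF's published refinement procedure.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def refine_with_boundaries(frame_pred: Sequence[Hashable],
                           boundaries: Sequence[int]) -> List[Hashable]:
    """Hypothetical helper: relabel each boundary-delimited span with its majority label.

    boundaries: frame indices where a new segment is assumed to start
    (e.g. from a boundary detector); frame 0 is always treated as a start.
    """
    n = len(frame_pred)
    if n == 0:
        return []
    # Keep only in-range, unique start indices, in temporal order.
    starts = [0] + sorted({b for b in boundaries if 0 < b < n})
    ends = starts[1:] + [n]
    refined: List[Hashable] = []
    for s, e in zip(starts, ends):
        # Counter.most_common breaks ties by first occurrence within the span.
        label, _ = Counter(frame_pred[s:e]).most_common(1)[0]
        refined.extend([label] * (e - s))
    return refined


if __name__ == "__main__":
    # A single stray "stir" frame fragments the "pour" segment.
    pred = ["pour", "pour", "stir", "pour", "stir", "stir", "stir", "stir"]
    print(refine_with_boundaries(pred, [4]))
```

With a boundary at frame 4, the stray "stir" frame is outvoted within the first span, and the output collapses back to one "pour" segment followed by one "stir" segment. The quality of this kind of refinement depends entirely on how accurate the boundaries are. A missed boundary merges two actions into one.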