Temporal Action Localization Using Gated Recurrent Units
- URL: http://arxiv.org/abs/2108.03375v1
- Date: Sat, 7 Aug 2021 06:25:29 GMT
- Title: Temporal Action Localization Using Gated Recurrent Units
- Authors: Hassan Keshvari Khojasteh, Hoda Mohammadzade, Hamid Behroozi
- Abstract summary: We propose a new network based on Gated Recurrent Unit (GRU) and two novel post-processing ideas for the TAL task.
Specifically, we propose a new design for the output layer of the GRU resulting in the so-called GRU-Splitted model.
We evaluate the performance of the proposed method compared to state-of-the-art methods.
- Score: 6.091096843566857
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Temporal Action Localization (TAL) task, which aims to predict the
start and end of each action along with its class label, has many real-world
applications. Due to its complexity, however, results on TAL still lag behind
those on the action recognition task. The complexity stems from the need to
predict precise start and end times for different actions in any video. In
this paper, we propose a new network based on Gated Recurrent Unit (GRU) and
two novel post-processing ideas for the TAL task. Specifically, we propose a new
design for the output layer of the GRU resulting in the so-called GRU-Splitted
model. Moreover, linear interpolation is used to generate the action proposals
with precise start and end times. Finally, to rank the generated proposals
appropriately, we use a Learn to Rank (LTR) approach. We evaluated the
proposed method on the Thumos14 dataset. Results show that it outperforms
state-of-the-art methods. In particular, for the mean Average Precision (mAP)
metric at an Intersection over Union (IoU) threshold of 0.7, we obtain 27.52%,
which is 5.12% better than that of state-of-the-art methods.
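As a rough illustration of two ideas the abstract mentions, the sketch below computes the temporal IoU between two segments and places proposal boundaries at sub-frame positions by linear interpolation. The abstract does not describe the authors' exact interpolation scheme, so the function `interpolated_segments`, the per-frame `scores` input, and the threshold `thr` are hypothetical names and assumptions chosen for this example, not the paper's method.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over Union of two 1-D time segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def interpolated_segments(
    scores: List[float], fps: float, thr: float = 0.5
) -> List[Segment]:
    """Hypothetical proposal generator (an assumption, not the paper's code).

    Finds runs of frames whose action score is >= thr and places each
    boundary where the straight line between two neighbouring frame
    scores crosses thr, giving sub-frame start and end times.
    """
    segments: List[Segment] = []
    start = None  # start position in (fractional) frame units
    for i, cur in enumerate(scores):
        prev = scores[i - 1] if i > 0 else None
        if cur >= thr and start is None:
            # Rising edge: interpolate between frame i-1 and frame i.
            if prev is None:
                start = float(i)
            else:
                start = (i - 1) + (thr - prev) / (cur - prev)
        elif cur < thr and start is not None:
            # Falling edge: interpolate between frame i-1 and frame i.
            end = (i - 1) + (prev - thr) / (prev - cur)
            segments.append((start / fps, end / fps))
            start = None
    if start is not None:
        # The action is still running at the last frame.
        segments.append((start / fps, (len(scores) - 1) / fps))
    return segments


proposals = interpolated_segments([0.0, 1.0, 1.0, 0.0], fps=1.0)
overlap = temporal_iou(proposals[0], (1.0, 3.0))
```

In this toy input, the single proposal spans 0.5 s to 2.5 s, and its IoU with a ground-truth segment of 1.0 s to 3.0 s is 0.6, so it would not count as a correct detection at the IoU threshold of 0.7 quoted above.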
Related papers
- Towards Completeness: A Generalizable Action Proposal Generator for Zero-Shot Temporal Action Localization [31.82121743586165]
Generalizable Action Proposal generator (GAP) is built in a query-based architecture and trained with a proposal-level objective.
Based on this architecture, we propose an Action-aware Discrimination loss to enhance the category-agnostic dynamic information of actions.
Our experiments show that our GAP achieves state-of-the-art performance on two challenging ZSTAL benchmarks.
arXiv Detail & Related papers (2024-08-25T09:07:06Z) - Proposal-based Temporal Action Localization with Point-level Supervision [29.98225940694062]
Point-level supervised temporal action localization (PTAL) aims at recognizing and localizing actions in untrimmed videos.
We propose a novel method that localizes actions by generating and evaluating action proposals of flexible duration.
Experiments show that our proposed method achieves competitive or superior performance to the state-of-the-art methods.
arXiv Detail & Related papers (2023-10-09T08:27:05Z) - PoseRAC: Pose Saliency Transformer for Repetitive Action Counting [56.34379680390869]
We introduce Pose Saliency Representation, which efficiently represents each action using only two salient poses instead of redundant frames.
We also introduce PoseRAC, which is based on this representation and achieves state-of-the-art performance.
Our lightweight model is highly efficient, requiring only 20 minutes for training on a GPU, and infers nearly 10x faster compared to previous methods.
arXiv Detail & Related papers (2023-03-15T08:51:17Z) - Deep Active Ensemble Sampling For Image Classification [8.31483061185317]
Active learning frameworks aim to reduce the cost of data annotation by actively requesting the labeling for the most informative data points.
Some proposed approaches include uncertainty-based techniques, geometric methods, and implicit combinations of uncertainty-based and geometric approaches.
We present an innovative integration of recent progress in both uncertainty-based and geometric frameworks to enable an efficient exploration/exploitation trade-off in sample selection strategy.
Our framework provides two advantages: (1) accurate posterior estimation, and (2) a tunable trade-off between computational overhead and higher accuracy.
arXiv Detail & Related papers (2022-10-11T20:20:20Z) - Active Learning with Effective Scoring Functions for Semi-Supervised
Temporal Action Localization [15.031156121516211]
This paper focuses on a rarely investigated yet practical task named semi-supervised TAL.
We propose an effective active learning method, named AL-STAL.
Experimental results show that AL-STAL outperforms the existing competitors and achieves satisfactory performance compared with fully-supervised learning.
arXiv Detail & Related papers (2022-08-31T13:39:38Z) - ReAct: Temporal Action Detection with Relational Queries [84.76646044604055]
This work aims at advancing temporal action detection (TAD) using an encoder-decoder framework with action queries.
We first propose a relational attention mechanism in the decoder, which guides the attention among queries based on their relations.
Lastly, we propose to predict the localization quality of each action query at inference in order to distinguish high-quality queries.
arXiv Detail & Related papers (2022-07-14T17:46:37Z) - Temporal Action Detection with Global Segmentation Mask Learning [134.26292288193298]
Existing temporal action detection (TAD) methods rely on generating an overwhelmingly large number of proposals per video.
We propose a proposal-free Temporal Action detection model with Global mask (TAGS).
Our core idea is to learn a global segmentation mask of each action instance jointly at the full video length.
arXiv Detail & Related papers (2022-07-14T00:46:51Z) - ZARTS: On Zero-order Optimization for Neural Architecture Search [94.41017048659664]
Differentiable architecture search (DARTS) has been a popular one-shot paradigm for NAS due to its high efficiency.
This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, to search without enforcing the above approximation.
In particular, results on 12 benchmarks verify the outstanding robustness of ZARTS, where the performance of DARTS collapses due to its known instability issue.
arXiv Detail & Related papers (2021-10-10T09:35:15Z) - Temporal Attention-Augmented Graph Convolutional Network for Efficient
Skeleton-Based Human Action Recognition [97.14064057840089]
Graph convolutional networks (GCNs) have been very successful in modeling non-Euclidean data structures.
Most GCN-based action recognition methods use deep feed-forward networks with high computational complexity to process all skeletons in an action.
We propose a temporal attention module (TAM) for increasing the efficiency in skeleton-based action recognition.
arXiv Detail & Related papers (2020-10-23T08:01:55Z) - Complementary Boundary Generator with Scale-Invariant Relation Modeling
for Temporal Action Localization: Submission to ActivityNet Challenge 2020 [66.4527310659592]
This report presents an overview of our solution used in the submission to ActivityNet Challenge 2020 Task 1.
We decouple the temporal action localization task into two stages (i.e. proposal generation and classification) and enrich the proposal diversity.
Our proposed scheme achieves state-of-the-art performance on the temporal action localization task with 42.26 average mAP on the challenge testing set.
arXiv Detail & Related papers (2020-07-20T04:35:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.