LSTC: Boosting Atomic Action Detection with Long-Short-Term Context
- URL: http://arxiv.org/abs/2110.09819v1
- Date: Tue, 19 Oct 2021 10:09:09 GMT
- Title: LSTC: Boosting Atomic Action Detection with Long-Short-Term Context
- Authors: Yuxi Li, Boshen Zhang, Jian Li, Yabiao Wang, Weiyao Lin, Chengjie
Wang, Jilin Li, Feiyue Huang
- Abstract summary: We decompose the action recognition pipeline into short-term and long-term reliance.
Within our design, a local aggregation branch is utilized to gather dense and informative short-term cues.
Both branches independently predict the context-specific actions and the results are merged in the end.
- Score: 60.60267767456306
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we place the atomic action detection problem into a Long-Short
Term Context (LSTC) to analyze how the temporal reliance among video signals
affects the action detection results. To do this, we decompose the action
recognition pipeline into short-term and long-term reliance, in terms of the
hypothesis that the two kinds of context are conditionally independent given
the objective action instance. Within our design, a local aggregation branch is
utilized to gather dense and informative short-term cues, while a high-order
long-term inference branch is designed to reason about the objective action class
from high-order interactions between the actor and other persons or person pairs.
Both branches independently predict the context-specific actions and the
results are merged in the end. We demonstrate that both temporal grains are
beneficial to atomic action recognition. On the mainstream benchmarks of atomic
action detection, our design can bring significant performance gain from the
existing state-of-the-art pipeline. The code of this project can be found at
[this url](https://github.com/TencentYoutuResearch/ActionDetection-LSTC)
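The two-branch design above can be illustrated with a minimal late-fusion sketch. Under the paper's hypothesis that short-term and long-term context are conditionally independent given the action instance, the two context-specific predictions can be merged multiplicatively in probability space, i.e., by summing log-scores. The function names, the simple sum rule, and the toy logits below are illustrative assumptions, not the authors' exact fusion operator:

```python
import math

def merge_branch_scores(short_logits, long_logits):
    """Hypothetical late fusion of two context branches.

    Assuming conditional independence of the two contexts given the
    action instance, multiplying per-class probabilities corresponds
    to summing per-class logits (log-scores).
    """
    return [s + l for s, l in zip(short_logits, long_logits)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy per-class logits from each branch. Atomic action detection is
# multi-label, so a per-class sigmoid is used rather than a softmax.
short_branch = [2.0, -1.0, 0.5]   # dense short-term cues
long_branch = [1.0, -0.5, -2.0]   # high-order long-term interactions

fused = merge_branch_scores(short_branch, long_branch)
probs = [sigmoid(x) for x in fused]
```

The key design point the abstract emphasizes is that each branch predicts independently and the merge happens only at the end, so neither temporal grain dominates feature learning in the other.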
Related papers
- Introducing Gating and Context into Temporal Action Detection [0.8987776881291144]
Temporal Action Detection (TAD) remains challenging due to action overlaps and variable action durations.
Recent findings suggest that TAD performance is dependent on the structural design of transformers rather than on the self-attention mechanism.
We propose a refined feature extraction process through lightweight, yet effective operations.
arXiv Detail & Related papers (2024-09-06T11:52:42Z)
- Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling [51.38330727868982]
Bidirectional Decoding (BID) is a test-time inference algorithm that bridges action chunking with closed-loop operations.
We show that BID boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
arXiv Detail & Related papers (2024-08-30T15:39:34Z)
- DIR-AS: Decoupling Individual Identification and Temporal Reasoning for Action Segmentation [84.78383981697377]
Fully supervised action segmentation works on frame-wise action recognition with dense annotations and often suffers from the over-segmentation issue.
We develop a novel local-global attention mechanism with temporal pyramid dilation and temporal pyramid pooling for efficient multi-scale attention.
We achieve state-of-the-art accuracy, e.g., 82.8% (+2.6%) on GTEA and 74.7% (+1.2%) on Breakfast, which demonstrates the effectiveness of our proposed method.
arXiv Detail & Related papers (2023-04-04T20:27:18Z)
- DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z)
- Towards High-Quality Temporal Action Detection with Sparse Proposals [14.923321325749196]
Temporal Action Detection aims to localize the temporal segments containing human action instances and predict the action categories.
We introduce Sparse Proposals to interact with the hierarchical features.
Experiments demonstrate the effectiveness of our method, especially under high tIoU thresholds.
arXiv Detail & Related papers (2021-09-18T06:15:19Z)
- Finding Action Tubes with a Sparse-to-Dense Framework [62.60742627484788]
We propose a framework that generates action tube proposals from video streams with a single forward pass in a sparse-to-dense manner.
We evaluate the efficacy of our model on the UCF101-24, JHMDB-21 and UCFSports benchmark datasets.
arXiv Detail & Related papers (2020-08-30T15:38:44Z)
- Long Short-Term Relation Networks for Video Action Detection [155.13392337831166]
Long Short-Term Relation Networks (LSTR) are presented in this paper.
LSTR aggregates and propagates relations to augment features for video action detection.
Extensive experiments are conducted on four benchmark datasets.
arXiv Detail & Related papers (2020-03-31T10:02:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.