CLTA: Contents and Length-based Temporal Attention for Few-shot Action Recognition
- URL: http://arxiv.org/abs/2103.10567v1
- Date: Thu, 18 Mar 2021 23:40:28 GMT
- Title: CLTA: Contents and Length-based Temporal Attention for Few-shot Action Recognition
- Authors: Yang Bo, Yangdi Lu and Wenbo He
- Abstract summary: We propose a Contents and Length-based Temporal Attention model, which learns customized temporal attention for each individual video.
We show that even a backbone that is not fine-tuned, combined with an ordinary softmax classifier, can achieve results similar to or better than state-of-the-art few-shot action recognition methods.
- Score: 2.0349696181833337
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot action recognition has attracted increasing attention due to the difficulty of acquiring properly labelled training samples. Current works have shown that preserving spatial information and comparing video descriptors are crucial for few-shot action recognition. However, the importance of preserving temporal information has not been well discussed. In this paper, we propose a Contents and Length-based Temporal Attention (CLTA) model, which learns customized temporal attention for each individual video to tackle the few-shot action recognition problem. CLTA uses the Gaussian likelihood function as the template for generating temporal attention, and trains learning matrices that produce the mean and standard deviation from both frame contents and video length. We show that, with precisely captured temporal attention, even a backbone that is not fine-tuned, combined with an ordinary softmax classifier, can still achieve results similar to or better than state-of-the-art few-shot action recognition methods.
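The abstract describes the mechanism only at a high level, so the following is a minimal sketch of how a Gaussian-template temporal attention could look, assuming per-frame features and a PyTorch-style module. The class name `CLTAttention`, the mean-pooling of frame features, and the two linear layers standing in for the paper's "learning matrices" are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CLTAttention(nn.Module):
    """Sketch of contents- and length-based temporal attention.

    Assumed reading of the abstract: pooled frame contents plus the
    video length predict the mean and standard deviation of a Gaussian
    over frame indices, and the Gaussian likelihood evaluated at each
    index serves as that frame's attention weight.
    """

    def __init__(self, feat_dim: int):
        super().__init__()
        # Hypothetical "learning matrices" for the mean and the std.
        self.mu_head = nn.Linear(feat_dim + 1, 1)
        self.sigma_head = nn.Linear(feat_dim + 1, 1)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (T, feat_dim) features for one video with T frames.
        T = frames.shape[0]
        x = torch.cat([frames.mean(dim=0),           # frame contents
                       torch.tensor([float(T)])])    # video length
        mu = torch.sigmoid(self.mu_head(x)) * T      # mean in [0, T]
        sigma = nn.functional.softplus(self.sigma_head(x)) + 1e-3  # std > 0
        t = torch.arange(T, dtype=frames.dtype)
        attn = torch.exp(-0.5 * ((t - mu) / sigma) ** 2)  # Gaussian template
        attn = attn / attn.sum()                          # normalize weights
        return (attn.unsqueeze(1) * frames).sum(dim=0)    # attended descriptor

# Example: 16 frames of 512-d features from a frozen (not fine-tuned) backbone.
video = torch.randn(16, 512)
descriptor = CLTAttention(512)(video)  # shape: torch.Size([512])
```

In this reading, the attention is "customized" per video because both the frame contents and the length T enter the prediction of the mean and standard deviation, so videos with different contents or durations receive different temporal weightings.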
Related papers
- AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference [51.1972443343829]
We propose AttentionPredictor, which is the first learning-based critical token identification approach.
AttentionPredictor accurately predicts the attention score while consuming negligible memory.
We also propose a cross-token critical cache prefetching framework that hides the token estimation time overhead to accelerate the decoding stage.
arXiv Detail & Related papers (2025-02-06T13:41:46Z)
- Exploring Temporally-Aware Features for Point Tracking [58.63091479730935]
Chrono is a feature backbone specifically designed for point tracking with built-in temporal awareness.
Chrono achieves state-of-the-art performance in a refiner-free setting on the TAP-Vid-DAVIS and TAP-Vid-Kinetics datasets.
arXiv Detail & Related papers (2025-01-21T15:39:40Z)
- CSTA: Spatial-Temporal Causal Adaptive Learning for Exemplar-Free Video Class-Incremental Learning [62.69917996026769]
A class-incremental learning task requires learning and preserving both spatial appearance and temporal action involvement.
We propose a framework that equips separate adapters to learn new class patterns, accommodating the incremental information requirements unique to each class.
A causal compensation mechanism is proposed to reduce conflicts between different types of information during increment and memorization.
arXiv Detail & Related papers (2025-01-13T11:34:55Z)
- On the Importance of Spatial Relations for Few-shot Action Recognition [109.2312001355221]
In this paper, we investigate the importance of spatial relations and propose a more accurate few-shot action recognition method.
A novel Spatial Alignment Cross Transformer (SA-CT) learns to re-adjust the spatial relations and incorporates the temporal information.
Experiments reveal that, even without using any temporal information, the performance of SA-CT is comparable to temporal-based methods on three of four benchmarks.
arXiv Detail & Related papers (2023-08-14T12:58:02Z)
- Zero-shot Skeleton-based Action Recognition via Mutual Information Estimation and Maximization [26.721082316870532]
Zero-shot skeleton-based action recognition aims to recognize actions of unseen categories after training on data of seen categories.
We propose a new zero-shot skeleton-based action recognition method via mutual information (MI) estimation and maximization.
arXiv Detail & Related papers (2023-08-07T23:41:55Z)
- Class-Incremental Learning for Action Recognition in Videos [44.923719189467164]
We tackle the catastrophic forgetting problem in the context of class-incremental learning for video recognition.
Our framework addresses this challenging task by introducing time-channel importance maps and exploiting the importance maps for learning the representations of incoming examples.
We evaluate the proposed approach on brand-new splits of class-incremental action recognition benchmarks constructed upon the UCF101, HMDB51, and Something-Something V2 datasets.
arXiv Detail & Related papers (2022-03-25T12:15:49Z)
- Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips [39.29955809641396]
Background or noisy frames in a first-person video can distract an action recognition model during its learning process.
Previous works attempted to address this problem by applying temporal attention, but failed to consider the global context of the full video.
We propose a simple yet effective Stacked Temporal Attention Module (STAM) to compute temporal attention based on the global knowledge across clips (see the sketch after this list).
arXiv Detail & Related papers (2021-12-02T08:02:35Z)
- Self-Regulated Learning for Egocentric Video Activity Anticipation [147.9783215348252]
Self-Regulated Learning (SRL) aims to consecutively regulate the intermediate representation, producing a representation that emphasizes the novel information in the frame at the current time-stamp.
SRL sharply outperforms existing state-of-the-art in most cases on two egocentric video datasets and two third-person video datasets.
arXiv Detail & Related papers (2021-11-23T03:29:18Z)
- Few-shot Action Recognition with Permutation-invariant Attention [169.61294360056925]
We build on a C3D encoder applied to video blocks to capture short-range action patterns.
We exploit spatial and temporal attention modules and naturalistic self-supervision.
Our method outperforms the state of the art on the HMDB51, UCF101, and miniMIT datasets.
arXiv Detail & Related papers (2020-01-12T10:58:09Z)
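Several of the papers above weight frames or clips by temporal attention. As referenced in the Stacked Temporal Attention entry, the sketch below illustrates one way clip-level attention can condition on a global video context rather than on each clip in isolation. The module name, the mean-pooled context, and the scoring layer are illustrative assumptions, not the STAM design.

```python
import torch
import torch.nn as nn

class ClipTemporalAttention(nn.Module):
    """Hypothetical sketch: score each clip against a video-level
    context vector so that attention weights reflect the full video
    rather than each clip in isolation."""

    def __init__(self, feat_dim: int):
        super().__init__()
        self.score = nn.Linear(2 * feat_dim, 1)  # illustrative scoring layer

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (N, feat_dim) features for N clips of one video.
        global_ctx = clips.mean(dim=0, keepdim=True).expand_as(clips)
        scores = self.score(torch.cat([clips, global_ctx], dim=1))  # (N, 1)
        weights = torch.softmax(scores, dim=0)
        return (weights * clips).sum(dim=0)  # video-level descriptor
```

Stacking two or three such modules, each refining the weights of the previous stage, is one plausible reading of the "stacked" design, though the paper itself should be consulted for the actual architecture.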
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.