On the Importance of Spatial Relations for Few-shot Action Recognition
- URL: http://arxiv.org/abs/2308.07119v1
- Date: Mon, 14 Aug 2023 12:58:02 GMT
- Title: On the Importance of Spatial Relations for Few-shot Action Recognition
- Authors: Yilun Zhang, Yuqian Fu, Xingjun Ma, Lizhe Qi, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang
- Abstract summary: In this paper, we investigate the importance of spatial relations and propose a more accurate few-shot action recognition method.
A novel Spatial Alignment Cross Transformer (SA-CT) learns to re-adjust the spatial relations and incorporates the temporal information.
Experiments reveal that, even without using any temporal information, the performance of SA-CT is comparable to that of temporal-based methods on 3 of 4 benchmarks.
- Score: 109.2312001355221
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning has achieved great success in video recognition, yet still
struggles to recognize novel actions when faced with only a few examples. To
tackle this challenge, few-shot action recognition methods have been proposed
to transfer knowledge from a source dataset to a novel target dataset with only
one or a few labeled videos. However, existing methods mainly focus on modeling
the temporal relations between the query and support videos while ignoring the
spatial relations. In this paper, we find that spatial misalignment between
objects also occurs in videos and is notably more common than temporal
inconsistency. We are thus motivated to investigate the importance of spatial
relations and propose a more accurate few-shot action recognition method that
leverages both spatial and temporal information. In particular, we contribute a
novel Spatial Alignment Cross Transformer (SA-CT) that learns to re-adjust the
spatial relations and to incorporate the temporal information. Experiments
reveal that, even without using any temporal information, the performance of
SA-CT is comparable to that of temporal-based methods on 3 of 4 benchmarks. To further
incorporate the temporal information, we propose a simple yet effective
Temporal Mixer module. The Temporal Mixer enhances the video representation and
improves the performance of the full SA-CT model, achieving very competitive
results. In this work, we also exploit large-scale pretrained models for
few-shot action recognition, providing useful insights for this research
direction.
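Illustrative sketch (not from the paper): the abstract describes two components at a high level, a Spatial Alignment Cross Transformer that re-adjusts spatial relations between query and support videos, and a Temporal Mixer that injects temporal information into the representation. The minimal PyTorch sketch below only illustrates that data flow under assumed tensor shapes; the class names, the frame-wise cross-attention, and the MLP-over-frames mixer are hypothetical choices, not the authors' implementation.

```python
# Minimal, hypothetical sketch of the two ideas described in the abstract.
# Assumed per-video token layout: [T, N, D] = (frames, spatial patches, channels).
import torch
import torch.nn as nn


class SpatialCrossAlign(nn.Module):
    """Cross-attention that re-adjusts the spatial tokens of a query video
    against the spatial tokens of a support video, frame by frame."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_tokens, support_tokens):
        # query_tokens, support_tokens: [T, N, D]; T acts as the batch axis,
        # so each query frame attends to the patches of the matching support frame.
        aligned, _ = self.attn(query_tokens, support_tokens, support_tokens)
        return self.norm(query_tokens + aligned)


class TemporalMixer(nn.Module):
    """A simple MLP over the frame axis that mixes information across time."""

    def __init__(self, num_frames: int, dim: int):
        super().__init__()
        self.mix = nn.Sequential(
            nn.Linear(num_frames, num_frames),
            nn.GELU(),
            nn.Linear(num_frames, num_frames),
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens):
        # tokens: [T, N, D] -> mix along T, keep the shape unchanged.
        mixed = self.mix(tokens.permute(1, 2, 0)).permute(2, 0, 1)
        return self.norm(tokens + mixed)


if __name__ == "__main__":
    T, N, D = 8, 49, 512              # 8 frames, 7x7 patch grid, 512-d tokens
    query = torch.randn(T, N, D)      # query-video tokens from some backbone
    support = torch.randn(T, N, D)    # support-video tokens
    aligned = SpatialCrossAlign(D)(query, support)
    mixed = TemporalMixer(T, D)(aligned)
    print(mixed.shape)                # torch.Size([8, 49, 512])
```

In a full few-shot pipeline, such modules would be trained end-to-end together with the query-support matching metric; this sketch stops at producing an aligned, temporally mixed token map.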
Related papers
- CAST: Cross-Attention in Space and Time for Video Action Recognition [8.785207228156098]
We propose a novel two-stream architecture called Cross-Attention in Space and Time (CAST).
CAST achieves a balanced spatio-temporal understanding of videos.
Our proposed mechanism enables spatial and temporal expert models to exchange information and make synergistic predictions.
arXiv Detail & Related papers (2023-11-30T18:58:51Z) - Spatio-Temporal Branching for Motion Prediction using Motion Increments [55.68088298632865]
Human motion prediction (HMP) has emerged as a popular research topic due to its diverse applications.
Traditional methods rely on hand-crafted features and machine learning techniques.
We propose a novel spatio-temporal branching network using incremental information for HMP.
arXiv Detail & Related papers (2023-08-02T12:04:28Z) - DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network dubbed DOAD to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z) - Leaping Into Memories: Space-Time Deep Feature Synthesis [93.10032043225362]
We propose LEAPS, an architecture-independent method for synthesizing videos from internal models.
We quantitatively and qualitatively evaluate the applicability of LEAPS by inverting a range of convolutional and attention-based architectures on Kinetics-400.
arXiv Detail & Related papers (2023-03-17T12:55:22Z) - STAU: A SpatioTemporal-Aware Unit for Video Prediction and Beyond [78.129039340528]
We propose a spatiotemporal-aware unit (STAU) for video prediction and beyond.
Our STAU can outperform other methods on all tasks in terms of performance and efficiency.
arXiv Detail & Related papers (2022-04-20T13:42:51Z) - Spatio-Temporal Context for Action Detection [2.294635424666456]
This work proposes to use non-aggregated temporal information.
The main contribution is the introduction of two cross attention blocks.
Experiments on the AVA dataset show the advantages of the proposed approach.
arXiv Detail & Related papers (2021-06-29T08:33:48Z) - SSAN: Separable Self-Attention Network for Video Representation Learning [11.542048296046524]
We propose a separable self-attention (SSA) module, which models spatial and temporal correlations sequentially (see the sketch after this list).
By adding SSA module into 2D CNN, we build a SSA network (SSAN) for video representation learning.
Our approach outperforms state-of-the-art methods on Something-Something and Kinetics-400 datasets.
arXiv Detail & Related papers (2021-05-27T10:02:04Z) - CLTA: Contents and Length-based Temporal Attention for Few-shot Action Recognition [2.0349696181833337]
We propose a Contents and Length-based Temporal Attention model, which learns customized temporal attention for the individual video.
We show that even a not fine-tuned backbone with an ordinary softmax classifier can still achieve similar or better results compared to the state-of-the-art few-shot action recognition.
arXiv Detail & Related papers (2021-03-18T23:40:28Z) - One-shot Learning for Temporal Knowledge Graphs [49.41854171118697]
We propose a one-shot learning framework for link prediction in temporal knowledge graphs.
Our proposed method employs a self-attention mechanism to effectively encode temporal interactions between entities.
Our experiments show that the proposed algorithm outperforms the state of the art baselines for two well-studied benchmarks.
arXiv Detail & Related papers (2020-10-23T03:24:44Z)
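For the separable self-attention (SSA) idea summarized in the SSAN entry above, here is a minimal, hypothetical PyTorch sketch of the sequential design: self-attention over the spatial positions of each frame first, then over the frame axis for each spatial position. The tensor layout, head count, and residual connections are assumptions made for illustration, not the SSAN authors' code.

```python
# Hypothetical sketch of separable (spatial-then-temporal) self-attention.
# Assumed feature layout: [B, T, S, D] = (batch, frames, spatial positions, channels).
import torch
import torch.nn as nn


class SeparableSelfAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        B, T, S, D = x.shape
        # 1) Spatial attention within each frame: batch over (B*T), sequence = positions.
        xs = x.reshape(B * T, S, D)
        xs = xs + self.spatial_attn(xs, xs, xs)[0]
        # 2) Temporal attention per spatial position: batch over (B*S), sequence = frames.
        xt = xs.reshape(B, T, S, D).permute(0, 2, 1, 3).reshape(B * S, T, D)
        xt = xt + self.temporal_attn(xt, xt, xt)[0]
        return xt.reshape(B, S, T, D).permute(0, 2, 1, 3)


if __name__ == "__main__":
    feats = torch.randn(2, 8, 49, 256)               # 2 clips, 8 frames, 7x7 grid, 256-d
    print(SeparableSelfAttention(256)(feats).shape)  # torch.Size([2, 8, 49, 256])
```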