Vision Transformers for Action Recognition: A Survey
- URL: http://arxiv.org/abs/2209.05700v1
- Date: Tue, 13 Sep 2022 02:57:05 GMT
- Title: Vision Transformers for Action Recognition: A Survey
- Authors: Anwaar Ulhaq, Naveed Akhtar, Ganna Pogrebna and Ajmal Mian
- Abstract summary: Vision transformers are emerging as a powerful tool to solve computer vision problems.
Recent techniques have proven the efficacy of transformers beyond the image domain to solve numerous video-related tasks.
Human action recognition is receiving special attention from the research community due to its widespread applications.
- Score: 41.69370782177517
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Vision transformers are emerging as a powerful tool to solve computer vision
problems. Recent techniques have also proven the efficacy of transformers
beyond the image domain to solve numerous video-related tasks. Among those,
human action recognition is receiving special attention from the research
community due to its widespread applications. This article provides the first
comprehensive survey of vision transformer techniques for action recognition.
We analyze and summarize the existing and emerging literature in this direction
while highlighting the popular trends in adapting transformers for action
recognition. Due to their specialized application, we collectively refer to
these methods as "action transformers". Our literature review provides
suitable taxonomies for action transformers based on their architecture,
modality, and intended objective. Within the context of action transformers, we
explore techniques for spatio-temporal data encoding, dimensionality
reduction, frame-patch and spatio-temporal-cube construction, and various
representation methods. We also investigate the optimization of spatio-temporal
attention in transformer layers to handle longer sequences, typically by
reducing the number of tokens in a single attention operation. Moreover, we
examine different network learning strategies, such as self-supervised
and zero-shot learning, along with their associated losses for
transformer-based action recognition. This survey also summarizes the progress
of action transformers on important benchmarks in terms of evaluation metric
scores. Finally, it discusses the challenges,
outlook, and future avenues for this research direction.
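
The abstract mentions spatio-temporal cube (tubelet) construction and attention optimizations that cut the number of tokens seen by a single attention operation. The sketch below is illustrative only, not a method from the survey: the module names (`TubeletEmbed`, `PooledAttention`), the tubelet size, and the key/value average pooling are all assumptions, written in PyTorch.

```python
# Minimal sketch of two ideas from the abstract (all names/sizes assumed):
# (1) tokenize a video into spatio-temporal cubes (tubelets), and
# (2) reduce the token count inside a single attention operation.
import torch
import torch.nn as nn

class TubeletEmbed(nn.Module):
    """Split a clip into non-overlapping spatio-temporal cubes and
    linearly project each cube to one token embedding."""
    def __init__(self, dim=192, tube=(2, 16, 16), in_ch=3):
        super().__init__()
        # A 3D conv with stride == kernel size performs the cube-wise
        # linear projection in a single call.
        self.proj = nn.Conv3d(in_ch, dim, kernel_size=tube, stride=tube)

    def forward(self, video):                 # video: (B, C, T, H, W)
        x = self.proj(video)                  # (B, dim, T', H', W')
        return x.flatten(2).transpose(1, 2)   # (B, N, dim), N = T'*H'*W'

class PooledAttention(nn.Module):
    """Self-attention whose keys/values are pooled first, so one
    attention call operates over fewer tokens."""
    def __init__(self, dim=192, heads=3, pool=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.pool = nn.AvgPool1d(kernel_size=pool, stride=pool)

    def forward(self, x):                     # x: (B, N, dim)
        kv = self.pool(x.transpose(1, 2)).transpose(1, 2)  # (B, N//4, dim)
        out, _ = self.attn(x, kv, kv)         # queries keep full resolution
        return out

clip = torch.randn(1, 3, 8, 224, 224)         # one 8-frame RGB clip
tokens = TubeletEmbed()(clip)                 # (1, 784, 192): 4*14*14 tubelets
print(PooledAttention()(tokens).shape)        # (1, 784, 192), cheaper K/V
```

Pooling only the keys and values keeps one output token per tubelet while shrinking the attention cost by the pooling factor, which is the general shape of the token-reduction strategies the abstract alludes to.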
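The abstract also points to self-supervised learning strategies and their associated losses. As one concrete but hypothetical example, not taken from the survey, the snippet below sketches the widely used InfoNCE contrastive loss, where two augmented clips of the same video form a positive pair and the rest of the batch serves as negatives.

```python
# Sketch (an assumption, not the survey's method) of the InfoNCE loss
# commonly used for self-supervised video representation learning.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.07):
    """z1, z2: (B, D) embeddings of two augmented views of the same B clips."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature       # (B, B) similarity matrix
    targets = torch.arange(z1.size(0))       # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```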
Related papers
- Transformers in Reinforcement Learning: A Survey [7.622978576824539]
Transformers have impacted domains like natural language processing, computer vision, and robotics, where they improve performance compared to other neural networks.
This survey explores how transformers are used in reinforcement learning (RL), where they are seen as a promising solution for addressing challenges such as unstable training, credit assignment, lack of interpretability, and partial observability.
arXiv Detail & Related papers (2023-07-12T07:51:12Z)
- Object Detection with Transformers: A Review [11.255962936937744]
This paper provides a comprehensive review of 21 recently proposed advancements to the original DETR model.
We conduct a comparative analysis across various detection transformers, evaluating their performance and network architectures.
We hope that this study will ignite further interest among researchers in addressing the existing challenges and exploring the application of transformers in the object detection domain.
arXiv Detail & Related papers (2023-06-07T16:13:38Z)
- Advances in Medical Image Analysis with Vision Transformers: A Comprehensive Review [6.953789750981636]
We provide an encyclopedic review of the applications of Transformers in medical imaging.
Specifically, we present a systematic and thorough review of relevant recent Transformer literature for different medical image analysis tasks.
arXiv Detail & Related papers (2023-01-09T16:56:23Z)
- Learning Explicit Object-Centric Representations with Vision Transformers [81.38804205212425]
We build on the self-supervision task of masked autoencoding and explore its effectiveness for learning object-centric representations with transformers.
We show that the model efficiently learns to decompose simple scenes as measured by segmentation metrics on several multi-object benchmarks.
arXiv Detail & Related papers (2022-10-25T16:39:49Z)
- 3D Vision with Transformers: A Survey [114.86385193388439]
The success of the transformer architecture in natural language processing has attracted attention in the computer vision field.
We present a systematic and thorough review of more than 100 transformer methods for different 3D vision tasks.
We discuss transformer designs in 3D vision that allow the architecture to process data with various 3D representations.
arXiv Detail & Related papers (2022-08-08T17:59:11Z)
- Blending Anti-Aliasing into Vision Transformer [57.88274087198552]
The discontinuous patch-wise tokenization process implicitly introduces jagged artifacts into attention maps.
The aliasing effect occurs when discrete patterns are used to produce high-frequency or continuous information, resulting in indistinguishable distortions.
We propose a plug-and-play Aliasing-Reduction Module (ARM) to alleviate this issue.
arXiv Detail & Related papers (2021-10-28T14:30:02Z)
- Transformers in Vision: A Survey [101.07348618962111]
Transformers enable the modeling of long dependencies between input sequence elements and support parallel processing of sequences.
Transformers require minimal inductive biases for their design and are naturally suited as set-functions.
This survey aims to provide a comprehensive overview of the Transformer models in the computer vision discipline.
arXiv Detail & Related papers (2021-01-04T18:57:24Z)
- A Survey on Visual Transformer [126.56860258176324]
The Transformer is a type of deep neural network based mainly on the self-attention mechanism.
In this paper, we review these vision transformer models by categorizing them in different tasks and analyzing their advantages and disadvantages.
arXiv Detail & Related papers (2020-12-23T09:37:54Z)