Related papers: RNNs, CNNs and Transformers in Human Action Recognition: A Survey and a Hybrid Model

URL: http://arxiv.org/abs/2407.06162v2
Date: Thu, 15 Aug 2024 08:59:38 GMT
Title: RNNs, CNNs and Transformers in Human Action Recognition: A Survey and a Hybrid Model
Authors: Khaled Alomar, Halil Ibrahim Aysel, Xiaohao Cai
Abstract summary: Human Action Recognition (HAR) encompasses the task of monitoring human activities across various domains. Over the past decade, the field of HAR has witnessed substantial progress by leveraging Convolutional Neural Networks (CNNs). Recently, the domain of computer vision has witnessed the emergence of Vision Transformers (ViTs) as a potent solution.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Human Action Recognition (HAR) encompasses the task of monitoring human activities across various domains, including but not limited to medical, educational, entertainment, visual surveillance, video retrieval, and the identification of anomalous activities. Over the past decade, the field of HAR has witnessed substantial progress by leveraging Convolutional Neural Networks (CNNs) to effectively extract and comprehend intricate information, thereby enhancing the overall performance of HAR systems. Recently, the domain of computer vision has witnessed the emergence of Vision Transformers (ViTs) as a potent solution. The efficacy of transformer architecture has been validated beyond the confines of image analysis, extending their applicability to diverse video-related tasks. Notably, within this landscape, the research community has shown keen interest in HAR, acknowledging its manifold utility and widespread adoption across various domains. This article aims to present an encompassing survey that focuses on CNNs and the evolution of Recurrent Neural Networks (RNNs) to ViTs given their importance in the domain of HAR. By conducting a thorough examination of existing literature and exploring emerging trends, this study undertakes a critical analysis and synthesis of the accumulated knowledge in this field. Additionally, it investigates the ongoing efforts to develop hybrid approaches. Following this direction, this article presents a novel hybrid model that seeks to integrate the inherent strengths of CNNs and ViTs.

Related papers

Vision Transformers in Precision Agriculture: A Comprehensive Survey [3.156133122658662]
Vision Transformers (ViTs) offer benefits such as improved handling of long-range dependencies and better scalability for visual tasks. This survey explores the application of ViTs in precision agriculture, covering tasks from classification to detection and segmentation.
arXiv Detail & Related papers (2025-04-30T14:50:02Z)
Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook [85.43403500874889]
Retrieval-augmented generation (RAG) has emerged as a pivotal technique in artificial intelligence (AI). The survey explores recent advancements in RAG for embodied AI, with a particular focus on applications in planning, task execution, multimodal perception, interaction, and specialized domains.
arXiv Detail & Related papers (2025-03-23T10:33:28Z)
SMART-Vision: Survey of Modern Action Recognition Techniques in Vision [5.766136300380401]
Human Action Recognition (HAR) is a challenging domain in computer vision. HAR has garnered considerable interest due to its broad applicability. We present the novel SMART-Vision taxonomy, which illustrates how innovations in deep learning for HAR complement one another.
arXiv Detail & Related papers (2025-01-22T18:21:55Z)
From CNNs to Transformers in Multimodal Human Action Recognition: A Survey [23.674123304219822]
Human action recognition is one of the most widely studied research problems in Computer Vision. Recent studies have shown that addressing it using multimodal data leads to superior performance. The recent rise of Transformers in visual modelling is now also causing a paradigm shift for the action recognition task.
arXiv Detail & Related papers (2024-05-22T02:11:18Z)
A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond [84.95530356322621]
This survey presents a systematic review of the advancements in code intelligence. It covers over 50 representative models and their variants, more than 20 categories of tasks, and an extensive coverage of over 680 related works. Building on our examination of the developmental trajectories, we further investigate the emerging synergies between code intelligence and broader machine intelligence.
arXiv Detail & Related papers (2024-03-21T08:54:56Z)
A Survey on Transferability of Adversarial Examples across Deep Neural Networks [53.04734042366312]
Adversarial examples can manipulate machine learning models into making erroneous predictions. The transferability of adversarial examples enables black-box attacks which circumvent the need for detailed knowledge of the target model. This survey explores the landscape of the adversarial transferability of adversarial examples.
arXiv Detail & Related papers (2023-10-26T17:45:26Z)
A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks [60.38369406877899]
Transformer is a deep neural network that employs a self-attention mechanism to comprehend the contextual relationships within sequential data. Transformer models excel in handling long dependencies between input sequence elements and enable parallel processing. Our survey encompasses the identification of the top five application domains for transformer-based models.
arXiv Detail & Related papers (2023-06-11T23:13:51Z)
Human Activity Recognition Using Tools of Convolutional Neural Networks: A State of the Art Review, Data Sets, Challenges and Future Prospects [7.275302131211702]
This review summarizes recent works based on a wide range of deep neural network architectures, namely convolutional neural networks (CNNs), for human activity recognition. The reviewed systems are clustered into four categories depending on the use of input devices like multimodal sensing devices, smartphones, radar, and vision devices.
arXiv Detail & Related papers (2022-02-02T18:52:13Z)
Transformers in Medical Imaging: A Survey [88.03790310594533]
Transformers have been successfully applied to several computer vision problems, achieving state-of-the-art results. Medical imaging has also witnessed growing interest for Transformers that can capture global context compared to CNNs with local receptive fields. We provide a review of the applications of Transformers in medical imaging covering various aspects, ranging from recently proposed architectural designs to unsolved issues.
arXiv Detail & Related papers (2022-01-24T18:50:18Z)
Recurrent Vision Transformer for Solving Visual Reasoning Problems [13.658244210412352]
We introduce the Recurrent Vision Transformer (RViT) model for convolutional neural networks (CNNs). Thanks to the impact of recurrent connections and spatial attention in reasoning tasks, this network achieves competitive results on the same-different visual reasoning problems. A comprehensive ablation study confirms the importance of a hybrid CNN + Transformer architecture.
arXiv Detail & Related papers (2021-11-29T15:01:09Z)
Efficient Visual Recognition with Deep Neural Networks: A Survey on Recent Advances and New Directions [37.914102870280324]
Deep neural networks (DNNs) have largely boosted their performances on many concrete tasks. This paper presents the review of the recent advances with our suggestions on the new possible directions.
arXiv Detail & Related papers (2021-08-30T08:19:34Z)
Muti-view Mouse Social Behaviour Recognition with Deep Graphical Model [124.26611454540813]
Social behaviour analysis of mice is an invaluable tool to assess therapeutic efficacy of neurodegenerative diseases. Because of the potential to create rich descriptions of mouse social behaviors, the use of multi-view video recordings for rodent observations is increasingly receiving much attention. We propose a novel multiview latent-attention and dynamic discriminative model that jointly learns view-specific and view-shared sub-structures.
arXiv Detail & Related papers (2020-11-04T18:09:58Z)
Deep Learning for Community Detection: Progress, Challenges and Opportunities [79.26787486888549]
This article summarizes the contributions of the various frameworks, models, and algorithms in deep neural networks.
arXiv Detail & Related papers (2020-05-17T11:22:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.