SMART-Vision: Survey of Modern Action Recognition Techniques in Vision
- URL: http://arxiv.org/abs/2501.13066v1
- Date: Wed, 22 Jan 2025 18:21:55 GMT
- Title: SMART-Vision: Survey of Modern Action Recognition Techniques in Vision
- Authors: Ali K. AlShami, Ryan Rabinowitz, Khang Lam, Yousra Shleibik, Melkamu Mersha, Terrance Boult, Jugal Kalita
- Abstract summary: Human Action Recognition (HAR) is a challenging domain in computer vision.
HAR has garnered considerable interest due to its broad applicability.
We present the novel SMART-Vision taxonomy, which illustrates how innovations in deep learning for HAR complement one another.
- Score: 5.766136300380401
- License:
- Abstract: Human Action Recognition (HAR) is a challenging domain in computer vision, involving recognizing complex patterns by analyzing the spatiotemporal dynamics of individuals' movements in videos. These patterns arise in sequential data, such as video frames, which are often essential to accurately distinguish actions that would be ambiguous in a single image. HAR has garnered considerable interest due to its broad applicability, ranging from robotics and surveillance systems to sports motion analysis, healthcare, and the burgeoning field of autonomous vehicles. While several taxonomies have been proposed to categorize HAR approaches in surveys, they often overlook hybrid methodologies and fail to demonstrate how different models incorporate various architectures and modalities. In this comprehensive survey, we present the novel SMART-Vision taxonomy, which illustrates how innovations in deep learning for HAR complement one another, leading to hybrid approaches beyond traditional categories. Our survey provides a clear roadmap from foundational HAR works to current state-of-the-art systems, highlighting emerging research directions and addressing unresolved challenges in discussion sections for architectures within the HAR domain. We provide details of the research datasets that various approaches used to measure and compare the goodness of HAR approaches. We also explore the rapidly emerging field of Open-HAR systems, which challenges HAR systems by presenting samples from unknown, novel classes during test time.
Related papers
- Out-of-Distribution Detection on Graphs: A Survey [58.47395497985277]
Graph out-of-distribution (GOOD) detection focuses on identifying graph data that deviates from the distribution seen during training.
We categorize existing methods into four types: enhancement-based, reconstruction-based, information propagation-based, and classification-based approaches.
We discuss practical applications and theoretical foundations, highlighting the unique challenges posed by graph data.
arXiv Detail & Related papers (2025-02-12T04:07:12Z)
- Generative Artificial Intelligence Meets Synthetic Aperture Radar: A Survey [49.29751866761522]
This paper aims to investigate the intersection of GenAI and SAR.
First, we illustrate the common data generation-based applications in the SAR field.
Then, the latest GenAI models are systematically reviewed.
Finally, the corresponding applications in the SAR domain are also included.
arXiv Detail & Related papers (2024-11-05T03:06:00Z)
- A Comprehensive Methodological Survey of Human Activity Recognition Across Divers Data Modalities [2.916558661202724]
Human Activity Recognition (HAR) systems aim to understand human behaviour and assign a label to each action.
HAR can leverage various data modalities, such as RGB images and video, skeleton, depth, infrared, point cloud, event stream, audio, acceleration, and radar signals.
This paper presents a comprehensive survey of the latest advancements in HAR from 2014 to 2024.
arXiv Detail & Related papers (2024-09-15T10:04:44Z)
- Explainable Deep Learning Framework for Human Activity Recognition [3.9146761527401424]
We propose a model-agnostic framework that enhances interpretability and efficacy of HAR models.
By implementing competitive data augmentation, our framework provides intuitive and accessible explanations of model decisions.
arXiv Detail & Related papers (2024-08-21T11:59:55Z)
- A Comprehensive Review of Few-shot Action Recognition [64.47305887411275]
Few-shot action recognition aims to address the high cost and impracticality of manually labeling complex and variable video data.
It requires accurately classifying human actions in videos using only a few labeled examples per class.
arXiv Detail & Related papers (2024-07-20T03:53:32Z)
- RNNs, CNNs and Transformers in Human Action Recognition: A Survey and a Hybrid Model [0.0]
Human Action Recognition (HAR) encompasses the task of monitoring human activities across various domains.
Over the past decade, the field of HAR has witnessed substantial progress by leveraging Convolutional Neural Networks (CNNs).
Recently, the domain of computer vision has witnessed the emergence of Vision Transformers (ViTs) as a potent solution.
arXiv Detail & Related papers (2024-06-02T17:09:59Z)
- Vision+X: A Survey on Multimodal Learning in the Light of Data [64.03266872103835]
Multimodal machine learning that incorporates data from various sources has become an increasingly popular research area.
We analyze the commonness and uniqueness of each data format, mainly vision, audio, text, and motion.
We investigate the existing literature on multimodal learning from both the representation learning and downstream application levels.
arXiv Detail & Related papers (2022-10-05T13:14:57Z)
- A Survey of Graph-based Deep Learning for Anomaly Detection in Distributed Systems [2.3551989288556774]
We explore the potential of graph-based algorithms to identify anomalies in distributed systems.
One of our objectives is to provide an in-depth look at graph-based approaches to conceptually analyze their capability to handle real-world challenges.
This study gives an overview of the State-of-the-Art (SotA) research articles in the field and compares and contrasts their characteristics.
arXiv Detail & Related papers (2022-06-08T20:19:28Z)
- Scene Graph Generation: A Comprehensive Survey [35.80909746226258]
Scene graphs have been a focus of research because of their powerful semantic representation and applications to scene understanding.
Scene Graph Generation (SGG) refers to the task of automatically mapping an image into a semantic structural scene graph.
We review 138 representative works that cover different input modalities, and systematically summarize existing methods of image-based SGG.
arXiv Detail & Related papers (2022-01-03T00:55:33Z)
- A Survey on Heterogeneous Graph Embedding: Methods, Techniques, Applications and Sources [79.48829365560788]
Heterogeneous graphs (HGs), also known as heterogeneous information networks, have become ubiquitous in real-world scenarios.
HG embedding aims to learn representations in a lower-dimension space while preserving the heterogeneous structures and semantics for downstream tasks.
arXiv Detail & Related papers (2020-11-30T15:03:47Z)
- Recent Progress in Appearance-based Action Recognition [73.6405863243707]
Action recognition is the task of identifying various human actions in a video.
Recent appearance-based methods have achieved promising progress towards accurate action recognition.
arXiv Detail & Related papers (2020-11-25T10:18:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.