Related papers: Enhancing Fitness Movement Recognition with Attention Mechanism and Pre-Trained Feature Extractors

Enhancing Fitness Movement Recognition with Attention Mechanism and Pre-Trained Feature Extractors

URL: http://arxiv.org/abs/2509.02511v1
Date: Tue, 02 Sep 2025 17:04:42 GMT
Title: Enhancing Fitness Movement Recognition with Attention Mechanism and Pre-Trained Feature Extractors
Authors: Shanjid Hasan Nishat, Srabonti Deb, Mohiuddin Ahmed,
Abstract summary: Fitness movement recognition plays a vital role in health monitoring, rehabilitation, and personalized fitness training.<n>We present a framework that integrates pre-trained 2D Convolutional Neural Networks (CNNs) with a Long Short-Term Memory (LSTM) network enhanced by spatial attention.<n>We evaluate the framework on a curated subset of the UCF101 dataset, achieving a peak accuracy of 93.34% with the ResNet50-based configuration.
Score: 1.7619303397097408
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Fitness movement recognition, a focused subdomain of human activity recognition (HAR), plays a vital role in health monitoring, rehabilitation, and personalized fitness training by enabling automated exercise classification from video data. However, many existing deep learning approaches rely on computationally intensive 3D models, limiting their feasibility in real-time or resource-constrained settings. In this paper, we present a lightweight and effective framework that integrates pre-trained 2D Convolutional Neural Networks (CNNs) such as ResNet50, EfficientNet, and Vision Transformers (ViT) with a Long Short-Term Memory (LSTM) network enhanced by spatial attention. These models efficiently extract spatial features while the LSTM captures temporal dependencies, and the attention mechanism emphasizes informative segments. We evaluate the framework on a curated subset of the UCF101 dataset, achieving a peak accuracy of 93.34\% with the ResNet50-based configuration. Comparative results demonstrate the superiority of our approach over several state-of-the-art HAR systems. The proposed method offers a scalable and real-time-capable solution for fitness activity recognition with broader applications in vision-based health and activity monitoring.

Related papers

DeepACTIF: Efficient Feature Attribution via Activation Traces in Neural Sequence Models [0.0]
Feature attribution is essential for interpreting deep learning models in time-series domains such as healthcare, biometrics, and human-AI interaction.<n>Standard attribution methods, such as Integrated Gradients or SHAP, are computationally intensive and not well-suited for real-time applications.<n>We present DeepACTIF, a lightweight and architecture-aware feature attribution method that leverages internal activations of sequence models to estimate feature importance efficiently.
arXiv Detail & Related papers (2025-09-18T15:47:05Z)
Spatiotemporal Attention Learning Framework for Event-Driven Object Recognition [1.0445957451908694]
Event-based vision sensors capture local pixel-level intensity changes as a sparse event stream containing position, polarity, and information.<n>This paper presents a novel learning framework for event-based object recognition, utilizing a VARGG network enhanced with Contemporalal Block Attention Module (CBAM)<n>Our approach achieves comparable performance to state-of-the-art ResNet-based methods while reducing parameter count by 2.3% compared to the original VGG model.
arXiv Detail & Related papers (2025-04-01T02:37:54Z)
Spiking Meets Attention: Efficient Remote Sensing Image Super-Resolution with Attention Spiking Neural Networks [57.17129753411926]
Spiking neural networks (SNNs) are emerging as a promising alternative to traditional artificial neural networks (ANNs)<n>We propose SpikeSR, which achieves state-of-the-art performance across various remote sensing benchmarks such as AID, DOTA, and DIOR.
arXiv Detail & Related papers (2025-03-06T09:06:06Z)
Self-STORM: Deep Unrolled Self-Supervised Learning for Super-Resolution Microscopy [55.2480439325792]
We introduce deep unrolled self-supervised learning, which alleviates the need for such data by training a sequence-specific, model-based autoencoder. Our proposed method exceeds the performance of its supervised counterparts.
arXiv Detail & Related papers (2024-03-25T17:40:32Z)
ELA: Efficient Local Attention for Deep Convolutional Neural Networks [15.976475674061287]
This paper introduces an Efficient Local Attention (ELA) method that achieves substantial performance improvements with a simple structure. To overcome these challenges, we propose the incorporation of 1D convolution and Group Normalization feature enhancement techniques. ELA can be seamlessly integrated into deep CNN networks such as ResNet, MobileNet, and DeepLab.
arXiv Detail & Related papers (2024-03-02T08:06:18Z)
Deep Reinforcement Learning Empowered Activity-Aware Dynamic Health Monitoring Systems [69.41229290253605]
Existing monitoring approaches were designed on the premise that medical devices track several health metrics concurrently. This means that they report all relevant health values within that scope, which can result in excess resource use and the gathering of extraneous data. We propose Dynamic Activity-Aware Health Monitoring strategy (DActAHM) for striking a balance between optimal monitoring performance and cost efficiency.
arXiv Detail & Related papers (2024-01-19T16:26:35Z)
Understanding Self-attention Mechanism via Dynamical System Perspective [58.024376086269015]
Self-attention mechanism (SAM) is widely used in various fields of artificial intelligence. We show that intrinsic stiffness phenomenon (SP) in the high-precision solution of ordinary differential equations (ODEs) also widely exists in high-performance neural networks (NN) We show that the SAM is also a stiffness-aware step size adaptor that can enhance the model's representational ability to measure intrinsic SP.
arXiv Detail & Related papers (2023-08-19T08:17:41Z)
Baby Physical Safety Monitoring in Smart Home Using Action Recognition System [0.0]
We present a novel framework combining transfer learning techniques with a Conv2D LSTM layer to extract features from the pre-trained I3D model on the Kinetics dataset. We developed a benchmark dataset and an automated model that uses LSTM convolution with I3D (ConvLSTM-I3D) for recognizing and predicting baby activities in a smart baby room.
arXiv Detail & Related papers (2022-10-22T19:00:14Z)
3D Convolutional with Attention for Action Recognition [6.238518976312625]
Current action recognition methods use computationally expensive models for learning-temporal dependencies of the action. This paper proposes a deep neural network architecture for learning such dependencies consisting of a 3D convolutional layer, fully connected layers and attention layer. The method first learns spatial features and temporal of actions through 3D-CNN, and then the attention temporal mechanism helps the model to locate attention to essential features.
arXiv Detail & Related papers (2022-06-05T15:12:57Z)
A Spatio-Temporal Multilayer Perceptron for Gesture Recognition [70.34489104710366]
We propose a multilayer state-weighted perceptron for gesture recognition in the context of autonomous vehicles. An evaluation of TCG and Drive&Act datasets is provided to showcase the promising performance of our approach. We deploy our model to our autonomous vehicle to show its real-time capability and stable execution.
arXiv Detail & Related papers (2022-04-25T08:42:47Z)
Object Tracking through Residual and Dense LSTMs [67.98948222599849]
Deep learning-based trackers based on LSTMs (Long Short-Term Memory) recurrent neural networks have emerged as a powerful alternative. DenseLSTMs outperform Residual and regular LSTM, and offer a higher resilience to nuisances. Our case study supports the adoption of residual-based RNNs for enhancing the robustness of other trackers.
arXiv Detail & Related papers (2020-06-22T08:20:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.