Related papers: Real-Time Fitness Exercise Classification and Counting from Video Frames

Real-Time Fitness Exercise Classification and Counting from Video Frames

URL: http://arxiv.org/abs/2411.11548v1
Date: Mon, 18 Nov 2024 13:06:29 GMT
Title: Real-Time Fitness Exercise Classification and Counting from Video Frames
Authors: Riccardo Riccio,
Abstract summary: This paper introduces a novel method for real-time exercise classification using a Bidirectional Long Short-Term Memory (BiLSTM) neural network. The model adapts to changes in perspective, user positioning, and body differences, improving generalization. It is integrated into a web application providing real-time exercise classification and repetition counting without manual exercise selection.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper introduces a novel method for real-time exercise classification using a Bidirectional Long Short-Term Memory (BiLSTM) neural network. Existing exercise recognition approaches often rely on synthetic datasets, raw coordinate inputs sensitive to user and camera variations, and fail to fully exploit the temporal dependencies in exercise movements. These issues limit their generalizability and robustness in real-world conditions, where lighting, camera angles, and user body types vary. To address these challenges, we propose a BiLSTM-based model that leverages invariant features, such as joint angles, alongside raw coordinates. By using both angles and (x, y, z) coordinates, the model adapts to changes in perspective, user positioning, and body differences, improving generalization. Training on 30-frame sequences enables the BiLSTM to capture the temporal context of exercises and recognize patterns evolving over time. We compiled a dataset combining synthetic data from the InfiniteRep dataset and real-world videos from Kaggle and other sources. This dataset includes four common exercises: squat, push-up, shoulder press, and bicep curl. The model was trained and validated on these diverse datasets, achieving an accuracy of over 99% on the test set. To assess generalizability, the model was tested on 2 separate test sets representative of typical usage conditions. Comparisons with the previous approach from the literature are present in the result section showing that the proposed model is the best-performing one. The classifier is integrated into a web application providing real-time exercise classification and repetition counting without manual exercise selection. Demo and datasets are available at the following GitHub Repository: https://github.com/RiccardoRiccio/Fitness-AI-Trainer-With-Automatic-Exercise-Recognition-and-Countin g.

Related papers

OmniTraj: Pre-Training on Heterogeneous Data for Adaptive and Zero-Shot Human Trajectory Prediction [62.385417528148224]
We present OmniTraj, a Transformer-based model pre-trained on a large-scale, heterogeneous dataset.<n>Experiments show that explicitly conditioning on the frame rate enables OmniTraj to achieve state-of-the-art zero-shot transfer performance.
arXiv Detail & Related papers (2025-07-31T15:37:09Z)
Test-Time Alignment via Hypothesis Reweighting [56.71167047381817]
Large pretrained models often struggle with underspecified tasks. We propose a novel framework to address the challenge of aligning models to test-time user intent.
arXiv Detail & Related papers (2024-12-11T23:02:26Z)
SONNET: Enhancing Time Delay Estimation by Leveraging Simulated Audio [17.811771707446926]
We show that learning based methods can, even based on synthetic data, significantly outperform GCC-PHAT on novel real world data. We provide our trained model, SONNET, which is runnable in real-time and works on novel data out of the box for many real data applications.
arXiv Detail & Related papers (2024-11-20T10:23:21Z)
Continual Learning for Multimodal Data Fusion of a Soft Gripper [1.0589208420411014]
A model trained on one data modality often fails when tested with a different modality. We introduce a continual learning algorithm capable of incrementally learning different data modalities. We evaluate the algorithm's effectiveness on a challenging custom multimodal dataset.
arXiv Detail & Related papers (2024-09-20T09:53:27Z)
Adapting Vision-Language Models to Open Classes via Test-Time Prompt Tuning [50.26965628047682]
Adapting pre-trained models to open classes is a challenging problem in machine learning. In this paper, we consider combining the advantages of both and come up with a test-time prompt tuning approach. Our proposed method outperforms all comparison methods on average considering both base and new classes.
arXiv Detail & Related papers (2024-08-29T12:34:01Z)
Efficient Open Set Single Image Test Time Adaptation of Vision Language Models [15.621092104244003]
Adapting models to dynamic, real-world environments is a critical challenge in deep learning.<n>We propose ROSITA, a novel framework that leverages dynamically updated feature banks to identify reliable test samples.<n>Our approach effectively adapts models to domain shifts for known classes while rejecting unfamiliar samples.
arXiv Detail & Related papers (2024-06-01T16:21:42Z)
PARSAC: Accelerating Robust Multi-Model Fitting with Parallel Sample Consensus [26.366299016589256]
We present a real-time method for robust estimation of multiple instances of geometric models from noisy data. A neural network segments the input data into clusters representing potential model instances. We demonstrate state-of-the-art performance on these as well as multiple established datasets, with inference times as small as five milliseconds per image.
arXiv Detail & Related papers (2024-01-26T14:54:56Z)
Adversarial Augmentation Training Makes Action Recognition Models More Robust to Realistic Video Distribution Shifts [13.752169303624147]
Action recognition models often lack robustness when faced with natural distribution shifts between training and test data. We propose two novel evaluation methods to assess model resilience to such distribution disparity. We experimentally demonstrate the superior performance of the proposed adversarial augmentation approach over baselines across three state-of-the-art action recognition models.
arXiv Detail & Related papers (2024-01-21T05:50:39Z)
Domain Adaptive Synapse Detection with Weak Point Annotations [63.97144211520869]
We present AdaSyn, a framework for domain adaptive synapse detection with weak point annotations. In the WASPSYN challenge at I SBI 2023, our method ranks the 1st place.
arXiv Detail & Related papers (2023-08-31T05:05:53Z)
Transform-Equivariant Consistency Learning for Temporal Sentence Grounding [66.10949751429781]
We introduce a novel Equivariant Consistency Regulation Learning framework to learn more discriminative representations for each video. Our motivation comes from that the temporal boundary of the query-guided activity should be consistently predicted. In particular, we devise a self-supervised consistency loss module to enhance the completeness and smoothness of the augmented video.
arXiv Detail & Related papers (2023-05-06T19:29:28Z)
Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks [76.35271072704384]
Deep learning models perform poorly when applied to videos with rare scenes or objects. We tackle this problem from two different angles: algorithm and dataset. We show that the debiased representation can generalize better when transferred to other datasets and tasks.
arXiv Detail & Related papers (2022-09-20T00:30:35Z)
Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage. We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets. By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
HighlightMe: Detecting Highlights from Human-Centric Videos [52.84233165201391]
We present a domain- and user-preference-agnostic approach to detect highlightable excerpts from human-centric videos. We use an autoencoder network equipped with spatial-temporal graph convolutions to detect human activities and interactions. We observe a 4-12% improvement in the mean average precision of matching the human-annotated highlights over state-of-the-art methods.
arXiv Detail & Related papers (2021-10-05T01:18:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.