Fast and Robust Video-Based Exercise Classification via Body Pose
Tracking and Scalable Multivariate Time Series Classifiers
- URL: http://arxiv.org/abs/2210.00507v1
- Date: Sun, 2 Oct 2022 13:03:38 GMT
- Title: Fast and Robust Video-Based Exercise Classification via Body Pose
Tracking and Scalable Multivariate Time Series Classifiers
- Authors: Ashish Singh, Antonio Bevilacqua, Thach Le Nguyen, Feiyan Hu, Kevin
McGuinness, Martin O'Reilly, Darragh Whelan, Brian Caulfield, Georgiana Ifrim
- Abstract summary: We present an application for classifying strength and conditioning (S&C) exercises from video.
We propose an approach named BodyMTS to turn video into time series by employing body pose tracking.
We show that BodyMTS achieves an average accuracy of 87%, which is significantly higher than the accuracy of human domain experts.
- Score: 13.561233730881279
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Technological advancements have spurred the use of machine learning-based
applications in sports science. Physiotherapists, sports coaches and athletes
actively look to incorporate the latest technologies in order to further
improve performance and avoid injuries. While wearable sensors are very
popular, their use is hindered by constraints on battery power and sensor
calibration, especially for use cases which require multiple sensors to be
placed on the body. Hence, there is renewed interest in video-based data
capture and analysis for sports science. In this paper, we present the
application of classifying S&C exercises using video. We focus on the popular
Military Press exercise, where the execution is captured with a video-camera
using a mobile device, such as a mobile phone, and the goal is to classify the
execution into different types. Since video recordings need a lot of storage
and computation, this use case requires data reduction, while preserving the
classification accuracy and enabling fast prediction. To this end, we propose
an approach named BodyMTS to turn video into time series by employing body pose
tracking, followed by training and prediction using multivariate time series
classifiers. We analyze the accuracy and robustness of BodyMTS and show that it
is robust to different types of noise caused by either video quality or pose
estimation factors. We compare BodyMTS to state-of-the-art deep learning
methods which classify human activity directly from videos and show that
BodyMTS achieves similar accuracy, but with reduced running time and model
engineering effort. Finally, we discuss some of the practical aspects of
employing BodyMTS in this application in terms of accuracy and robustness under
reduced data quality and size. We show that BodyMTS achieves an average
accuracy of 87%, which is significantly higher than the accuracy of human
domain experts.
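The pipeline described above (video, body pose tracking, then a multivariate time series classifier) can be pictured in a few lines of code. The snippet below is a minimal sketch, assuming MediaPipe Pose for keypoint extraction and sktime's RocketClassifier for the time series stage; the paper's actual pose tracker and classifier configuration may differ, and the file names and class labels are hypothetical.

```python
# Minimal sketch of a BodyMTS-style pipeline: video -> per-frame body
# keypoints -> multivariate time series -> time series classifier.
# Assumes MediaPipe Pose and sktime; the paper's exact tooling may differ.
import cv2
import mediapipe as mp
import numpy as np
from sktime.classification.kernel_based import RocketClassifier

def video_to_series(path, n_frames=150):
    """Extract (x, y) for each of MediaPipe's 33 body landmarks per frame,
    yielding one multivariate series of shape (66, n_frames)."""
    pose = mp.solutions.pose.Pose(static_image_mode=False)
    cap = cv2.VideoCapture(path)
    coords = []
    while len(coords) < n_frames:
        ok, frame = cap.read()
        if not ok:
            break
        res = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if res.pose_landmarks:
            coords.append([v for lm in res.pose_landmarks.landmark
                           for v in (lm.x, lm.y)])
    cap.release()
    series = np.array(coords).reshape(len(coords), 66).T  # (channels, time)
    out = np.zeros((66, n_frames))  # pad so all videos share one length
    out[:, :series.shape[1]] = series
    return out

# Hypothetical clips and execution-type labels for the Military Press.
X = np.stack([video_to_series(p) for p in ["rep_a.mp4", "rep_b.mp4"]])
y = np.array(["normal", "arched_back"])
clf = RocketClassifier().fit(X, y)
print(clf.predict(X))
```

Each video collapses into one multivariate series with a channel per landmark coordinate, which is exactly the data-reduction step the abstract motivates: the classifier never touches raw frames.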
Related papers
- VideoRun2D: Cost-Effective Markerless Motion Capture for Sprint Biomechanics [12.12643642515884]
Sprinting is a decisive ability, especially in team sports, and sprint kinematics have been studied in the past using a variety of methods.
This study first adapts two general trackers for realistic biomechanical analysis and then evaluates them against manual tracking.
Our best resulting markerless body tracker particularly adapted for sprint biomechanics is termed VideoRun2D.
arXiv Detail & Related papers (2024-09-16T11:10:48Z) - Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z) - HMP: Hand Motion Priors for Pose and Shape Estimation from Video [52.39020275278984]
We develop a generative motion prior specific to hands, trained on the AMASS dataset, which features diverse and high-quality hand motions.
Our integration of a robust motion prior significantly enhances performance, especially in occluded scenarios.
We demonstrate our method's efficacy via qualitative and quantitative evaluations on the HO3D and DexYCB datasets.
arXiv Detail & Related papers (2023-12-27T22:35:33Z) - SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame
Interpolation [11.198172694893927]
SportsSloMo is a benchmark consisting of more than 130K video clips and 1M video frames of high-resolution (≥720p) slow-motion sports videos crawled from YouTube.
We re-train several state-of-the-art methods on our benchmark, and their accuracy drops compared to results on other datasets.
We introduce two loss terms based on human-aware priors, adding auxiliary supervision for panoptic segmentation and human keypoint detection, as sketched below.
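Read literally, this means the usual frame-interpolation reconstruction loss is combined with weighted auxiliary terms. A minimal PyTorch sketch under that reading; the heads, targets, and weights here are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative combination of a frame-interpolation reconstruction loss
# with human-aware auxiliary terms; weights w_seg / w_kpt are hypothetical.
import torch.nn.functional as F

def total_loss(pred_frame, gt_frame,
               seg_logits, seg_labels,     # auxiliary segmentation head
               kpt_heatmaps, gt_heatmaps,  # auxiliary keypoint head
               w_seg=0.1, w_kpt=0.1):
    l_recon = F.l1_loss(pred_frame, gt_frame)        # standard VFI term
    l_seg = F.cross_entropy(seg_logits, seg_labels)  # human-aware prior 1
    l_kpt = F.mse_loss(kpt_heatmaps, gt_heatmaps)    # human-aware prior 2
    return l_recon + w_seg * l_seg + w_kpt * l_kpt
```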
arXiv Detail & Related papers (2023-08-31T17:23:50Z) - An Examination of Wearable Sensors and Video Data Capture for Human
Exercise Classification [9.674125829493214]
We compare the performance of IMUs to a video-based approach for human exercise classification on two real-world datasets.
We observe that an approach based on a single camera can outperform a single IMU by 10 percentage points on average.
Our work opens up new and more realistic avenues for this application, where a video captured using a readily available smartphone camera, combined with a single sensor, can be used for effective human exercise classification.
arXiv Detail & Related papers (2023-07-10T12:24:04Z) - Perception Test: A Diagnostic Benchmark for Multimodal Video Models [78.64546291816117]
We propose a novel multimodal video benchmark to evaluate the perception and reasoning skills of pre-trained multimodal models.
The Perception Test focuses on skills (Memory, Abstraction, Physics, Semantics) and types of reasoning (descriptive, explanatory, predictive, counterfactual) across video, audio, and text modalities.
The benchmark probes pre-trained models for their transfer capabilities, in a zero-shot / few-shot or limited finetuning regime.
arXiv Detail & Related papers (2023-05-23T07:54:37Z) - Towards Single Camera Human 3D-Kinematics [15.559206592078425]
We propose a novel approach, D3KE, for direct 3D human kinematic estimation from videos using deep neural networks.
Our experiments demonstrate that the proposed end-to-end training is robust and outperforms 2D and 3D markerless motion capture based kinematic estimation pipelines.
arXiv Detail & Related papers (2023-01-13T08:44:09Z) - Federated Remote Physiological Measurement with Imperfect Data [10.989271258156883]
The growing need for technology that supports remote healthcare is highlighted by an aging population and the COVID-19 pandemic.
In health-related machine learning applications, the ability to learn predictive models without data leaving a private device is attractive.
Camera-based remote physiological sensing facilitates scalable and low-cost measurement; a sketch of the federated averaging loop this setting implies follows below.
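Federated learning, as referenced here, means each device trains on its own data and only model parameters are shared and averaged. The sketch below shows generic FedAvg-style averaging on a linear model; it illustrates the protocol class, not this paper's specific method.

```python
# Generic FedAvg-style round: clients train locally on private data;
# only parameters leave the device. Illustrative, not the paper's method.
import numpy as np

def local_update(weights, X, y, lr=0.01, epochs=5):
    """One client's local gradient steps on a linear model (illustrative)."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # squared-error gradient
        w -= lr * grad
    return w

def fed_avg(global_w, client_data):
    """One federated round: local training on-device, then a
    size-weighted average of the returned parameters."""
    updates, sizes = [], []
    for X, y in client_data:  # raw (X, y) never leaves the client
        updates.append(local_update(global_w, X, y))
        sizes.append(len(y))
    return np.average(updates, axis=0, weights=sizes)

# Hypothetical usage: three clients with private regression data.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 4)), rng.normal(size=20)) for _ in range(3)]
w = np.zeros(4)
for _ in range(10):
    w = fed_avg(w, clients)
```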
arXiv Detail & Related papers (2022-03-11T05:26:46Z) - One to Many: Adaptive Instrument Segmentation via Meta Learning and
Dynamic Online Adaptation in Robotic Surgical Video [71.43912903508765]
MDAL is a dynamic online adaptive learning scheme for instrument segmentation in robot-assisted surgery.
It learns the general knowledge of instruments and the fast adaptation ability through the video-specific meta-learning paradigm.
It outperforms other state-of-the-art methods on two datasets.
arXiv Detail & Related papers (2021-03-24T05:02:18Z) - Hybrid Dynamic-static Context-aware Attention Network for Action
Assessment in Long Videos [96.45804577283563]
We present a novel hybrid dynAmic-static Context-aware attenTION NETwork (ACTION-NET) for action assessment in long videos.
We learn the dynamic information of the video while also focusing on the static postures of the detected athletes in specific frames.
We combine the features of the two streams to regress the final video score, supervised by ground-truth scores given by experts; a minimal sketch of this fusion follows.
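In other words, clip-level dynamic features and frame-level static posture features are fused and regressed to a scalar score trained against expert annotations. A minimal PyTorch sketch with hypothetical feature dimensions:

```python
# Minimal two-stream fusion + score regression in the spirit of
# ACTION-NET; backbone features and dimensions are hypothetical.
import torch
import torch.nn as nn

class TwoStreamScorer(nn.Module):
    """Fuse dynamic (clip-level) and static (posture) features,
    then regress a scalar quality score."""
    def __init__(self, dyn_dim=1024, static_dim=512):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(dyn_dim + static_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),  # final video score
        )

    def forward(self, dyn_feat, static_feat):
        fused = torch.cat([dyn_feat, static_feat], dim=-1)
        return self.head(fused).squeeze(-1)

model = TwoStreamScorer()
dyn = torch.randn(8, 1024)    # e.g. features from a video backbone
static = torch.randn(8, 512)  # e.g. pooled per-frame posture features
loss = nn.functional.mse_loss(model(dyn, static), torch.rand(8))  # expert scores
```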
arXiv Detail & Related papers (2020-08-13T15:51:42Z) - Contact and Human Dynamics from Monocular Video [73.47466545178396]
Existing deep models predict 2D and 3D kinematic poses from video that are approximately accurate, but contain visible errors.
We present a physics-based method for inferring 3D human motion from video sequences that takes initial 2D and 3D pose estimates as input.
arXiv Detail & Related papers (2020-07-22T21:09:11Z)