Automated ARAT Scoring Using Multimodal Video Analysis, Multi-View Fusion, and Hierarchical Bayesian Models: A Clinician Study
- URL: http://arxiv.org/abs/2505.01680v1
- Date: Sat, 03 May 2025 04:00:51 GMT
- Title: Automated ARAT Scoring Using Multimodal Video Analysis, Multi-View Fusion, and Hierarchical Bayesian Models: A Clinician Study
- Authors: Tamim Ahmed, Thanassis Rikakis
- Abstract summary: Manual scoring of the Action Research Arm Test (ARAT) for upper extremity assessment in stroke rehabilitation is time-intensive and variable. We propose an automated ARAT scoring system integrating multimodal video analysis with SlowFast, I3D, and Transformer-based models using OpenPose keypoints and object locations. This work advances automated rehabilitation by offering a scalable, interpretable solution with clinical validation.
- Score: 1.0463644684200606
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Manual scoring of the Action Research Arm Test (ARAT) for upper extremity assessment in stroke rehabilitation is time-intensive and variable. We propose an automated ARAT scoring system integrating multimodal video analysis with SlowFast, I3D, and Transformer-based models using OpenPose keypoints and object locations. Our approach employs multi-view data (ipsilateral, contralateral, and top perspectives), applying early and late fusion to combine features across views and models. Hierarchical Bayesian Models (HBMs) infer movement quality components, enhancing interpretability. A clinician dashboard displays task scores, execution times, and quality assessments. We conducted a study with five clinicians who reviewed 500 video ratings generated by our system, providing feedback on its accuracy and usability. Evaluated on a stroke rehabilitation dataset, our framework achieves 89.0% validation accuracy with late fusion, with HBMs aligning closely with manual assessments. This work advances automated rehabilitation by offering a scalable, interpretable solution with clinical validation.
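The abstract describes the fusion strategy only at a high level. As a rough, hypothetical illustration of what late fusion across the three camera views could look like, the sketch below averages per-view classification logits computed from pooled clip features. It is not the authors' implementation: the feature dimension, head sizes, and the 4-level item label space (ARAT items are conventionally scored 0-3) are assumptions made for the example.

```python
# Minimal sketch (not the paper's code): late fusion of per-view video features
# for ARAT item scoring. Backbone features (e.g., from SlowFast/I3D/Transformer
# models, optionally concatenated upstream with OpenPose keypoint and
# object-location embeddings) are assumed to arrive as pooled clip vectors.
import torch
import torch.nn as nn

class LateFusionARATScorer(nn.Module):
    def __init__(self, feat_dim=2304, num_classes=4,
                 views=("ipsilateral", "contralateral", "top")):
        super().__init__()
        self.views = views
        # One lightweight classification head per camera view.
        self.heads = nn.ModuleDict({
            v: nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                             nn.Linear(256, num_classes))
            for v in views
        })

    def forward(self, view_feats: dict) -> torch.Tensor:
        # view_feats: {view_name: (batch, feat_dim)} pooled clip features.
        logits = [self.heads[v](view_feats[v]) for v in self.views]
        # Late fusion: average class logits over the views.
        return torch.stack(logits, dim=0).mean(dim=0)

# Toy usage with random tensors standing in for real backbone outputs.
model = LateFusionARATScorer()
feats = {v: torch.randn(8, 2304) for v in ("ipsilateral", "contralateral", "top")}
pred_item_scores = model(feats).argmax(dim=-1)  # one 0-3 prediction per clip
```

Early fusion, by contrast, would concatenate features from the different models and modalities into a single vector before one shared head; the Hierarchical Bayesian layer described in the abstract presumably operates downstream of such per-task predictions to infer the movement quality components shown on the clinician dashboard.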
Related papers
- On the Role of Feedback in Test-Time Scaling of Agentic AI Workflows [71.92083784393418]
Agentic AI systems, which autonomously plan and act, are becoming widespread, yet their success rate on complex tasks remains low. Inference-time alignment relies on three components: sampling, evaluation, and feedback. We introduce Iterative Agent Decoding (IAD), a procedure that repeatedly inserts feedback extracted from different forms of critiques.
arXiv Detail & Related papers (2025-04-02T17:40:47Z) - VideoGen-Eval: Agent-based System for Video Generation Evaluation [54.662739174367836]
Rapid progress in video generation has rendered existing evaluation systems inadequate for assessing state-of-the-art models. We propose VideoGen-Eval, an agent-based evaluation system that integrates content structuring, MLLM-based content judgment, and patch tools for temporal-dense dimensions. We also introduce a video generation benchmark to evaluate existing cutting-edge models and verify the effectiveness of our evaluation system.
arXiv Detail & Related papers (2025-03-30T14:12:21Z) - Efficient Frame Extraction: A Novel Approach Through Frame Similarity and Surgical Tool Tracking for Video Segmentation [0.0]
We propose a technique that can efficiently eliminate redundant frames to reduce dataset size and computation time. Specifically, we compute the similarity between consecutive frames by tracking the movement of surgical tools. We evaluate the effectiveness of our approach by analyzing datasets obtained through retrospective reviews of cases.
arXiv Detail & Related papers (2025-01-19T19:36:09Z) - Towards Robust Algorithms for Surgical Phase Recognition via Digital Twin Representation [13.388576093178887]
We present a digital twin (DT) representation-based framework for surgical phase recognition from videos. The framework is trained on the Cholec80 dataset and evaluated on out-of-distribution and corrupted test samples. Our findings lend support to the thesis that DT representations are effective in enhancing model robustness.
arXiv Detail & Related papers (2024-10-26T00:49:06Z) - AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios.
We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
arXiv Detail & Related papers (2024-03-26T04:27:56Z) - MR-STGN: Multi-Residual Spatio Temporal Graph Network Using Attention Fusion for Patient Action Assessment [0.30693357740321775]
We propose an automated approach for patient action assessment using a Multi-Residual Spatio Temporal Graph Network (MR-STGN). The MR-STGN is specifically designed to capture the dynamics of patient actions. We evaluate our model on the UI-PRMD dataset, demonstrating its performance in accurately predicting real-time patient action scores.
arXiv Detail & Related papers (2023-12-21T01:09:52Z) - D-STGCNT: A Dense Spatio-Temporal Graph Conv-GRU Network based on transformer for assessment of patient physical rehabilitation [0.30693357740321775]
This paper introduces a new graph-based model for assessing rehabilitation exercises. Dense connections and GRU mechanisms are used to rapidly process large 3D skeleton inputs. The evaluation of our proposed approach on the KIMORE and UI-PRMD datasets highlighted its potential.
arXiv Detail & Related papers (2023-12-21T00:38:31Z) - Coordinate Transformer: Achieving Single-stage Multi-person Mesh Recovery from Videos [91.44553585470688]
Multi-person 3D mesh recovery from videos is a critical first step towards automatic perception of group behavior in virtual reality, physical therapy and beyond.
We propose the Coordinate transFormer (CoordFormer) that directly models multi-person spatial-temporal relations and simultaneously performs multi-mesh recovery in an end-to-end manner.
Experiments on the 3DPW dataset demonstrate that CoordFormer significantly improves the state of the art, outperforming the previous best results by 4.2%, 8.8%, and 4.7% on the MPJPE, PAMPJPE, and PVE metrics, respectively.
arXiv Detail & Related papers (2023-08-20T18:23:07Z) - Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study [60.56194508762205]
We reproduce, compare, and analyze state-of-the-art automated medical coding machine learning models.
We show that several models underperform due to weak configurations, poorly sampled train-test splits, and insufficient evaluation.
We present the first comprehensive results on the newly released MIMIC-IV dataset using the reproduced models.
arXiv Detail & Related papers (2023-04-21T11:54:44Z) - Tele-EvalNet: A Low-cost, Teleconsultation System for Home based Rehabilitation of Stroke Survivors using Multiscale CNN-LSTM Architecture [7.971065005161566]
We propose Tele-EvalNet, a novel system consisting of two components: a live feedback model and an overall performance evaluation model.
The live feedback model provides feedback on exercise correctness, with easy-to-understand instructions highlighted using color markers.
The overall performance evaluation model learns a mapping from joint data to the scores clinicians assign to the performance.
arXiv Detail & Related papers (2021-12-06T16:58:00Z) - Assessing YOLACT++ for real time and robust instance segmentation of medical instruments in endoscopic procedures [0.5735035463793008]
Image-based tracking of laparoscopic instruments plays a fundamental role in computer and robotic-assisted surgeries.
To date, most existing models for instance segmentation of medical instruments have been based on two-stage detectors.
We propose adding attention mechanisms to the YOLACT architecture, allowing real-time instance segmentation of instruments.
arXiv Detail & Related papers (2021-03-30T00:09:55Z) - One-shot action recognition towards novel assistive therapies [63.23654147345168]
This work is motivated by the automated analysis of medical therapies that involve action imitation games.
The presented approach incorporates a pre-processing step that standardizes heterogeneous motion data conditions.
We evaluate the approach on a real use-case of automated video analysis for therapy support with autistic people.
arXiv Detail & Related papers (2021-02-17T19:41:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.