MMeViT: Multi-Modal ensemble ViT for Post-Stroke Rehabilitation Action Recognition
- URL: http://arxiv.org/abs/2509.23044v1
- Date: Sat, 27 Sep 2025 01:46:26 GMT
- Title: MMeViT: Multi-Modal ensemble ViT for Post-Stroke Rehabilitation Action Recognition
- Authors: Ye-eun Kim, Suhyeon Lim, Andrew J. Choi
- Abstract summary: A key component of remote monitoring systems is Human Action Recognition (HAR) technology, which classifies actions. HAR research for stroke has largely concentrated on classifying relatively simple actions using machine learning rather than deep learning. In this study, we designed a system to monitor the actions of stroke patients, focusing on domiciliary upper limb Activities of Daily Living (ADL). We analyzed the collected dataset and found that the action data of stroke patients clusters less clearly than that of non-disabled individuals.
- Score: 1.0781866671930853
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Rehabilitation therapy for stroke patients faces a supply shortage despite the increasing demand. To address this issue, remote monitoring systems that reduce the burden on medical staff are emerging as a viable alternative. A key component of these remote monitoring systems is Human Action Recognition (HAR) technology, which classifies actions. However, existing HAR studies have primarily focused on non-disabled individuals, making them unsuitable for recognizing the actions of stroke patients. HAR research for stroke has largely concentrated on classifying relatively simple actions using machine learning rather than deep learning. In this study, we designed a system to monitor the actions of stroke patients, focusing on domiciliary upper limb Activities of Daily Living (ADL). Our system utilizes IMU (Inertial Measurement Unit) sensors and an RGB-D camera, which are the most common modalities in HAR. We directly collected a dataset through this system, investigated appropriate preprocessing, and proposed a deep learning model suitable for processing multimodal data. We analyzed the collected dataset and found that the action data of stroke patients clusters less clearly than that of non-disabled individuals. At the same time, we found that the proposed model learns similar tendencies for each label even in data whose features are difficult to cluster. This study suggests the possibility of extending a deep learning model that has learned the action features of stroke patients beyond simple action recognition to feedback, such as assessments, that contributes to domiciliary rehabilitation in future research. The code presented in this study is available at https://github.com/ye-Kim/MMeViT.
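The abstract describes an ensemble model that combines IMU and RGB-D streams. The paper's exact architecture is not given here, so the following is only a minimal late-fusion sketch of the general idea: two modality-specific classifiers (e.g., ViT encoders) each produce class logits, and their softmax probabilities are averaged before taking the argmax. All names (`late_fusion_predict`, `w_imu`) are illustrative, not from the paper.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def late_fusion_predict(imu_logits, rgbd_logits, w_imu=0.5):
    """Average per-modality class probabilities (late fusion).

    imu_logits, rgbd_logits: (batch, num_classes) outputs of two
    modality-specific classifiers; w_imu weights the IMU branch.
    """
    p = w_imu * softmax(imu_logits) + (1.0 - w_imu) * softmax(rgbd_logits)
    return p.argmax(axis=-1)

# Toy example: 2 samples, 3 hypothetical ADL action classes
imu = np.array([[2.0, 0.1, 0.0], [0.0, 0.2, 1.5]])
rgbd = np.array([[1.5, 0.0, 0.3], [0.1, 0.0, 2.0]])
print(late_fusion_predict(imu, rgbd))  # -> [0 2]
```

Averaging probabilities rather than raw logits keeps each branch's contribution on a comparable scale even when the two encoders produce logits of different magnitudes.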
Related papers
- Improving Out-of-distribution Human Activity Recognition via IMU-Video Cross-modal Representation Learning [3.177649348456073]
Human Activity Recognition (HAR) based on wearable inertial sensors plays a critical role in remote health monitoring.
We propose a new cross-modal self-supervised pretraining approach to learn representations from large-scale unlabeled IMU-video data.
Our results indicate that the proposed cross-modal pretraining approach outperforms the current state-of-the-art IMU-video pretraining approach.
arXiv Detail & Related papers (2025-07-17T18:47:46Z) - A comparative study on wearables and single-camera video for upper-limb out-of-the-lab activity recognition with different deep learning architectures [0.0]
High-end Inertial Measurement Units (IMU) have become increasingly popular for assessing human physical activity in clinical and research settings.
To increase the feasibility of patient tracking in out-of-the-lab settings, it is necessary to use a reduced number of devices for movement acquisition.
The development of machine learning systems able to recognize and digest clinically relevant data in-the-wild is needed.
arXiv Detail & Related papers (2024-02-04T19:45:59Z) - Multimodal Contrastive Learning with Hard Negative Sampling for Human
Activity Recognition [14.88934924520362]
Human Activity Recognition (HAR) systems have been extensively studied by the vision and ubiquitous computing communities.
We propose a hard negative sampling method for multimodal HAR with a hard negative sampling loss for skeleton and IMU data pairs.
We demonstrate the robustness of our approach for learning strong feature representations for HAR tasks, including in the limited-data setting.
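The entry above describes a contrastive loss over skeleton/IMU pairs in which hard negatives are emphasized. As an illustration of that general idea (not the paper's exact loss), the sketch below computes an InfoNCE-style loss where, for each skeleton anchor, only the most similar non-matching IMU embeddings enter the denominator; `hard_negative_nce` and `num_hard` are hypothetical names.

```python
import numpy as np

def normalize(x):
    # L2-normalize embeddings along the feature axis
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def hard_negative_nce(skel, imu, temperature=0.1, num_hard=2):
    """InfoNCE-style loss keeping only the hardest cross-modal negatives.

    skel, imu: (batch, dim) embeddings; row i of each is a positive pair.
    Negatives for anchor i are the num_hard highest-similarity IMU
    embeddings from other rows.
    """
    sims = normalize(skel) @ normalize(imu).T / temperature  # (B, B)
    batch = sims.shape[0]
    losses = []
    for i in range(batch):
        pos = sims[i, i]
        negs = np.delete(sims[i], i)          # all cross-modal negatives
        hard = np.sort(negs)[-num_hard:]      # keep only the hardest
        losses.append(-pos + np.log(np.exp(pos) + np.exp(hard).sum()))
    return float(np.mean(losses))
```

Restricting the denominator to hard negatives focuses the gradient on the pairs the model currently confuses, which is the intuition behind hard negative sampling.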
arXiv Detail & Related papers (2023-09-03T20:00:37Z) - Learnable Weight Initialization for Volumetric Medical Image Segmentation [66.3030435676252]
We propose a learnable weight-based hybrid medical image segmentation approach.
Our approach is easy to integrate into any hybrid model and requires no external training data.
Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-15T17:55:05Z) - Dissecting Self-Supervised Learning Methods for Surgical Computer Vision [51.370873913181605]
Self-Supervised Learning (SSL) methods have begun to gain traction in the general computer vision community.
The effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and unexplored.
We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding, phase recognition and tool presence detection.
arXiv Detail & Related papers (2022-07-01T14:17:11Z) - LifeLonger: A Benchmark for Continual Disease Classification [59.13735398630546]
We introduce LifeLonger, a benchmark for continual disease classification on the MedMNIST collection.
Task and class incremental learning of diseases address the issue of classifying new samples without re-training the models from scratch.
Cross-domain incremental learning addresses the issue of dealing with datasets originating from different institutions while retaining the previously obtained knowledge.
arXiv Detail & Related papers (2022-04-12T12:25:05Z) - Reducing Catastrophic Forgetting in Self Organizing Maps with
Internally-Induced Generative Replay [67.50637511633212]
A lifelong learning agent is able to continually learn from potentially infinite streams of pattern sensory data.
One major historic difficulty in building agents that adapt is that neural systems struggle to retain previously-acquired knowledge when learning from new samples.
This problem is known as catastrophic forgetting (interference) and remains an unsolved problem in the domain of machine learning to this day.
arXiv Detail & Related papers (2021-12-09T07:11:14Z) - Ambiguous Dynamic Treatment Regimes: A Reinforcement Learning Approach [0.0]
Dynamic Treatment Regimes (DTRs) are widely studied to formalize this process.
We develop Reinforcement Learning methods to efficiently learn optimal treatment regimes.
arXiv Detail & Related papers (2021-12-08T20:22:04Z) - Active Selection of Classification Features [0.0]
Auxiliary data, such as demographics, might help in selecting a smaller sample that comprises the individuals with the most informative MRI scans.
We propose two utility-based approaches for this problem, and evaluate their performance on three public real-world benchmark datasets.
arXiv Detail & Related papers (2021-02-26T18:19:08Z) - Relational Graph Learning on Visual and Kinematics Embeddings for
Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach of multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z) - Detecting Parkinsonian Tremor from IMU Data Collected In-The-Wild using
Deep Multiple-Instance Learning [59.74684475991192]
Parkinson's Disease (PD) is a slowly evolving neurological disease that affects about 1% of the population above 60 years old.
PD symptoms include tremor, rigidity and bradykinesia.
We present a method for automatically identifying tremorous episodes related to PD, based on IMU signals captured via a smartphone device.
arXiv Detail & Related papers (2020-05-06T09:02:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.