BiomechGPT: Towards a Biomechanically Fluent Multimodal Foundation Model for Clinically Relevant Motion Tasks
- URL: http://arxiv.org/abs/2505.18465v1
- Date: Sat, 24 May 2025 02:15:23 GMT
- Title: BiomechGPT: Towards a Biomechanically Fluent Multimodal Foundation Model for Clinically Relevant Motion Tasks
- Authors: Ruize Yang, Ann Kennedy, R. James Cotton
- Abstract summary: We created a multimodal dataset of motion-related questions and answers spanning a range of tasks. We developed BiomechGPT, a multimodal biomechanics-language model, on this dataset. Our results show that BiomechGPT demonstrates high performance across a range of tasks such as activity recognition, identifying movement impairments, diagnosis, scoring clinical outcomes, and measuring walking.
- Score: 3.097167420381722
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Advances in markerless motion capture are expanding access to biomechanical movement analysis, making it feasible to obtain high-quality movement data from outpatient clinics, inpatient hospitals, therapy, and even the home. Expanding access to movement data in these diverse contexts makes the challenge of performing downstream analytics all the more acute. Creating separate bespoke analysis code for every task end users might want is intractable and fails to take advantage of the common features of human movement underlying them all. Recent studies have shown that fine-tuning language models to accept tokenized movement as an additional modality enables successful descriptive captioning of movement. Here, we explore whether such a multimodal motion-language model can answer detailed, clinically meaningful questions about movement. We collected over 30 hours of biomechanics from nearly 500 participants, many with movement impairments from a variety of etiologies, performing a range of movements used in clinical outcomes assessments. After tokenizing these movement trajectories, we created a multimodal dataset of motion-related questions and answers spanning a range of tasks. We developed BiomechGPT, a multimodal biomechanics-language model, on this dataset. Our results show that BiomechGPT demonstrates high performance across a range of tasks such as activity recognition, identifying movement impairments, diagnosis, scoring clinical outcomes, and measuring walking. BiomechGPT provides an important step towards a foundation model for rehabilitation movement data.
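The abstract's core recipe, turning continuous movement trajectories into discrete tokens so a language model can accept motion as an additional modality, can be sketched briefly. The following Python sketch is illustrative only, not the authors' released code: the `MotionTokenizer` class, the joint count, the codebook size, and the `<motion_k>` token format are all hypothetical assumptions.

```python
# Minimal sketch of motion tokenization for a motion-language model,
# assuming a VQ-VAE-style codebook; names and sizes are illustrative.
import torch
import torch.nn as nn

class MotionTokenizer(nn.Module):
    """Map pose frames to the nearest entry of a learned codebook."""
    def __init__(self, n_joints: int = 23, code_dim: int = 64, n_codes: int = 512):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_joints, 128), nn.ReLU(), nn.Linear(128, code_dim)
        )
        self.codebook = nn.Embedding(n_codes, code_dim)

    def forward(self, poses: torch.Tensor) -> torch.Tensor:
        # poses: (frames, n_joints) joint angles from markerless capture
        z = self.encoder(poses)                       # (frames, code_dim)
        dists = torch.cdist(z, self.codebook.weight)  # (frames, n_codes)
        return dists.argmin(dim=-1)                   # discrete token ids

def to_prompt(token_ids: torch.Tensor, question: str) -> str:
    """Interleave motion tokens with a clinical question for the LM."""
    motion = " ".join(f"<motion_{int(i)}>" for i in token_ids)
    return f"<motion_start> {motion} <motion_end>\nQ: {question}\nA:"

if __name__ == "__main__":
    trajectory = torch.randn(120, 23)  # ~2 s of synthetic 60 Hz joint angles
    ids = MotionTokenizer()(trajectory)
    print(to_prompt(ids[:8], "Does this gait show evidence of impairment?"))
```

In a full pipeline the tokenizer would be trained with a VQ-VAE reconstruction objective, the language model's vocabulary would be extended with the motion tokens, and the model would then be fine-tuned on the motion question-answer pairs described in the abstract.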
Related papers
- KinTwin: Imitation Learning with Torque and Muscle Driven Biomechanical Models Enables Precise Replication of Able-Bodied and Impaired Movement from Markerless Motion Capture [2.44755919161855]
High-quality movement analysis could greatly benefit movement science and rehabilitation. We show the potential for using imitation learning to enable high-quality movement analysis in clinical practice.
arXiv Detail & Related papers (2025-05-19T17:58:03Z)
- Learning Speed-Adaptive Walking Agent Using Imitation Learning with Physics-Informed Simulation [0.0]
We create a skeletal humanoid agent capable of adapting to varying walking speeds while maintaining biomechanically realistic motions. The framework combines a synthetic data generator, which produces biomechanically plausible gait kinematics from open-source biomechanics data, and a training system that uses adversarial imitation learning to train the agent's walking policy.
arXiv Detail & Related papers (2024-12-05T07:55:58Z)
- Scaling Wearable Foundation Models [54.93979158708164]
We investigate the scaling properties of sensor foundation models across compute, data, and model size.
Using a dataset of up to 40 million hours of in-situ heart rate, heart rate variability, electrodermal activity, accelerometer, skin temperature, and altimeter per-minute data from over 165,000 people, we create LSM.
Our results establish the scaling laws of LSM for tasks such as imputation and extrapolation, both across time and sensor modalities.
arXiv Detail & Related papers (2024-10-17T15:08:21Z)
- ViKL: A Mammography Interpretation Framework via Multimodal Aggregation of Visual-knowledge-linguistic Features [54.37042005469384]
We announce MVKL, the first multimodal mammography dataset encompassing multi-view images, detailed manifestations and reports.
Based on this dataset, we focus on the challenging task of unsupervised pretraining.
We propose ViKL, a framework that synergizes Visual, Knowledge, and Linguistic features.
arXiv Detail & Related papers (2024-09-24T05:01:23Z)
- A Graph-based Approach to Human Activity Recognition [5.323279718522213]
This paper presents a methodology to efficiently extract substantial insights from expanding real-time datasets.
By utilizing data from Inertial Measurement Units (IMU) and Global Navigation Satellite Systems (GNSS) receivers, athletic performance can be analyzed using directed graphs.
Our approach is demonstrated on biathlon data and detects specific points of interest and complex movement sequences.
arXiv Detail & Related papers (2024-08-19T17:51:00Z)
- ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab [67.24684071577211]
The challenge of replicating research results has posed a significant impediment to the field of molecular biology.
We first curate a comprehensive multimodal dataset, named ProBio, as an initial step towards this objective.
Next, we devise two challenging benchmarks, transparent solution tracking and multimodal action recognition, to emphasize the unique characteristics and difficulties associated with activity understanding in BioLab settings.
arXiv Detail & Related papers (2023-11-01T14:44:01Z)
- DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion [70.33381660741861]
We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions.
We show that our DiverseMotion achieves the state-of-the-art motion quality and competitive motion diversity.
arXiv Detail & Related papers (2023-09-04T05:43:48Z)
- Multimodal video and IMU kinematic dataset on daily life activities using affordable devices (VIDIMU) [0.0]
The objective of the dataset is to pave the way towards affordable patient gross motor tracking solutions for daily life activities recognition and kinematic analysis.
The novelty of the dataset lies in: (i) the clinical relevance of the chosen movements, (ii) the combined utilization of affordable video and custom sensors, and (iii) the implementation of state-of-the-art tools for multimodal data processing of 3D body pose tracking and motion reconstruction.
arXiv Detail & Related papers (2023-03-27T14:05:49Z)
- Markerless Motion Capture and Biomechanical Analysis Pipeline [0.0]
Markerless motion capture has the potential to expand access to precise movement analysis.
Our pipeline makes it easy to obtain accurate biomechanical estimates of movement in a rehabilitation hospital.
arXiv Detail & Related papers (2023-03-19T13:31:57Z)
- MoDi: Unconditional Motion Synthesis from Diverse Data [51.676055380546494]
We present MoDi, an unconditional generative model that synthesizes diverse motions.
Our model is trained in a completely unsupervised setting from a diverse, unstructured and unlabeled motion dataset.
We show that despite the lack of any structure in the dataset, the latent space can be semantically clustered.
arXiv Detail & Related papers (2022-06-16T09:06:25Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online multi-modal graph network (MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.