A multimodal vision foundation model for generalizable knee pathology
- URL: http://arxiv.org/abs/2601.18250v1
- Date: Mon, 26 Jan 2026 08:14:51 GMT
- Title: A multimodal vision foundation model for generalizable knee pathology
- Authors: Kang Yu, Dingyu Wang, Zimu Yuan, Nan Zhou, Jiajun Liu, Jiaxin Liu, Shanggui Liu, Yaoyan Zheng, Huishu Yuan, Di Huang, Dong Jiang
- Abstract summary: Musculoskeletal disorders create an urgent demand for precise interpretation of medical imaging. Current artificial intelligence approaches in orthopedics rely on task-specific, supervised learning paradigms. We introduce OrthoFoundation, a multimodal vision foundation model optimized for musculoskeletal pathology.
- Score: 40.03838145472935
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Musculoskeletal disorders represent a leading cause of global disability, creating an urgent demand for precise interpretation of medical imaging. Current artificial intelligence (AI) approaches in orthopedics predominantly rely on task-specific, supervised learning paradigms. These methods are inherently fragmented, require extensive annotated datasets, and often lack generalizability across different modalities and clinical scenarios. The development of foundation models in this field has been constrained by the scarcity of large-scale, curated, and open-source musculoskeletal datasets. To address these challenges, we introduce OrthoFoundation, a multimodal vision foundation model optimized for musculoskeletal pathology. We constructed a pre-training dataset of 1.2 million unlabeled knee X-ray and MRI images from internal and public databases. Utilizing a DINOv3 backbone, the model was trained via self-supervised contrastive learning to capture robust radiological representations. OrthoFoundation achieves state-of-the-art (SOTA) performance across 14 downstream tasks. It attained superior accuracy in X-ray osteoarthritis diagnosis and ranked first in MRI structural injury detection. The model demonstrated remarkable label efficiency, matching supervised baselines using only 50% of labeled data. Furthermore, despite being pre-trained on knee images, OrthoFoundation exhibited exceptional cross-anatomy generalization to the hip, shoulder, and ankle. OrthoFoundation represents a significant advancement toward general-purpose AI for musculoskeletal imaging. By learning fundamental, joint-agnostic radiological semantics from large-scale multimodal data, it overcomes the limitations of conventional models, providing a robust framework for reducing annotation burdens and enhancing diagnostic accuracy in clinical practice.
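The abstract states that the model was trained with self-supervised contrastive learning on unlabeled images, but the listing includes no code. As a rough illustration of the general technique, the following is a minimal NumPy sketch of an NT-Xent (InfoNCE-style) contrastive loss over two augmented views of the same image batch; the function name, temperature, and batch shapes are illustrative assumptions, not the authors' implementation (DINOv3's actual objective differs in detail).

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """Illustrative NT-Xent contrastive loss (not the paper's code).

    z1, z2: (N, D) embeddings of two augmented views of the same N
    images; matching rows are treated as positive pairs.
    """
    z = np.concatenate([z1, z2], axis=0)              # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize rows
    sim = z @ z.T / temperature                       # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                    # exclude self-similarity
    n = z1.shape[0]
    # each sample i in the first view is paired with i + n in the second
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # cross-entropy: -log softmax probability assigned to the positive pair
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = -(sim[np.arange(2 * n), pos] - logsumexp)
    return loss.mean()
```

When the two views are identical, the positive pairs are maximally similar and the loss drops well below the value obtained for unrelated embeddings, which is the signal that drives the encoder to map augmentations of the same image close together.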
Related papers
- OrthoDiffusion: A Generalizable Multi-Task Diffusion Foundation Model for Musculoskeletal MRI Interpretation [36.4629764779715]
Musculoskeletal disorders represent a significant global health burden and are a leading cause of disability worldwide. We developed OrthoDiffusion, a unified diffusion-based foundation model designed for multi-task musculoskeletal MRI interpretation. The framework utilizes three orientation-specific 3D diffusion models, pre-trained in a self-supervised manner on 15,948 unlabeled knee MRI scans.
arXiv Detail & Related papers (2026-02-24T10:29:10Z) - A generalizable large-scale foundation model for musculoskeletal radiographs [6.440881664328117]
We present SKELEX, a large-scale foundation model for musculoskeletal radiographs trained using self-supervised learning. The model was evaluated on 12 downstream diagnostic tasks and generally outperformed baselines in fracture detection, osteoarthritis grading, and bone tumor classification. We developed an interpretable, region-guided model for predicting bone tumors, which maintained robust performance on independent external datasets.
arXiv Detail & Related papers (2026-02-03T04:04:45Z) - Self-Supervised Cross-Encoder for Neurodegenerative Disease Diagnosis [6.226851122403944]
We propose a novel self-supervised cross-encoder framework that leverages the temporal continuity in longitudinal MRI scans for supervision. This framework disentangles learned representations into two components: a static representation, constrained by contrastive learning, which captures stable anatomical features; and a dynamic representation, guided by input-gradient regularization, which reflects temporal changes. Experimental results on the Alzheimer's Disease Neuroimaging Initiative dataset demonstrate that our method achieves superior classification accuracy and improved interpretability.
arXiv Detail & Related papers (2025-09-09T11:52:24Z) - Demographic-aware fine-grained classification of pediatric wrist fractures [4.309673738288069]
Computer vision presents a promising avenue, contingent upon the availability of extensive datasets. This study addresses the problem using a multifaceted approach: framing it as a fine-grained recognition task, fusing patient metadata with X-rays, and leveraging weights from a separate fine-grained dataset. Results show that combining a fine-grained transformer approach, fine-grained pre-training, and metadata integration improves diagnostic accuracy by 2% on a small custom-curated dataset and by over 10% on a larger fracture dataset.
arXiv Detail & Related papers (2025-07-17T10:03:57Z) - Improving Generalization in MRI-Based Deep Learning Models for Total Knee Replacement Prediction [0.6384218409986929]
We show that replacing batch normalization with instance normalization, using data augmentation, and applying contrastive loss improves generalization. For training and evaluation, we used MRI data from the Osteoarthritis Initiative (OAI) database.
arXiv Detail & Related papers (2025-04-27T11:41:19Z) - A Multi-Site Study on AI-Driven Pathology Detection and Osteoarthritis Grading from Knee X-Ray [0.0]
Bone health disorders like osteoarthritis and osteoporosis pose major global health challenges. This study presents an AI-powered system that analyzes knee X-rays to detect key pathologies. It also grades osteoarthritis severity, enabling timely, personalized treatment.
arXiv Detail & Related papers (2025-03-28T06:41:22Z) - Self-supervised vision-langage alignment of deep learning representations for bone X-rays analysis [53.809054774037214]
This paper proposes leveraging vision-language pretraining on bone X-rays paired with French reports.
It is the first study to integrate French reports to shape the embedding space devoted to bone X-Rays representations.
arXiv Detail & Related papers (2024-05-14T19:53:20Z) - Advancing human-centric AI for robust X-ray analysis through holistic self-supervised learning [33.9544297423474]
We present RayDINO, a large visual encoder trained by self-supervision on 873k chest X-rays.
We compare RayDINO to previous state-of-the-art models across nine radiology tasks, from classification and dense segmentation to text generation.
Our findings suggest that self-supervision enables patient-centric AI, proving useful in clinical settings and in interpreting X-rays holistically.
arXiv Detail & Related papers (2024-05-02T16:59:10Z) - Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
We train open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
The inference of LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z) - UniBrain: Universal Brain MRI Diagnosis with Hierarchical Knowledge-enhanced Pre-training [66.16134293168535]
We propose a hierarchical knowledge-enhanced pre-training framework for universal brain MRI diagnosis, termed UniBrain.
Specifically, UniBrain leverages a large-scale dataset of 24,770 imaging-report pairs from routine diagnostics.
arXiv Detail & Related papers (2023-09-13T09:22:49Z) - LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z) - Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
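The last entry above mentions K-nearest neighbor smoothing (KNNS) of disease predictions. The papers' own formulations are not reproduced in this listing, so the following is only a generic sketch of the idea: blending each sample's predicted class probabilities with those of its nearest neighbors in feature space. The function name, the equal-weight blend, and the default `k` are illustrative assumptions, not the authors' method.

```python
import numpy as np

def knn_smooth(features, probs, k=3):
    """Generic KNN smoothing sketch (not the paper's exact method).

    features: (N, D) embeddings; probs: (N, C) predicted class
    probabilities. Each row of the output blends a sample's own
    prediction with the mean prediction of its k nearest neighbors.
    """
    # pairwise squared Euclidean distances between all samples
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)           # a sample is not its own neighbor
    nn = np.argsort(d2, axis=1)[:, :k]     # indices of the k nearest neighbors
    neighbor_mean = probs[nn].mean(axis=1) # (N, C) average neighbor prediction
    return 0.5 * (probs + neighbor_mean)   # equal-weight blend
```

Because both the original rows and the neighbor means are valid probability distributions, the blended rows still sum to one; an outlier prediction surrounded by confident neighbors is pulled toward the local consensus.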
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.