VL-OrdinalFormer: Vision Language Guided Ordinal Transformers for Interpretable Knee Osteoarthritis Grading
- URL: http://arxiv.org/abs/2601.00879v1
- Date: Wed, 31 Dec 2025 03:01:31 GMT
- Title: VL-OrdinalFormer: Vision Language Guided Ordinal Transformers for Interpretable Knee Osteoarthritis Grading
- Authors: Zahid Ullah, Jihie Kim
- Abstract summary: VL-OrdinalFormer is a vision-language-guided ordinal learning framework for automated KOA grading from knee radiographs. The proposed method combines a ViT-L/16 backbone with CORAL-based ordinal regression and a Contrastive Language-Image Pretraining (CLIP)-driven semantic alignment module. Experiments conducted on the publicly available OAI kneeKL224 dataset demonstrate that VL-OrdinalFormer achieves state-of-the-art performance.
- Score: 6.106307107513728
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knee osteoarthritis (KOA) is a leading cause of disability worldwide, and accurate severity assessment using the Kellgren-Lawrence (KL) grading system is critical for clinical decision making. However, radiographic distinctions between early disease stages, particularly KL-1 and KL-2, are subtle and frequently lead to inter-observer variability among radiologists. To address these challenges, we propose VL-OrdinalFormer, a vision-language-guided ordinal learning framework for fully automated KOA grading from knee radiographs. The proposed method combines a ViT-L/16 backbone with CORAL-based ordinal regression and a Contrastive Language-Image Pretraining (CLIP)-driven semantic alignment module, allowing the model to incorporate clinically meaningful textual concepts related to joint space narrowing, osteophyte formation, and subchondral sclerosis. To improve robustness and mitigate overfitting, we employ stratified five-fold cross-validation, class-aware re-weighting to emphasize challenging intermediate grades, and test-time augmentation with global threshold optimization. Experiments conducted on the publicly available OAI kneeKL224 dataset demonstrate that VL-OrdinalFormer achieves state-of-the-art performance, outperforming CNN and ViT baselines in terms of macro-F1 score and overall accuracy. Notably, the proposed framework yields substantial performance gains for KL-1 and KL-2 without compromising classification accuracy for mild or severe cases. In addition, interpretability analyses using Grad-CAM and CLIP similarity maps confirm that the model consistently attends to clinically relevant anatomical regions. These results highlight the potential of vision-language-aligned ordinal transformers as reliable and interpretable tools for KOA grading and disease progression assessment in routine radiological practice.
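The CORAL ordinal head at the core of this approach can be sketched in a few lines. This is a minimal NumPy illustration of the general CORAL technique (shared projection weights plus K-1 rank-specific biases, so the K-1 binary "grade > k" probabilities are monotone by construction), not the authors' implementation; all function and variable names here are illustrative.

```python
import numpy as np

def coral_logits(features, w, biases):
    """CORAL ordinal head: one shared projection plus K-1 rank-specific biases.

    Each output logit models P(grade > k); because all ranks share the same
    scalar score and differ only by ordered biases, the predicted
    probabilities cannot cross (rank consistency by construction).
    """
    score = features @ w                      # (N,) shared scalar score
    return score[:, None] + biases[None, :]   # (N, K-1) per-rank logits

def coral_predict(features, w, biases):
    """Predicted ordinal grade = number of rank thresholds exceeded."""
    probs = 1.0 / (1.0 + np.exp(-coral_logits(features, w, biases)))
    return (probs > 0.5).sum(axis=1)          # integer grade in [0, K-1]
```

For the five KL grades (0-4) this yields four binary subtasks; at inference the grade is simply the count of subtasks whose sigmoid output exceeds 0.5 (the abstract's "global threshold optimization" would tune that 0.5 cutoff on validation data).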
Related papers
- Retrieving Patient-Specific Radiomic Feature Sets for Transparent Knee MRI Assessment [42.97456036889799]
Classical radiomic features are designed to quantify image appearance and intensity patterns. Recent work on adaptive radiomics uses deep learning to predict feature weights over a radiomic pool, then thresholds these weights to retain the top-k features from the large radiomic pool F. We propose a patient-specific feature-set selection framework that predicts a single compact feature set per subject.
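The top-k retention step described above amounts to ranking the predicted weight vector and keeping the highest-scoring feature indices. A minimal sketch under that reading (the function name and interface are assumptions, not the paper's API):

```python
import numpy as np

def select_topk_features(weights, k):
    """Adaptive-radiomics style selection: keep the k features with the
    largest predicted weights from the full radiomic pool.

    weights: per-subject weight vector over the pool F.
    Returns the retained feature indices, sorted by weight descending.
    """
    order = np.argsort(weights)[::-1]   # indices from largest to smallest
    return order[:k]
</imports>```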
arXiv Detail & Related papers (2026-03-02T20:12:41Z) - Suppressing Prior-Comparison Hallucinations in Radiology Report Generation via Semantically Decoupled Latent Steering [94.37535002230504]
We develop a training-free, inference-time control framework termed Semantically Decoupled Latent Steering. Our approach constructs a semantic-free intervention vector via large language model (LLM)-driven semantic decomposition. We show that our approach significantly reduces the probability of historical hallucinations.
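Latent steering of this kind is typically realized by removing the component of a hidden state that lies along a learned intervention direction. A generic sketch of that operation, assuming a precomputed steering vector (this is the common steering recipe, not necessarily this paper's exact formulation):

```python
import numpy as np

def steer_hidden(hidden, direction, alpha=1.0):
    """Inference-time latent steering: subtract the component of each hidden
    state along a (unit-normalized) intervention direction.

    hidden:    (N, D) hidden states.
    direction: (D,) steering vector, e.g. a 'prior-comparison' direction.
    alpha:     steering strength; alpha=1 fully removes the component.
    """
    d = direction / np.linalg.norm(direction)
    coeff = hidden @ d                               # projection per row
    return hidden - alpha * coeff[:, None] * d[None, :]
```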
arXiv Detail & Related papers (2026-02-27T04:49:01Z) - Multi-View Stenosis Classification Leveraging Transformer-Based Multiple-Instance Learning Using Real-World Clinical Data [76.89269238957593]
Coronary artery stenosis is a leading cause of cardiovascular disease, diagnosed by analyzing the coronary arteries from multiple angiography views. We propose SegmentMIL, a transformer-based multi-view multiple-instance learning framework for patient-level stenosis classification.
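Multiple-instance learning frameworks like this aggregate per-view (instance) features into one patient-level (bag) representation. A common building block is attention-based MIL pooling, sketched here in generic form (the attention parameterization is a standard simplification, not SegmentMIL's actual architecture):

```python
import numpy as np

def attention_mil_pool(instance_feats, w_attn):
    """Attention-based MIL pooling: score each view, softmax the scores,
    and return the attention-weighted mean as the bag embedding.

    instance_feats: (n_views, D) per-view features.
    w_attn:         (D,) learned attention scoring vector.
    """
    scores = instance_feats @ w_attn          # (n_views,) raw scores
    scores = scores - scores.max()            # numerical stability
    attn = np.exp(scores) / np.exp(scores).sum()
    return attn @ instance_feats              # (D,) patient-level embedding
```

With equal attention scores this reduces to mean pooling; learned scores let informative views dominate the patient-level decision.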
arXiv Detail & Related papers (2026-02-02T13:07:52Z) - A Graph-Augmented knowledge Distillation based Dual-Stream Vision Transformer with Region-Aware Attention for Gastrointestinal Disease Classification with Explainable AI [0.06372261626436675]
This study presents a hybrid dual-stream deep learning framework built on teacher-student knowledge distillation. A student network was implemented as a compact Tiny-ViT structure that inherits the teacher's semantic and morphological knowledge. Two carefully curated Wireless Capsule Endoscopy datasets, encompassing major GI disease classes, were employed to ensure balanced representation.
arXiv Detail & Related papers (2025-12-24T07:51:54Z) - TWLR: Text-Guided Weakly-Supervised Lesion Localization and Severity Regression for Explainable Diabetic Retinopathy Grading [9.839282449612513]
We propose TWLR, a two-stage framework for interpretable diabetic retinopathy (DR) assessment. In the first stage, a vision-language model integrates domain-specific ophthalmological knowledge into text embeddings to jointly perform DR grading and lesion classification. The second stage introduces an iterative severity regression framework based on weakly-supervised semantic segmentation.
arXiv Detail & Related papers (2025-12-15T06:08:16Z) - Stacked Ensemble of Fine-Tuned CNNs for Knee Osteoarthritis Severity Grading [4.278354829803626]
Knee Osteoarthritis (KOA) is a musculoskeletal condition that can cause significant limitations and impairments in daily activities. To evaluate KOA, X-ray images of the affected knee are analyzed, and a grade is assigned based on the Kellgren-Lawrence (KL) grading system. A stacked ensemble model of fine-tuned Convolutional Neural Networks (CNNs) was developed for two classification tasks.
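The inference step of a stacked ensemble follows a standard recipe: concatenate each base model's class probabilities and feed them to a trained meta-learner. A linear sketch of that combination step (the shapes and weight matrix are illustrative assumptions; the paper's meta-learner may differ):

```python
import numpy as np

def stack_predict(base_probs, meta_weights):
    """Stacked-ensemble inference: combine per-model class probabilities
    with a linear meta-learner and return the final class indices.

    base_probs:   list of M arrays, each (N, C) softmax outputs of one CNN.
    meta_weights: (M*C, C) weights of the trained meta-learner.
    """
    stacked = np.concatenate(base_probs, axis=1)  # (N, M*C) meta-features
    scores = stacked @ meta_weights               # (N, C) combined scores
    return scores.argmax(axis=1)
```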
arXiv Detail & Related papers (2025-11-27T06:20:09Z) - MeCaMIL: Causality-Aware Multiple Instance Learning for Fair and Interpretable Whole Slide Image Diagnosis [40.3028468133626]
Multiple instance learning (MIL) has emerged as the dominant paradigm for whole slide image (WSI) analysis in computational pathology. MeCaMIL, a causality-aware MIL framework, explicitly models demographic confounders through structured causal graphs. MeCaMIL achieves superior fairness: demographic disparity variance drops by over 65% on average across attributes.
arXiv Detail & Related papers (2025-11-14T06:47:21Z) - EndoCIL: A Class-Incremental Learning Framework for Endoscopic Image Classification [5.574295682041076]
Class-incremental learning (CIL) for endoscopic image analysis is crucial for real-world clinical applications. We propose EndoCIL, a novel and unified CIL framework specifically tailored for endoscopic image diagnosis.
arXiv Detail & Related papers (2025-10-20T06:26:54Z) - A Foundation Model for Chest X-ray Interpretation with Grounded Reasoning via Online Reinforcement Learning [41.27625400846057]
DeepMedix-R1 is a holistic medical foundation model for chest X-ray (CXR) interpretation. It generates both an answer and reasoning steps tied to the image's local regions for each query.
arXiv Detail & Related papers (2025-09-04T06:00:04Z) - Fairness Evolution in Continual Learning for Medical Imaging [47.52603262576663]
This study examines how bias evolves across tasks using domain-specific fairness metrics and how different CL strategies impact this evolution. Our results show that Learning without Forgetting and Pseudo-Label achieve optimal classification performance, but Pseudo-Label is less biased.
arXiv Detail & Related papers (2024-04-10T09:48:52Z) - Automatic diagnosis of knee osteoarthritis severity using Swin
transformer [55.01037422579516]
Knee osteoarthritis (KOA) is a widespread condition that can cause chronic pain and stiffness in the knee joint.
We propose an automated approach that employs the Swin Transformer to predict the severity of KOA.
arXiv Detail & Related papers (2023-07-10T09:49:30Z) - Confidence-Driven Deep Learning Framework for Early Detection of Knee Osteoarthritis [8.193689534916988]
Knee Osteoarthritis (KOA) is a prevalent musculoskeletal disorder that severely impacts mobility and quality of life. We propose a confidence-driven deep learning framework for early KOA detection, focusing on distinguishing KL-0 and KL-2 stages. Experimental results demonstrate that the proposed framework achieves competitive accuracy, sensitivity, and specificity, comparable to those of expert radiologists.
arXiv Detail & Related papers (2023-03-23T11:57:50Z) - Learning to diagnose cirrhosis from radiological and histological labels with joint self and weakly-supervised pretraining strategies [62.840338941861134]
We propose to leverage transfer learning from large datasets annotated by radiologists, to predict the histological score available on a small annex dataset.
We compare different pretraining methods, namely weakly-supervised and self-supervised ones, to improve the prediction of cirrhosis.
This method outperforms the baseline classification of the METAVIR score, reaching an AUC of 0.84 and a balanced accuracy of 0.75.
arXiv Detail & Related papers (2023-02-16T17:06:23Z) - Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective [51.70661197256033]
We propose ARCO, a semi-supervised contrastive learning framework with stratified group theory for medical image segmentation.
We first propose building ARCO through the concept of variance-reduced estimation and show that certain variance-reduction techniques are particularly beneficial in pixel/voxel-level segmentation tasks.
We experimentally validate our approaches on eight benchmarks, i.e., five 2D/3D medical and three semantic segmentation datasets, with different label settings.
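A core variance-reduction technique in this setting is stratified sampling: drawing a fixed number of pixels/voxels per class rather than sampling uniformly, which lowers the variance of the contrastive estimate. A generic sketch of such a sampler (the function name and interface are illustrative, not ARCO's actual code):

```python
import numpy as np

def stratified_sample(labels, per_class, rng):
    """Variance-reduced sampling: draw `per_class` indices from each class
    instead of sampling uniformly over all pixels/voxels.

    labels: (N,) integer class labels.
    rng:    numpy random Generator.
    Returns concatenated indices, `per_class` per observed class.
    """
    idx = []
    for c in np.unique(labels):
        pool = np.flatnonzero(labels == c)
        # Sample without replacement when the class is large enough.
        idx.append(rng.choice(pool, size=per_class,
                              replace=len(pool) < per_class))
    return np.concatenate(idx)
```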
arXiv Detail & Related papers (2023-02-03T13:50:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.