Related papers: Clinical-Prior Guided Multi-Modal Learning with Latent Attention Pooling for Gait-Based Scoliosis Screening

Clinical-Prior Guided Multi-Modal Learning with Latent Attention Pooling for Gait-Based Scoliosis Screening

URL: http://arxiv.org/abs/2602.06743v1
Date: Fri, 06 Feb 2026 14:44:22 GMT
Title: Clinical-Prior Guided Multi-Modal Learning with Latent Attention Pooling for Gait-Based Scoliosis Screening
Authors: Dong Chen, Zizhuang Wei, Jialei Xu, Xinyang Sun, Zonglin He, Meiru An, Huili Peng, Yong Hu, Kenneth MC Cheung,
Abstract summary: Adolescent Idiopathic Scoliosis (AIS) is a prevalent spinal deformity whose progression can be mitigated through early detection.<n>Current screening methods are subjective, difficult to scale, and reliant on specialized clinical expertise.<n>Video-based gait analysis offers a promising alternative, but current datasets and methods frequently suffer from data leakage.<n>ScoliGait is a new benchmark dataset comprising 1,572 gait video clips for training and 300 fully independent clips for testing.
Score: 8.010714901985898
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Adolescent Idiopathic Scoliosis (AIS) is a prevalent spinal deformity whose progression can be mitigated through early detection. Conventional screening methods are often subjective, difficult to scale, and reliant on specialized clinical expertise. Video-based gait analysis offers a promising alternative, but current datasets and methods frequently suffer from data leakage, where performance is inflated by repeated clips from the same individual, or employ oversimplified models that lack clinical interpretability. To address these limitations, we introduce ScoliGait, a new benchmark dataset comprising 1,572 gait video clips for training and 300 fully independent clips for testing. Each clip is annotated with radiographic Cobb angles and descriptive text based on clinical kinematic priors. We propose a multi-modal framework that integrates a clinical-prior-guided kinematic knowledge map for interpretable feature representation, alongside a latent attention pooling mechanism to fuse video, text, and knowledge map modalities. Our method establishes a new state-of-the-art, demonstrating a significant performance gap on a realistic, non-repeating subject benchmark. Our approach establishes a new state of the art, showing a significant performance gain on a realistic, subject-independent benchmark. This work provides a robust, interpretable, and clinically grounded foundation for scalable, non-invasive AIS assessment.

Related papers

A benchmark for video-based laparoscopic skill analysis and assessment [1.5734501497837607]
We introduce the Laparoscopic Skill Analysis and Assessment dataset, comprising 1270 stereo video recordings of four basic laparoscopic training tasks.<n>Each recording is annotated with a structured skill rating, aggregated from three independent raters, as well as binary labels indicating the presence or absence of task-specific errors.<n>To facilitate benchmarking of both existing and novel approaches for video-based skill assessment and error recognition, we provide predefined data splits for each task.
arXiv Detail & Related papers (2026-02-10T15:59:19Z)
Dense Feature Learning via Linear Structure Preservation in Medical Data [30.77691570199694]
Deep learning models for medical data are typically trained using task specific objectives that encourage representations to collapse onto a small number of discriminative directions.<n>We propose dense feature learning, a representation centric framework that explicitly shapes the linear structure of medical embeddings.
arXiv Detail & Related papers (2026-02-07T21:23:35Z)
Multi-View Stenosis Classification Leveraging Transformer-Based Multiple-Instance Learning Using Real-World Clinical Data [76.89269238957593]
Coronary artery stenosis is a leading cause of cardiovascular disease, diagnosed by analyzing the coronary arteries from multiple angiography views.<n>We propose SegmentMIL, a transformer-based multi-view multiple-instance learning framework for patient-level stenosis classification.
arXiv Detail & Related papers (2026-02-02T13:07:52Z)
Self-Supervised Anatomical Consistency Learning for Vision-Grounded Medical Report Generation [61.350584471060756]
Vision-grounded medical report generation aims to produce clinically accurate descriptions of medical images.<n>We propose Self-Supervised Anatomical Consistency Learning (SS-ACL) to align generated reports with corresponding anatomical regions.<n>SS-ACL constructs a hierarchical anatomical graph inspired by the invariant top-down inclusion structure of human anatomy.
arXiv Detail & Related papers (2025-09-30T08:59:06Z)
Revolutionizing Precise Low Back Pain Diagnosis via Contrastive Learning [0.3499870393443268]
Low back pain affects millions worldwide, driving the need for robust diagnostic models.<n>We present LumbarCLIP, a novel framework that leverages contrastive language-image pretraining to align lumbar spine MRI scans with corresponding radiological descriptions.
arXiv Detail & Related papers (2025-09-25T06:52:25Z)
GEMeX-RMCoT: An Enhanced Med-VQA Dataset for Region-Aware Multimodal Chain-of-Thought Reasoning [60.03671205298294]
Medical visual question answering aims to support clinical decision-making by enabling models to answer natural language questions based on medical images.<n>Current methods still suffer from limited answer reliability and poor interpretability.<n>This work first proposes a Region-Aware Multimodal Chain-of-Thought dataset, in which the process of producing an answer is preceded by a sequence of intermediate reasoning steps.
arXiv Detail & Related papers (2025-06-22T08:09:58Z)
Adversarial Vessel-Unveiling Semi-Supervised Segmentation for Retinopathy of Prematurity Diagnosis [9.683492465191241]
We propose a semi supervised segmentation framework designed to advance ROP studies without the need for extensive manual vessel annotation. Unlike previous methods that rely solely on limited labeled data, our approach integrates uncertainty weighted vessel unveiling module and domain adversarial learning. We validate our approach on public datasets and an in-house ROP dataset, demonstrating its superior performance across multiple evaluation metrics.
arXiv Detail & Related papers (2024-11-14T02:40:34Z)
Show from Tell: Audio-Visual Modelling in Clinical Settings [58.88175583465277]
We consider audio-visual modelling in a clinical setting, providing a solution to learn medical representations without human expert annotation. A simple yet effective multi-modal self-supervised learning framework is proposed for this purpose. The proposed approach is able to localise anatomical regions of interest during ultrasound imaging, with only speech audio as a reference.
arXiv Detail & Related papers (2023-10-25T08:55:48Z)
Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites: A Federated Learning Approach with Noise-Resilient Training [75.40980802817349]
Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area. We introduce a Decoupled Hard Label Correction (DHLC) strategy that considers the imbalanced distribution and fuzzy boundaries of MS lesions. We also introduce a Centrally Enhanced Label Correction (CELC) strategy, which leverages the aggregated central model as a correction teacher for all sites.
arXiv Detail & Related papers (2023-08-31T00:36:10Z)
KIDS: kinematics-based (in)activity detection and segmentation in a sleep case study [5.707737640557724]
Sleep behaviour and in-bed movements contain rich information on the neurophysiological health of people. This paper proposes an online Bayesian probabilistic framework for objective (in)activity detection and segmentation based on clinically meaningful joint kinematics.
arXiv Detail & Related papers (2023-01-04T16:24:01Z)
LifeLonger: A Benchmark for Continual Disease Classification [59.13735398630546]
We introduce LifeLonger, a benchmark for continual disease classification on the MedMNIST collection. Task and class incremental learning of diseases address the issue of classifying new samples without re-training the models from scratch. Cross-domain incremental learning addresses the issue of dealing with datasets originating from different institutions while retaining the previously obtained knowledge.
arXiv Detail & Related papers (2022-04-12T12:25:05Z)
PS-DeVCEM: Pathology-sensitive deep learning model for video capsule endoscopy based on weakly labeled data [0.0]
We propose a pathology-sensitive deep learning model (PS-DeVCEM) for frame-level anomaly detection and multi-label classification of different colon diseases in video capsule endoscopy (VCE) data. Our model is driven by attention-based deep multiple instance learning and is trained end-to-end on weakly labeled data. We show our model's ability to temporally localize frames with pathologies, without frame annotation information during training.
arXiv Detail & Related papers (2020-11-22T15:33:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.