Text-Guided Multi-Instance Learning for Scoliosis Screening via Gait Video Analysis
- URL: http://arxiv.org/abs/2507.02996v1
- Date: Tue, 01 Jul 2025 22:13:27 GMT
- Title: Text-Guided Multi-Instance Learning for Scoliosis Screening via Gait Video Analysis
- Authors: Haiqing Li, Yuzhi Guo, Feng Jiang, Thao M. Dang, Hehuan Ma, Qifeng Zhou, Jean Gao, Junzhou Huang
- Abstract summary: Early-stage scoliosis is difficult to detect, particularly in adolescents, where delayed diagnosis can lead to serious health issues. Traditional X-ray-based methods carry radiation risks and rely heavily on clinical expertise, limiting their use in large-scale screenings. We propose a Text-Guided Multi-Instance Learning Network (TG-MILNet) for non-invasive scoliosis detection using gait videos.
- Score: 33.88520129574637
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Early-stage scoliosis is often difficult to detect, particularly in adolescents, where delayed diagnosis can lead to serious health issues. Traditional X-ray-based methods carry radiation risks and rely heavily on clinical expertise, limiting their use in large-scale screenings. To overcome these challenges, we propose a Text-Guided Multi-Instance Learning Network (TG-MILNet) for non-invasive scoliosis detection using gait videos. To handle temporal misalignment in gait sequences, we employ Dynamic Time Warping (DTW) clustering to segment videos into key gait phases. To focus on the most relevant diagnostic features, we introduce an Inter-Bag Temporal Attention (IBTA) mechanism that highlights critical gait phases. Recognizing the difficulty in identifying borderline cases, we design a Boundary-Aware Model (BAM) to improve sensitivity to subtle spinal deviations. Additionally, we incorporate textual guidance from domain experts and large language models (LLMs) to enhance feature representation and improve model interpretability. Experiments on the large-scale Scoliosis1K gait dataset show that TG-MILNet achieves state-of-the-art performance, particularly excelling in handling class imbalance and accurately detecting challenging borderline cases. The code is available at https://github.com/lhqqq/TG-MILNet
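The abstract names Dynamic Time Warping as the tool for handling temporal misalignment between gait sequences. As a rough illustration only, the sketch below computes the textbook DTW distance between two one-dimensional sequences. It is not taken from the TG-MILNet code: the function name `dtw_distance`, the scalar per-frame feature, and the sample values are all assumptions made for this example, and the paper's clustering step is not shown.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two 1-D sequences.

    Illustrative only: not the TG-MILNet implementation.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its current frame
                cost[i][j - 1],      # b advances, a repeats its current frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical per-frame gait feature; the numbers are invented for illustration.
slow_walk = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 0.0, 0.0]
fast_walk = [0.0, 1.0, 2.0, 1.0, 0.0]
offset_walk = [1.0, 2.0, 3.0, 2.0, 1.0]

same_shape = dtw_distance(slow_walk, fast_walk)
different_level = dtw_distance(fast_walk, offset_walk)
```

Here `slow_walk` is `fast_walk` with every frame repeated, so warping absorbs the speed difference and the two align at zero cost, whereas `offset_walk` differs in value rather than timing and keeps a positive distance. A pairwise distance of this kind is what a DTW-based clustering step would typically consume.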
Related papers
- Hide and Seek with LLMs: An Adversarial Game for Sneaky Error Generation and Self-Improving Diagnosis [51.88592148135258]
We propose Hide and Seek Game (HSG), a dynamic adversarial framework for error generation and diagnosis. HSG involves two adversarial roles: Sneaky, which "hides" by generating subtle, deceptive reasoning errors, and Diagnosis, which "seeks" to accurately detect them. Experiments on several math reasoning tasks show that HSG significantly boosts error diagnosis, achieving 16.8% to 31.4% higher accuracy than baselines like GPT-4o.
arXiv Detail & Related papers (2025-08-05T12:45:21Z) - Demographic-aware fine-grained classification of pediatric wrist fractures [3.4384440967420185]
Wrist pathologies are frequently observed, particularly among children who constitute the majority of fracture cases. Computer vision presents a promising avenue, contingent upon the availability of extensive datasets. We employ a multifaceted approach to address the challenge of recognizing wrist pathologies using an extremely limited dataset.
arXiv Detail & Related papers (2025-07-17T10:03:57Z) - Leveraging Gait Patterns as Biomarkers: An attention-guided Deep Multiple Instance Learning Network for Scoliosis Classification [36.18242379097044]
Scoliosis is a spinal curvature disorder that is difficult to detect early and can compress the chest cavity. Traditional scoliosis detection methods rely on clinical expertise, and X-ray imaging poses radiation risks. We propose an Attention-Guided Deep Multi-Instance Learning method (Gait-MIL) to effectively capture discriminative features from gait patterns.
arXiv Detail & Related papers (2025-04-04T19:35:33Z) - Pathological Prior-Guided Multiple Instance Learning For Mitigating Catastrophic Forgetting in Breast Cancer Whole Slide Image Classification [50.899861205016265]
We propose a new framework, PaGMIL, to mitigate catastrophic forgetting in breast cancer WSI classification. Our framework introduces two key components into the common MIL model architecture. We evaluate the continual learning performance of PaGMIL across several public breast cancer datasets.
arXiv Detail & Related papers (2025-03-08T04:51:58Z) - Gait Patterns as Biomarkers: A Video-Based Approach for Classifying Scoliosis [10.335383345968966]
Scoliosis presents significant diagnostic challenges, particularly in adolescents.
Traditional diagnostic and follow-up methods face limitations due to the need for clinical expertise and the risk of radiation exposure.
We introduce a novel video-based, non-invasive method for scoliosis classification using gait analysis.
arXiv Detail & Related papers (2024-07-08T08:29:02Z) - Diffusion Models with Ensembled Structure-Based Anomaly Scoring for Unsupervised Anomaly Detection [35.46541584018842]
Unsupervised anomaly detection (UAD) emerges as a viable alternative for pathology segmentation.
Recent UAD anomaly scoring functions often focus on intensity only and neglect structural differences, which impedes the segmentation performance.
Structural Similarity (SSIM) captures both intensity and structural disparities and can be advantageous over the classical $\ell_1$ error.
arXiv Detail & Related papers (2024-03-21T09:50:39Z) - Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning [65.54680361074882]
Eye-gaze Guided Multi-modal Alignment (EGMA) framework harnesses eye-gaze data for better alignment of medical visual and textual features.
We conduct downstream tasks of image classification and image-text retrieval on four medical datasets.
arXiv Detail & Related papers (2024-03-19T03:59:14Z) - Shape Matters: Detecting Vertebral Fractures Using Differentiable Point-Based Shape Decoding [51.38395069380457]
Degenerative spinal pathologies are highly prevalent among the elderly population.
Timely diagnosis of osteoporotic fractures and other degenerative deformities facilitates proactive measures to mitigate the risk of severe back pain and disability.
In this study, we specifically explore the use of shape auto-encoders for vertebrae.
arXiv Detail & Related papers (2023-12-08T18:11:22Z) - Deep Reinforcement Learning Framework for Thoracic Diseases Classification via Prior Knowledge Guidance [49.87607548975686]
The scarcity of labeled data for related diseases poses a huge challenge to an accurate diagnosis.
We propose a novel deep reinforcement learning framework, which introduces prior knowledge to direct the learning of diagnostic agents.
Our approach's performance was demonstrated using the well-known NIH ChestX-ray14 and CheXpert datasets.
arXiv Detail & Related papers (2023-06-02T01:46:31Z) - A Global and Patch-wise Contrastive Loss for Accurate Automated Exudate Detection [12.669734891001667]
Diabetic retinopathy (DR) is a leading global cause of blindness.
Early detection of hard exudates plays a crucial role in identifying DR, which aids in treating diabetes and preventing vision loss.
We present a novel supervised contrastive learning framework to optimize hard exudate segmentation.
arXiv Detail & Related papers (2023-02-22T17:39:00Z) - FetReg: Placental Vessel Segmentation and Registration in Fetoscopy Challenge Dataset [57.30136148318641]
Fetoscopy laser photocoagulation is a widely used procedure for the treatment of Twin-to-Twin Transfusion Syndrome (TTTS).
This may lead to increased procedural time and incomplete ablation, resulting in persistent TTTS.
Computer-assisted intervention may help overcome these challenges by expanding the fetoscopic field of view through video mosaicking and providing better visualization of the vessel network.
We present a large-scale multi-centre dataset for the development of generalized and robust semantic segmentation and video mosaicking algorithms for the fetal environment with a focus on creating drift-free mosaics from long duration fetoscopy videos.
arXiv Detail & Related papers (2021-06-10T17:14:27Z)