Related papers: Multi-Centre Validation of a Deep Learning Model for Scoliosis Assessment

URL: http://arxiv.org/abs/2507.14093v1
Date: Fri, 18 Jul 2025 17:21:53 GMT
Title: Multi-Centre Validation of a Deep Learning Model for Scoliosis Assessment
Authors: Šimon Kubov, Simon Klíčník, Jakub Dandár, Zdeněk Straka, Karolína Kvaková, Daniel Kvak
Abstract summary: We conducted a retrospective, multi-centre evaluation of a fully automated deep learning software (Carebot AI Bones, Spine Measurement functionality; Carebot s.r.o.) on 103 standing anteroposterior whole spine radiographs collected from ten hospitals. Two musculoskeletal radiologists independently measured each study and served as reference readers.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Scoliosis affects roughly 2 to 4 percent of adolescents, and treatment decisions depend on precise Cobb angle measurement. Manual assessment is time-consuming and subject to inter-observer variation. We conducted a retrospective, multi-centre evaluation of a fully automated deep learning software (Carebot AI Bones, Spine Measurement functionality; Carebot s.r.o.) on 103 standing anteroposterior whole spine radiographs collected from ten hospitals. Two musculoskeletal radiologists independently measured each study and served as reference readers. Agreement between the AI and each radiologist was assessed with Bland-Altman analysis, mean absolute error (MAE), root mean squared error (RMSE), Pearson correlation coefficient, and Cohen's kappa for four-grade severity classification. Against Radiologist 1 the AI achieved an MAE of 3.89 degrees (RMSE 4.77 degrees) with a bias of 0.70 degrees and limits of agreement from -8.59 to +9.99 degrees. Against Radiologist 2 the AI achieved an MAE of 3.90 degrees (RMSE 5.68 degrees) with a bias of 2.14 degrees and limits of agreement from -8.23 to +12.50 degrees. Pearson correlations were r = 0.906 and r = 0.880 (inter-reader r = 0.928), while Cohen's kappa for severity grading reached 0.51 and 0.64 (inter-reader kappa 0.59). These results demonstrate that the proposed software reproduces expert-level Cobb angle measurements and categorical grading across multiple centres, suggesting its utility for streamlining scoliosis reporting and triage in clinical workflows.

Related papers

PanCanBench: A Comprehensive Benchmark for Evaluating Large Language Models in Pancreatic Oncology [48.732366302949515]
Large language models (LLMs) have achieved expert-level performance on standardized examinations, yet multiple-choice accuracy poorly reflects real-world clinical utility and safety. We developed a human-in-the-loop pipeline to create expert rubrics for de-identified patient questions. We evaluated 22 proprietary and open-source LLMs using an LLM-as-a-judge framework, measuring clinical completeness, factual accuracy, and web-search integration.
arXiv Detail & Related papers (2026-03-02T00:50:39Z)
Automated glenoid bone loss measurement and segmentation in CT scans for pre-operative planning in shoulder instability [4.618498494409548]
Reliable measurement of glenoid bone loss is essential for operative planning in shoulder instability. We developed and validated a fully automated deep learning pipeline for measuring glenoid bone loss on three-dimensional computed tomography (CT) scans.
arXiv Detail & Related papers (2025-11-18T03:12:22Z)
Validation of a CT-brain analysis tool for measuring global cortical atrophy in older patient cohorts [0.7223361655030193]
We validated our automated deep learning (DL) tool measuring the Global Cerebral Atrophy (GCA) score against trained human raters. The DL tool's GCA scores were compared with those of trained human raters, and with associations with age impairment, in representative older (65 years) patients.
arXiv Detail & Related papers (2025-09-08T20:04:35Z)
Deep Radiomics Detection of Clinically Significant Prostate Cancer on Multicenter MRI: Initial Comparison to PI-RADS Assessment [0.0]
This study analyzed biparametric (T2W and DW) prostate MRI sequences of 615 patients (mean age, 63.1 +/- 7 years) from four datasets acquired between 2010 and 2020. The deep radiomics machine learning model achieved comparable performance to PI-RADS assessment in csPCa detection at the patient level but not at the lesion level.
arXiv Detail & Related papers (2024-10-21T17:41:58Z)
Deep Learning Segmentation of Ascites on Abdominal CT Scans for Automatic Volume Quantification [12.25110399510034]
This retrospective study included contrast-enhanced and non-contrast abdominal-pelvic CT scans of patients with cirrhotic ascites and patients with ovarian cancer. The model was trained on The Cancer Genome Atlas Ovarian Cancer dataset (mean age, 60 years +/- 11 [s.d.]; 143 female). Its performance was measured by the Dice coefficient, standard deviations, and 95% confidence intervals, focusing on ascites volume in the peritoneal cavity.
arXiv Detail & Related papers (2024-06-23T01:32:53Z)
Detection of subclinical atherosclerosis by image-based deep learning on chest x-ray [86.38767955626179]
A deep-learning algorithm to predict coronary artery calcium (CAC) score was developed on 460 chest x-rays. The primary outcome was the diagnostic accuracy of the AICAC model, assessed by the area under the curve (AUC).
arXiv Detail & Related papers (2024-03-27T16:56:14Z)
Deep learning automates Cobb angle measurement compared with multi-expert observers [3.7153471185088427]
The Cobb angle is a widely used scoliosis quantification method that measures the degree of curvature between the tilted vertebrae. We have created fully automated software that precisely measures the Cobb angle and provides clear visualizations of these measurements. This software integrates deep neural network-based spine region detection and segmentation, spine centerline identification, and pinpointing of the most significantly tilted vertebrae.
arXiv Detail & Related papers (2024-03-18T15:43:45Z)
Can Deep Learning Reliably Recognize Abnormality Patterns on Chest X-rays? A Multi-Reader Study Examining One Month of AI Implementation in Everyday Radiology Clinical Practice [0.0]
We developed a deep-learning-based automatic detection algorithm (DLAD) to detect and localize seven specific radiological findings on chest X-rays. The proposed DLAD achieved high sensitivity (ATE 1.000 (0.624-1.000), CON 0.864 (0.671-0.956), EFF 0.953 (0.887-0.983), LES 0.905 (0.715-0.978), SCE 1.000 (0.366-1.000), CMG 0.837 (0.711-0.917), PNO 0.875 (0.538-0.986)). The findings of the study demonstrate that the suggested DLAD holds potential for integration into everyday
arXiv Detail & Related papers (2023-05-17T10:43:50Z)
Attention-based Saliency Maps Improve Interpretability of Pneumothorax Classification [52.77024349608834]
To investigate chest radiograph (CXR) classification performance of vision transformers (ViT) and interpretability of attention-based saliency. ViTs were fine-tuned for lung disease classification using four public data sets: CheXpert, Chest X-Ray 14, MIMIC CXR, and VinBigData. ViTs had comparable CXR classification AUCs compared with state-of-the-art CNNs.
arXiv Detail & Related papers (2023-03-03T12:05:41Z)
Learning to diagnose cirrhosis from radiological and histological labels with joint self and weakly-supervised pretraining strategies [62.840338941861134]
We propose to leverage transfer learning from large datasets annotated by radiologists, to predict the histological score available on a small annex dataset. We compare different pretraining methods, namely weakly-supervised and self-supervised ones, to improve the prediction of the cirrhosis. This method outperforms the baseline classification of the METAVIR score, reaching an AUC of 0.84 and a balanced accuracy of 0.75.
arXiv Detail & Related papers (2023-02-16T17:06:23Z)
Osteoporosis Prescreening using Panoramic Radiographs through a Deep Convolutional Neural Network with Attention Mechanism [65.70943212672023]
A deep convolutional neural network (CNN) with an attention module can detect osteoporosis on panoramic radiographs. A dataset of 70 panoramic radiographs (PRs) from 70 different subjects aged between 49 and 60 was used.
arXiv Detail & Related papers (2021-10-19T00:03:57Z)
Classification of Multiple Diseases on Body CT Scans using Weakly Supervised Deep Learning [7.287303475865695]
Rule-based algorithms were used to extract 19,225 disease labels from 13,667 body CT scans from 12,092 patients. For each organ, a three-dimensional convolutional neural network classified no apparent disease versus four common diseases for a total of 15 different labels. Manual validation of the extracted labels confirmed 91% to 99% accuracy across the 15 different labels.
arXiv Detail & Related papers (2020-08-03T19:55:53Z)
Automated Quantification of CT Patterns Associated with COVID-19 from Chest CT [48.785596536318884]
The proposed method takes as input a non-contrasted chest CT and segments the lesions, lungs, and lobes in three dimensions. The method outputs two combined measures of the severity of lung and lobe involvement, quantifying both the extent of COVID-19 abnormalities and presence of high opacities. Evaluation of the algorithm is reported on CTs of 200 participants (100 COVID-19 confirmed patients and 100 healthy controls) from institutions from Canada, Europe and the United States.
arXiv Detail & Related papers (2020-04-02T21:49:14Z)
Machine-Learning-Based Multiple Abnormality Prediction with Large-Scale Chest Computed Tomography Volumes [64.21642241351857]
We curated and analyzed a chest computed tomography (CT) data set of 36,316 volumes from 19,993 unique patients. We developed a rule-based method for automatically extracting abnormality labels from free-text radiology reports. We also developed a model for multi-organ, multi-disease classification of chest CT volumes.
arXiv Detail & Related papers (2020-02-12T00:59:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.