A Real-World Demonstration of Machine Learning Generalizability:
Intracranial Hemorrhage Detection on Head CT
- URL: http://arxiv.org/abs/2102.04869v1
- Date: Tue, 9 Feb 2021 15:05:48 GMT
- Title: A Real-World Demonstration of Machine Learning Generalizability:
Intracranial Hemorrhage Detection on Head CT
- Authors: Hojjat Salehinejad, Jumpei Kitamura, Noah Ditkofsky, Amy Lin, Aditya
Bharatha, Suradech Suthiphosuwan, Hui-Ming Lin, Jefferson R. Wilson, Muhammad
Mamdani, and Errol Colak
- Abstract summary: The purpose of this study was to demonstrate that ML model generalizability is achievable in medical imaging.
An ML model was trained using 21,784 scans from the RSNA Intracranial Hemorrhage CT dataset.
On external validation, the model demonstrated an AUC of 95.4%, sensitivity of 91.3%, and specificity of 94.1%.
- Score: 5.517017976008718
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning (ML) holds great promise in transforming healthcare. While
published studies have shown the utility of ML models in interpreting medical
imaging examinations, these are often evaluated under laboratory settings. The
importance of real world evaluation is best illustrated by case studies that
have documented successes and failures in the translation of these models into
clinical environments. A key prerequisite for the clinical adoption of these
technologies is demonstrating generalizable ML model performance under real
world circumstances. The purpose of this study was to demonstrate that ML model
generalizability is achievable in medical imaging with the detection of
intracranial hemorrhage (ICH) on non-contrast computed tomography (CT) scans
serving as the use case. An ML model was trained using 21,784 scans from the
RSNA Intracranial Hemorrhage CT dataset while generalizability was evaluated
using an external validation dataset obtained from our busy trauma and
neurosurgical center. This real world external validation dataset consisted of
every unenhanced head CT scan (n = 5,965) performed in our emergency department
in 2019 without exclusion. The model demonstrated an AUC of 98.4%, sensitivity
of 98.8%, and specificity of 98.0% on the test dataset. On external
validation, the model demonstrated an AUC of 95.4%, sensitivity of 91.3%, and
specificity of 94.1%. Evaluating the ML model using a real world external
validation dataset that is temporally and geographically distinct from the
training dataset indicates that ML generalizability is achievable in medical
imaging applications.
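For readers unfamiliar with the three metrics reported above, the following is a minimal sketch of how sensitivity, specificity, and AUC can be computed from binary labels and model scores. It is not the authors' evaluation code. The function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration; the abstract does not state the operating threshold the study used, and none of the numbers below come from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at a given decision threshold.

    labels: 1 = hemorrhage present, 0 = absent (hypothetical encoding).
    scores: model outputs in [0, 1]; a score >= threshold counts as positive.
    """
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn)  # fraction of true positives detected
    specificity = tn / (tn + fp)  # fraction of true negatives cleared
    return sensitivity, specificity


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case, with ties counted as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented purely for illustration (NOT from the study).
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.2, 0.1, 0.05]

sens, spec = sensitivity_specificity(toy_labels, toy_scores, threshold=0.5)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} "
      f"AUC={auc(toy_labels, toy_scores):.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds. This is one reason studies often report all three on both an internal test set and an external validation set.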
Related papers
- Artificial Intelligence-Based Triaging of Cutaneous Melanocytic Lesions [0.8864540224289991]
Pathologists are facing an increasing workload due to a growing volume of cases and the need for more comprehensive diagnoses.
We developed an artificial intelligence (AI) model for triaging cutaneous melanocytic lesions based on whole slide images.
arXiv Detail & Related papers (2024-10-14T13:49:04Z)
- Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
Training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
Inference with LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z)
- Realism in Action: Anomaly-Aware Diagnosis of Brain Tumors from Medical Images Using YOLOv8 and DeiT [1.024113475677323]
This study addresses the issue by leveraging deep learning (DL) techniques to detect and classify brain tumors in challenging situations.
The curated data set from the National Brain Mapping Lab (NBML) comprises 81 patients, including 30 Tumor cases and 51 Normal cases.
This approach demonstrates promising strides in reliable tumor detection and classification, offering potential advancements in tumor diagnosis for real-world medical imaging scenarios.
arXiv Detail & Related papers (2024-01-06T20:53:02Z)
- Performance of externally validated machine learning models based on
histopathology images for the diagnosis, classification, prognosis, or
treatment outcome prediction in female breast cancer: A systematic review [0.5792122879054292]
externally validated machine learning models for diagnosis, classification, prognosis, or treatment outcome prediction in female breast cancer.
Three studies externally validated ML models for diagnosis, 4 for classification, 2 for prognosis, and 1 for both classification and prognosis.
Most studies used Convolutional Neural Networks and one used logistic regression algorithms.
arXiv Detail & Related papers (2023-12-09T18:27:56Z)
- Mixed-Integer Projections for Automated Data Correction of EMRs Improve
Predictions of Sepsis among Hospitalized Patients [7.639610349097473]
We introduce an innovative projections-based method that seamlessly integrates clinical expertise as domain constraints.
We measure the distance of corrected data from the constraints defining a healthy range of patient data, resulting in a unique predictive metric we term "trust-scores".
We show an AUROC of 0.865 and a precision of 0.922, which surpass those of conventional ML models without such projections.
arXiv Detail & Related papers (2023-08-21T15:14:49Z)
- Dissecting Self-Supervised Learning Methods for Surgical Computer Vision [51.370873913181605]
Self-Supervised Learning (SSL) methods have begun to gain traction in the general computer vision community.
The effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and unexplored.
We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding, phase recognition and tool presence detection.
arXiv Detail & Related papers (2022-07-01T14:17:11Z)
- Federated Learning Enables Big Data for Rare Cancer Boundary Detection [98.5549882883963]
We present findings from the largest Federated ML study to date, involving data from 71 healthcare institutions across 6 continents.
We generate an automatic tumor boundary detector for the rare disease of glioblastoma.
We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and 23% improvement over the tumor's entire extent.
arXiv Detail & Related papers (2022-04-22T17:27:00Z)
- Virtual vs. Reality: External Validation of COVID-19 Classifiers using
XCAT Phantoms for Chest Computed Tomography [2.924350993741562]
We created the CVIT-COVID dataset including 180 virtually imaged computed tomography (CT) images from simulated COVID-19 and normal phantom models.
We evaluated the performance of an open-source, deep-learning model from the University of Waterloo trained with multi-institutional data.
We validated the model's performance against open clinical data of 305 CT images to understand virtual vs. real clinical data performance.
arXiv Detail & Related papers (2022-03-07T00:11:53Z)
- Advancing COVID-19 Diagnosis with Privacy-Preserving Collaboration in
Artificial Intelligence [79.038671794961]
We launch the Unified CT-COVID AI Diagnostic Initiative (UCADI), where the AI model can be distributedly trained and independently executed at each host institution.
Our study is based on 9,573 chest computed tomography scans (CTs) from 3,336 patients collected from 23 hospitals located in China and the UK.
arXiv Detail & Related papers (2021-11-18T00:43:41Z)
- Deep learning-based COVID-19 pneumonia classification using chest CT
images: model generalizability [54.86482395312936]
Deep learning (DL) classification models were trained to identify COVID-19-positive patients on 3D computed tomography (CT) datasets from different countries.
We trained nine identical DL-based classification models by using combinations of the datasets with a 72% train, 8% validation, and 20% test data split.
The models trained on multiple datasets and evaluated on a test set from one of the datasets used for training performed better.
arXiv Detail & Related papers (2021-02-18T21:14:52Z)
- Hemogram Data as a Tool for Decision-making in COVID-19 Management:
Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of essential service breakdowns and the collapse of health care structures.
This work describes a machine learning model derived from hemogram exam data performed in symptomatic patients.
Proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.