Related papers: Toward Clinically Trustworthy Deep Learning: Applying Conformal Prediction to Intracranial Hemorrhage Detection

URL: http://arxiv.org/abs/2401.08058v1
Date: Tue, 16 Jan 2024 02:26:29 GMT
Title: Toward Clinically Trustworthy Deep Learning: Applying Conformal Prediction to Intracranial Hemorrhage Detection
Authors: Cooper Gamble, Shahriar Faghani, Bradley J. Erickson
Abstract summary: This is a retrospective study of 491 non-contrast head CTs from the CQ500 dataset, in which three senior radiologists annotated slices containing intracranial hemorrhage (ICH). A DL model was trained on 146 patients (10,815 slices) from the definite data (training dataset) to perform ICH localization and classification for five classes of ICH. The uncertainty-aware DL model was tested on 8,401 definite and challenging cases to assess its ability to identify challenging cases.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As deep learning (DL) continues to demonstrate its ability in radiological tasks, it is critical that we optimize clinical DL solutions to include safety. One of the principal concerns in the clinical adoption of DL tools is trust. This study aims to apply conformal prediction as a step toward trustworthiness for DL in radiology. This is a retrospective study of 491 non-contrast head CTs from the CQ500 dataset, in which three senior radiologists annotated slices containing intracranial hemorrhage (ICH). The dataset was split into definite and challenging subsets, where challenging images were defined as those in which there was disagreement among readers. A DL model was trained on 146 patients (10,815 slices) from the definite data (training dataset) to perform ICH localization and classification for five classes of ICH. To develop an uncertainty-aware DL model, 1,546 cases of the definite data (calibration dataset) were used for Mondrian conformal prediction (MCP). The uncertainty-aware DL model was tested on 8,401 definite and challenging cases to assess its ability to identify challenging cases. After the MCP procedure, the model achieved an F1 score of 0.920 for ICH classification on the test dataset. Additionally, it correctly identified 6,837 of the 6,856 total challenging cases as challenging (99.7% accuracy). It did not incorrectly label any definite cases as challenging. The uncertainty-aware ICH detector performs on par with state-of-the-art models. MCP's performance in detecting challenging cases demonstrates that it is useful in automated ICH detection and promising for trustworthiness in radiological DL.
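The abstract names Mondrian conformal prediction (MCP) but does not spell out the procedure. The following is a minimal sketch of the general class-conditional technique, not the authors' implementation. The nonconformity score (one minus the predicted probability of the class), the significance level `epsilon`, the toy calibration data, and the rule that any prediction set without exactly one label counts as uncertain are all assumptions made for illustration.

```python
from collections import defaultdict


def calibrate(probs, labels):
    """Group nonconformity scores (1 - probability of the true class) by true class."""
    scores = defaultdict(list)
    for p, y in zip(probs, labels):
        scores[y].append(1.0 - p[y])
    return dict(scores)


def p_value(cal_scores, score):
    """Conformal p-value of a test score against one class's calibration scores."""
    at_least = sum(1 for s in cal_scores if s >= score)
    return (at_least + 1) / (len(cal_scores) + 1)


def prediction_set(cal, p, epsilon):
    """Labels whose class-conditional p-value exceeds the significance level."""
    return {c for c, s in cal.items() if p_value(s, 1.0 - p[c]) > epsilon}


def is_uncertain(pred_set):
    """Assumed flagging rule: anything other than exactly one label is uncertain."""
    return len(pred_set) != 1


# Toy calibration set (hypothetical softmax outputs for a two-class problem).
cal_probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.95, 0.05],
             [0.2, 0.8], [0.1, 0.9], [0.3, 0.7], [0.15, 0.85]]
cal_labels = [0, 0, 0, 0, 1, 1, 1, 1]
cal = calibrate(cal_probs, cal_labels)

confident = prediction_set(cal, [0.97, 0.03], epsilon=0.25)
ambiguous = prediction_set(cal, [0.5, 0.5], epsilon=0.25)
print(confident, is_uncertain(confident))
print(ambiguous, is_uncertain(ambiguous))
```

Calibrating each class separately is what makes the procedure "Mondrian": the p-value for a candidate label is computed only against calibration cases that truly carry that label, so the error guarantee holds per class rather than only on average.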

Related papers

Latent Space Class Dispersion: Effective Test Data Quality Assessment for DNNs [45.129846925131055]
Latent Space Class Dispersion (LSCD) is a novel metric to quantify the quality of test datasets for Deep Neural Networks (DNNs). Our empirical study shows that LSCD reveals and quantifies deficiencies in the test dataset of three popular benchmarks pertaining to image classification tasks.
arXiv Detail & Related papers (2025-03-24T15:45:50Z)
An Intrinsically Explainable Approach to Detecting Vertebral Compression Fractures in CT Scans via Neurosymbolic Modeling [9.108675519106319]
Vertebral compression fractures (VCFs) are a common and potentially serious consequence of osteoporosis. In high-stakes scenarios like opportunistic medical diagnosis, model interpretability is a key factor for the adoption of AI recommendations. We introduce a neurosymbolic approach for VCF detection in CT volumes.
arXiv Detail & Related papers (2024-12-23T04:01:44Z)
Fairness Evolution in Continual Learning for Medical Imaging [47.52603262576663]
We study the behavior of Continual Learning (CL) strategies in medical imaging regarding classification performance. We evaluate the Replay, Learning without Forgetting (LwF), LwF, and Pseudo-Label strategies. LwF and Pseudo-Label exhibit optimal classification performance, but when including fairness metrics in the evaluation, it is clear that Pseudo-Label is less biased.
arXiv Detail & Related papers (2024-04-10T09:48:52Z)
Deep Rib Fracture Instance Segmentation and Classification from CT on the RibFrac Challenge [66.86170104167608]
The RibFrac Challenge provides a benchmark dataset of over 5,000 rib fractures from 660 CT scans. During the MICCAI 2020 challenge period, 243 results were evaluated, and seven teams were invited to participate in the challenge summary. The analysis revealed that several top rib fracture detection solutions achieved performance comparable to or even better than that of human experts.
arXiv Detail & Related papers (2024-02-14T18:18:33Z)
Revisiting Computer-Aided Tuberculosis Diagnosis [56.80999479735375]
Tuberculosis (TB) is a major global health threat, causing millions of deaths annually. Computer-aided tuberculosis diagnosis (CTD) using deep learning has shown promise, but progress is hindered by limited training data. We establish a large-scale dataset, namely the Tuberculosis X-ray (TBX11K) dataset, which contains 11,200 chest X-ray (CXR) images with corresponding bounding box annotations for TB areas. This dataset enables the training of sophisticated detectors for high-quality CTD.
arXiv Detail & Related papers (2023-07-06T08:27:48Z)
Towards Reliable Medical Image Segmentation by utilizing Evidential Calibrated Uncertainty [52.03490691733464]
We introduce DEviS, an easily implementable foundational model that seamlessly integrates into various medical image segmentation networks. By leveraging subjective logic theory, we explicitly model probability and uncertainty for the problem of medical image segmentation. DEviS incorporates an uncertainty-aware filtering module, which utilizes the metric of uncertainty-calibrated error to filter reliable data.
arXiv Detail & Related papers (2023-01-01T05:02:46Z)
StRegA: Unsupervised Anomaly Detection in Brain MRIs using a Compact Context-encoding Variational Autoencoder [48.2010192865749]
Unsupervised anomaly detection (UAD) can learn a data distribution from an unlabelled dataset of healthy subjects and then be applied to detect out-of-distribution samples. This research proposes a compact version of the "context-encoding" VAE (ceVAE) model, combined with pre- and post-processing steps, creating a UAD pipeline (StRegA). The proposed pipeline achieved a Dice score of 0.642$\pm$0.101 while detecting tumours in T2w images of the BraTS dataset and 0.859$\pm$0.112 while detecting artificially induced anomalies.
arXiv Detail & Related papers (2022-01-31T14:27:35Z)
Deep learning-based detection of intravenous contrast in computed tomography scans [0.7313653675718069]
Identifying intravenous (IV) contrast use within CT scans is a key component of data curation for model development and testing. We developed and validated a CNN-based deep learning platform to identify IV contrast within CT scans.
arXiv Detail & Related papers (2021-10-16T00:46:45Z)
Learning to Automatically Diagnose Multiple Diseases in Pediatric Chest Radiographs Using Deep Convolutional Neural Networks [0.4697611383288171]
Deep convolutional neural networks (D-CNNs) have shown remarkable performance in interpreting chest radiograph (CXR) scans in adults. In this paper, we retrospectively collect a large dataset of 5,017 pediatric CXR scans, each manually labeled by an experienced radiologist. A D-CNN model is then trained on 3,550 annotated scans to classify multiple pediatric lung pathologies automatically.
arXiv Detail & Related papers (2021-08-14T08:14:52Z)
Learning from Subjective Ratings Using Auto-Decoded Deep Latent Embeddings [23.777855250882244]
Managing subjectivity in labels is a fundamental problem in medical imaging analysis. We introduce auto-decoded deep latent embeddings (ADDLE). ADDLE explicitly models the tendencies of each rater using an auto-decoder framework.
arXiv Detail & Related papers (2021-04-12T15:40:42Z)
Critical Evaluation of Deep Neural Networks for Wrist Fracture Detection [1.0617212070722408]
Wrist Fracture is the most common type of fracture with a high incidence rate. Recent advances in the field of Deep Learning (DL) have shown that wrist fracture detection can be automated using Convolutional Neural Networks. Our results reveal that a typical state-of-the-art approach, such as DeepWrist, has a substantially lower performance on the challenging test set.
arXiv Detail & Related papers (2020-12-04T13:35:36Z)
Deep Sequential Learning for Cervical Spine Fracture Detection on Computed Tomography Imaging [20.051649556262216]
We propose a deep convolutional neural network (DCNN) with a bidirectional long short-term memory (BLSTM) layer for the automated detection of cervical spine fractures in CT axial images. We used an annotated dataset of 3,666 CT scans (729 positive and 2,937 negative cases) to train and validate the model. The validation results show a classification accuracy of 70.92% and 79.18% on the balanced (104 positive and 104 negative cases) and imbalanced (104 positive and 419 negative cases) test datasets, respectively.
arXiv Detail & Related papers (2020-10-26T04:36:29Z)
UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model. UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data. We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD). UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, and it outperforms various state-of-the-art baselines, improving on the best baseline by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
Deep Mining External Imperfect Data for Chest X-ray Disease Screening [57.40329813850719]
We argue that incorporating an external CXR dataset leads to imperfect training data, which raises challenges. We formulate the multi-label disease classification problem as weighted independent binary tasks according to the categories. Our framework simultaneously models and tackles the domain and label discrepancies, enabling superior knowledge mining ability.
arXiv Detail & Related papers (2020-06-06T06:48:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.