Problems and shortcuts in deep learning for screening mammography
- URL: http://arxiv.org/abs/2303.16417v1
- Date: Wed, 29 Mar 2023 02:50:59 GMT
- Title: Problems and shortcuts in deep learning for screening mammography
- Authors: Trevor Tsue, Brent Mombourquette, Ahmed Taha, Thomas Paul Matthews,
Yen Nhi Truong Vu, Jason Su
- Abstract summary: This work reveals undiscovered challenges in the performance and generalizability of deep learning models.
We trained an AI model to classify cancer on a retrospective dataset of 120,112 US exams (3,467 cancers) acquired from 2008 to 2017.
We evaluated on a screening mammography test set of 11,593 US exams (102 cancers; 7,594 women; age 57.1 ± 11.0) and 1,880 UK exams (590 cancers; 1,745 women; age 63.3 ± 7.2).
- Score: 2.9033848132822726
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work reveals undiscovered challenges in the performance and
generalizability of deep learning models. We (1) identify spurious shortcuts
and evaluation issues that can inflate performance and (2) propose training and
analysis methods to address them.
We trained an AI model to classify cancer on a retrospective dataset of
120,112 US exams (3,467 cancers) acquired from 2008 to 2017 and 16,693 UK exams
(5,655 cancers) acquired from 2011 to 2015.
We evaluated on a screening mammography test set of 11,593 US exams (102
cancers; 7,594 women; age 57.1 ± 11.0) and 1,880 UK exams (590 cancers; 1,745
women; age 63.3 ± 7.2). A model trained on images of only view markers (no
breast) achieved a 0.691 AUC. The original model trained on both datasets
achieved a 0.945 AUC on the combined US+UK dataset but paradoxically only 0.838
and 0.892 on the US and UK datasets, respectively. Sampling cancers equally
from both datasets during training mitigated this shortcut. A similar AUC
paradox occurred when diagnostic and screening exams were evaluated together
(0.903 combined vs 0.862 and 0.861, respectively). Removing diagnostic exams during training
alleviated this bias. Finally, the model did not exhibit the AUC paradox over
scanner models but still exhibited a bias toward Selenia Dimension (SD) over
Hologic Selenia (HS) exams. Analysis showed that this AUC paradox occurred when
a dataset attribute had values with a higher cancer prevalence (dataset bias)
and the model consequently assigned a higher probability to these attribute
values (model bias). Stratification and balancing cancer prevalence can
mitigate shortcuts during evaluation.
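The AUC paradox described above can be reproduced on a toy example. The sketch below is illustrative only: the scores are hypothetical values chosen by hand, not data from the paper, and `auc` is a plain pairwise implementation written for this example. Group A stands in for a low-prevalence attribute value and group B for a high-prevalence one whose scores are shifted upward.

```python
def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical toy scores, not data from the paper.
# Group A: low cancer prevalence (1 of 5), model outputs are low overall.
a_pos, a_neg = [0.25], [0.1, 0.2, 0.3, 0.4]
# Group B: high cancer prevalence (4 of 5), model outputs are shifted upward.
b_pos, b_neg = [0.5, 0.7, 0.8, 0.9], [0.6]

auc_a = auc(a_pos, a_neg)
auc_b = auc(b_pos, b_neg)
# Pooling the groups adds cross-group pairs to the comparison.
auc_combined = auc(a_pos + b_pos, a_neg + b_neg)

print(f"group A:  {auc_a:.2f}")
print(f"group B:  {auc_b:.2f}")
print(f"combined: {auc_combined:.2f}")
```

With these scores the two groups reach 0.50 and 0.75 on their own, while the pooled AUC is 0.84. The gain comes from cross-group pairs: most of them compare a high-scoring cancer from the high-prevalence group with a low-scoring normal from the low-prevalence group, which the group offset alone ranks correctly.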
Dataset and model bias can introduce shortcuts and the AUC paradox,
potentially pervasive issues within the healthcare AI space. Our methods can
verify and mitigate shortcuts while providing a clear understanding of
performance.
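Stratification and prevalence balancing at evaluation time can be sketched as follows. This is one possible reading, not the authors' procedure, which the abstract does not specify: the sketch assumes inverse-frequency reweighting so that every group has the same cancer prevalence, and a weighted pairwise AUC. The scores and the helper names `weigh` and `weighted_auc` are hypothetical.

```python
def weighted_auc(pos, neg):
    """AUC over (score, weight) items; each positive/negative pair counts w_pos * w_neg."""
    num = 0.0
    den = 0.0
    for sp, wp in pos:
        for sn, wn in neg:
            den += wp * wn
            if sp > sn:
                num += wp * wn
            elif sp == sn:
                num += 0.5 * wp * wn
    return num / den


def weigh(scores, weight):
    """Attach the same weight to every score in a list."""
    return [(s, weight) for s in scores]


# Hypothetical toy scores per group: (cancer scores, normal scores).
groups = {
    "A": ([0.25], [0.1, 0.2, 0.3, 0.4]),      # low prevalence, low scores
    "B": ([0.5, 0.7, 0.8, 0.9], [0.6]),       # high prevalence, high scores
}

stratified = {}
pooled_pos, pooled_neg = [], []
balanced_pos, balanced_neg = [], []
for name, (pos_s, neg_s) in groups.items():
    # Stratified: evaluate each group on its own.
    stratified[name] = weighted_auc(weigh(pos_s, 1.0), weigh(neg_s, 1.0))
    # Pooled: every exam counts once.
    pooled_pos += weigh(pos_s, 1.0)
    pooled_neg += weigh(neg_s, 1.0)
    # Balanced: positives and negatives each sum to 1 per group (equal prevalence).
    balanced_pos += weigh(pos_s, 1.0 / len(pos_s))
    balanced_neg += weigh(neg_s, 1.0 / len(neg_s))

pooled = weighted_auc(pooled_pos, pooled_neg)
balanced = weighted_auc(balanced_pos, balanced_neg)

print("stratified:", stratified)
print(f"pooled:   {pooled:.4f}")
print(f"balanced: {balanced:.4f}")
```

On this toy data the pooled AUC (0.84) exceeds both stratified values (0.50 and 0.75), while the prevalence-balanced AUC (0.5625) falls between them, since cross-group pairs that the offset ranks correctly are now matched by equally weighted pairs that it ranks incorrectly.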
Related papers
- Incorporating Anatomical Awareness for Enhanced Generalizability and Progression Prediction in Deep Learning-Based Radiographic Sacroiliitis Detection [0.8248058061511542]
The aim of this study was to examine whether incorporating anatomical awareness into a deep learning model can improve generalizability and enable prediction of disease progression.
The performance of the models was compared using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity.
arXiv Detail & Related papers (2024-05-12T20:02:25Z)
- AI in Lung Health: Benchmarking Detection and Diagnostic Models Across Multiple CT Scan Datasets [0.33923727961771083]
Lung cancer's high mortality rate can be mitigated by early detection, increasingly reliant on AI for diagnostic imaging.
This study develops and validates AI models for both nodule detection and cancer classification tasks.
arXiv Detail & Related papers (2024-05-07T18:36:40Z)
- Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z)
- Detection of subclinical atherosclerosis by image-based deep learning on chest x-ray [86.38767955626179]
A deep-learning algorithm to predict coronary artery calcium (CAC) score was developed on 460 chest x-rays.
The diagnostic accuracy of the AICAC model assessed by the area under the curve (AUC) was the primary outcome.
arXiv Detail & Related papers (2024-03-27T16:56:14Z)
- Performance of externally validated machine learning models based on histopathology images for the diagnosis, classification, prognosis, or treatment outcome prediction in female breast cancer: A systematic review [0.5792122879054292]
externally validated machine learning models for diagnosis, classification, prognosis, or treatment outcome prediction in female breast cancer.
Three studies externally validated ML models for diagnosis, 4 for classification, 2 for prognosis, and 1 for both classification and prognosis.
Most studies used Convolutional Neural Networks and one used logistic regression algorithms.
arXiv Detail & Related papers (2023-12-09T18:27:56Z)
- Revisiting Computer-Aided Tuberculosis Diagnosis [56.80999479735375]
Tuberculosis (TB) is a major global health threat, causing millions of deaths annually.
Computer-aided tuberculosis diagnosis (CTD) using deep learning has shown promise, but progress is hindered by limited training data.
We establish a large-scale dataset, namely the Tuberculosis X-ray (TBX11K) dataset, which contains 11,200 chest X-ray (CXR) images with corresponding bounding box annotations for TB areas.
This dataset enables the training of sophisticated detectors for high-quality CTD.
arXiv Detail & Related papers (2023-07-06T08:27:48Z)
- Building Brains: Subvolume Recombination for Data Augmentation in Large Vessel Occlusion Detection [56.67577446132946]
A large training data set is required for a standard deep learning-based model to learn this strategy from data.
We propose an augmentation method that generates artificial training samples by recombining vessel tree segmentations of the hemispheres from different patients.
In line with the augmentation scheme, we use a 3D-DenseNet fed with task-specific input, fostering a side-by-side comparison between the hemispheres.
arXiv Detail & Related papers (2022-05-05T10:31:57Z)
- Semi-supervised learning for generalizable intracranial hemorrhage detection and segmentation [0.0]
We develop and evaluate a semi-supervised learning model for intracranial hemorrhage detection and segmentation on an out-of-distribution head CT evaluation set.
An initial "teacher" deep learning model was trained on 457 pixel-labeled head CT scans collected from one US institution from 2010-2017.
A second "student" model was trained on this combined pixel-labeled and pseudo-labeled dataset.
arXiv Detail & Related papers (2021-05-03T00:14:43Z)
- Deep learning-based COVID-19 pneumonia classification using chest CT images: model generalizability [54.86482395312936]
Deep learning (DL) classification models were trained to identify COVID-19-positive patients on 3D computed tomography (CT) datasets from different countries.
We trained nine identical DL-based classification models by using combinations of the datasets with a 72% train, 8% validation, and 20% test data split.
The models trained on multiple datasets and evaluated on a test set from one of the datasets used for training performed better.
arXiv Detail & Related papers (2021-02-18T21:14:52Z)
- Deep Learning Applied to Chest X-Rays: Exploiting and Preventing Shortcuts [11.511323714777298]
This paper studies the case of spurious class skew in which patients with a particular attribute are spuriously more likely to have the outcome of interest.
We show that deep nets can accurately identify many patient attributes including sex (AUROC = 0.96) and age (AUROC >= 0.90) when learning to predict a diagnosis.
A simple transfer learning approach is surprisingly effective at preventing the shortcut and promoting good performance.
arXiv Detail & Related papers (2020-09-21T18:52:43Z)
- Automated Quantification of CT Patterns Associated with COVID-19 from Chest CT [48.785596536318884]
The proposed method takes as input a non-contrasted chest CT and segments the lesions, lungs, and lobes in three dimensions.
The method outputs two combined measures of the severity of lung and lobe involvement, quantifying both the extent of COVID-19 abnormalities and presence of high opacities.
Evaluation of the algorithm is reported on CTs of 200 participants (100 COVID-19 confirmed patients and 100 healthy controls) from institutions from Canada, Europe and the United States.
arXiv Detail & Related papers (2020-04-02T21:49:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.