A Comparative Study of Confidence Calibration in Deep Learning: From
Computer Vision to Medical Imaging
- URL: http://arxiv.org/abs/2206.08833v1
- Date: Fri, 17 Jun 2022 15:27:24 GMT
- Title: A Comparative Study of Confidence Calibration in Deep Learning: From
Computer Vision to Medical Imaging
- Authors: Riqiang Gao, Thomas Li, Yucheng Tang, Zhoubing Xu, Michael Kammer,
Sanja L. Antic, Kim Sandler, Fabien Maldonado, Thomas A. Lasko, Bennett
Landman
- Abstract summary: Deep learning prediction models can often suffer from poor calibration across challenging domains including healthcare.
We bridge confidence calibration from computer vision to medical imaging with a comparative study of four high-impact calibration models.
- Score: 3.7292013802839152
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although deep learning prediction models have been successful in discriminating between classes, they often suffer from poor calibration in challenging domains, including healthcare. Moreover, long-tailed distributions pose great challenges for deep learning classification problems, including clinical disease prediction. Several approaches have recently been proposed to calibrate deep predictions in computer vision, but no studies demonstrate how these representative models behave across different challenging contexts. In this paper, we bridge confidence calibration from computer vision to medical imaging with a comparative study of four high-impact calibration models. Our studies are conducted in different contexts (natural image classification and lung cancer risk estimation), covering balanced vs. imbalanced training sets and computer vision vs. medical imaging. Our results support three key findings: (1) We reach new conclusions that have not been studied across learning contexts, e.g., combining two calibration models that both mitigate overconfident prediction can lead to under-confident prediction, and simpler calibration models from the computer vision domain tend to generalize better to medical imaging. (2) We highlight the gap between general computer vision tasks and medical imaging prediction, e.g., calibration methods ideal for general computer vision tasks may in fact damage the calibration of medical imaging predictions. (3) We also reinforce previous conclusions in natural image classification settings. We believe this study can guide readers in choosing calibration models and in understanding the gaps between the general computer vision and medical imaging domains.
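The abstract does not name the four calibration models here, so the block below is only a minimal sketch of the kind of tooling such a comparison relies on: post-hoc temperature scaling and expected calibration error (ECE) in PyTorch. Both are common choices in this literature but are assumptions on my part, not the paper's confirmed setup, and the function names and usage are hypothetical.

```python
import torch
import torch.nn.functional as F


def expected_calibration_error(logits, labels, n_bins=15):
    """ECE: bin predictions by confidence and average the |accuracy - confidence|
    gap, weighted by the fraction of samples falling in each bin."""
    probs = F.softmax(logits, dim=1)
    confidences, predictions = probs.max(dim=1)
    accuracies = predictions.eq(labels).float()
    bin_edges = torch.linspace(0.0, 1.0, n_bins + 1)
    ece = torch.zeros(1)
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = (accuracies[in_bin].mean() - confidences[in_bin].mean()).abs()
            ece += gap * in_bin.float().mean()
    return ece.item()


def fit_temperature(val_logits, val_labels, max_iter=50):
    """Post-hoc temperature scaling (Guo et al., 2017): learn one scalar T on a
    held-out set by minimizing the NLL of logits / T. Logits should be detached."""
    log_t = torch.zeros(1, requires_grad=True)  # optimize log T so T stays positive
    optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=max_iter)

    def closure():
        optimizer.zero_grad()
        loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return log_t.exp().item()


# Hypothetical usage with held-out validation logits/labels:
# T = fit_temperature(val_logits, val_labels)
# print(expected_calibration_error(val_logits, val_labels))      # before scaling
# print(expected_calibration_error(val_logits / T, val_labels))  # after scaling
```

A fitted T > 1 softens overconfident predictions, which connects to the abstract's point that combining two methods that each soften confidence can overshoot into under-confidence.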
Related papers
- PRECISe : Prototype-Reservation for Explainable Classification under Imbalanced and Scarce-Data Settings [0.0]
PRECISe is an explainable-by-design model meticulously constructed to address all three challenges.
PRECISe outperforms current state-of-the-art methods on data-efficient generalization to minority classes.
A case study is presented to highlight the model's ability to produce easily interpretable predictions.
arXiv Detail & Related papers (2024-08-11T12:05:32Z) - Robust and Interpretable Medical Image Classifiers via Concept
Bottleneck Models [49.95603725998561]
We propose a new paradigm to build robust and interpretable medical image classifiers with natural language concepts.
Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model.
arXiv Detail & Related papers (2023-10-04T21:57:09Z) - Understanding Calibration of Deep Neural Networks for Medical Image
Classification [3.461503547789351]
This study explores model performance and calibration under different training regimes.
We consider fully supervised training, as well as a rotation-based self-supervised method with and without transfer learning.
Our study reveals that factors such as weight distributions and the similarity of learned representations correlate with the calibration trends observed in the models.
arXiv Detail & Related papers (2023-09-22T18:36:07Z) - Calibration of Neural Networks [77.34726150561087]
This paper presents a survey of confidence calibration problems in the context of neural networks.
We analyze the problem statement, calibration definitions, and different approaches to evaluation.
Empirical experiments cover various datasets and models, comparing calibration methods according to different criteria.
arXiv Detail & Related papers (2023-03-19T20:27:51Z) - A Trustworthy Framework for Medical Image Analysis with Deep Learning [71.48204494889505]
TRUDLMIA is a trustworthy deep learning framework for medical image analysis.
It is anticipated that the framework will support researchers and clinicians in advancing the use of deep learning for dealing with public health crises including COVID-19.
arXiv Detail & Related papers (2022-12-06T05:30:22Z) - Learning Discriminative Representation via Metric Learning for
Imbalanced Medical Image Classification [52.94051907952536]
We propose embedding metric learning into the first stage of the two-stage framework specifically to help the feature extractor learn to extract more discriminative feature representations.
Experiments mainly on three medical image datasets show that the proposed approach consistently outperforms existing one-stage and two-stage approaches.
arXiv Detail & Related papers (2022-07-14T14:57:01Z) - Interpretable Mammographic Image Classification using Case-Based
Reasoning and Deep Learning [20.665935997959025]
We present a novel interpretable neural network algorithm that uses case-based reasoning for mammography.
Our network presents both a prediction of malignancy and an explanation of that prediction using known medical features.
arXiv Detail & Related papers (2021-07-12T17:42:09Z) - On the Robustness of Pretraining and Self-Supervision for a Deep
Learning-based Analysis of Diabetic Retinopathy [70.71457102672545]
We compare the impact of different training procedures for diabetic retinopathy grading.
We investigate different aspects such as quantitative performance, statistics of the learned feature representations, interpretability and robustness to image distortions.
Our results indicate that models initialized with ImageNet pretraining show a significant increase in performance, generalization, and robustness to image distortions.
arXiv Detail & Related papers (2021-06-25T08:32:45Z) - Malignancy Prediction and Lesion Identification from Clinical
Dermatological Images [65.1629311281062]
We consider machine-learning-based malignancy prediction and lesion identification from clinical dermatological images.
The method first identifies all lesions present in the image regardless of sub-type or likelihood of malignancy, then estimates their likelihood of malignancy, and, through aggregation, generates an image-level likelihood of malignancy.
arXiv Detail & Related papers (2021-04-02T20:52:05Z) - Advancing diagnostic performance and clinical usability of neural
networks via adversarial training and dual batch normalization [2.1699022621790736]
We let six radiologists rate the interpretability of saliency maps in datasets of X-rays, computed tomography, and magnetic resonance imaging scans.
We found that the accuracy of adversarially trained models was equal to that of standard models when sufficiently large datasets and dual batch norm training were used.
arXiv Detail & Related papers (2020-11-25T20:41:01Z)