Understanding Calibration of Deep Neural Networks for Medical Image
Classification
- URL: http://arxiv.org/abs/2309.13132v2
- Date: Sat, 2 Dec 2023 09:07:08 GMT
- Title: Understanding Calibration of Deep Neural Networks for Medical Image
Classification
- Authors: Abhishek Singh Sambyal, Usma Niyaz, Narayanan C. Krishnan, Deepti R.
Bathula
- Abstract summary: This study explores model performance and calibration under different training regimes.
We consider fully supervised training, as well as a rotation-based self-supervised method with and without transfer learning.
Our study reveals that factors such as weight distributions and the similarity of learned representations correlate with the calibration trends observed in the models.
- Score: 3.461503547789351
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the field of medical image analysis, achieving high accuracy is not
enough; ensuring well-calibrated predictions is also crucial. Confidence scores
of a deep neural network play a pivotal role in explainability by providing
insights into the model's certainty, identifying cases that require attention,
and establishing trust in its predictions. Consequently, the significance of a
well-calibrated model becomes paramount in the medical imaging domain, where
accurate and reliable predictions are of utmost importance. While there has
been a significant effort towards training modern deep neural networks to
achieve high accuracy on medical imaging tasks, model calibration and factors
that affect it remain under-explored. To address this, we conducted a
comprehensive empirical study that explores model performance and calibration
under different training regimes. We considered fully supervised training,
which is the prevailing approach in the community, as well as a rotation-based
self-supervised method with and without transfer learning, across various
datasets and architecture sizes. Multiple calibration metrics were employed to
gain a holistic understanding of model calibration. Our study reveals that
factors such as weight distributions and the similarity of learned
representations correlate with the calibration trends observed in the models.
Notably, models trained using the rotation-based self-supervised pretraining regime
exhibit significantly better calibration while achieving comparable or even
superior performance compared to fully supervised models across different
medical imaging datasets. These findings shed light on the importance of model
calibration in medical image analysis and highlight the benefits of
incorporating a self-supervised learning approach to improve both performance and
calibration.
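The abstract refers to "multiple calibration metrics" without naming them; the most widely used is the Expected Calibration Error (ECE), which bins predictions by confidence and measures the gap between confidence and accuracy in each bin. As a minimal sketch (not the paper's implementation; the bin count and binning scheme are assumptions):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected Calibration Error: the weighted average of |accuracy - confidence|
    over equal-width confidence bins. A perfectly calibrated model scores 0."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # assign each prediction to one bin by its confidence score
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            bin_acc = correct[mask].mean()       # empirical accuracy in the bin
            bin_conf = confidences[mask].mean()  # mean confidence in the bin
            ece += (mask.sum() / n) * abs(bin_acc - bin_conf)
    return ece
```

For example, a model that always predicts with confidence 1.0 but is right only half the time has an ECE of 0.5.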
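The rotation-based self-supervised regime trains a network to predict which of four rotations (0°, 90°, 180°, 270°) was applied to an unlabeled image, so the backbone learns features without annotations. A minimal sketch of the pretext-task data generation (hypothetical helper, not the paper's code):

```python
import numpy as np

def rotation_pretext_batch(images):
    """Given a batch of images with shape (N, H, W, C), return all four
    rotated copies of each image plus rotation-class labels 0..3
    (k means a rotation of k * 90 degrees). A classifier trained on
    these labels serves as the self-supervised pretraining objective."""
    rotated, labels = [], []
    for img in images:
        for k in range(4):
            rotated.append(np.rot90(img, k))  # rotates the (H, W) axes
            labels.append(k)
    return np.stack(rotated), np.array(labels)
```

The pretrained backbone is then fine-tuned (with or without transfer learning, as in the study) on the downstream medical classification task.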
Related papers
- On the Calibration of Large Language Models and Alignment [63.605099174744865]
Confidence calibration serves as a crucial tool for gauging the reliability of deep models.
We conduct a systematic examination of the calibration of aligned language models throughout the entire construction process.
Our work sheds light on whether popular LLMs are well-calibrated and how the training process influences model calibration.
arXiv Detail & Related papers (2023-11-22T08:57:55Z) - Improving Robustness and Reliability in Medical Image Classification with Latent-Guided Diffusion and Nested-Ensembles [4.249986624493547]
Ensemble deep learning has been shown to achieve high predictive accuracy and uncertainty estimation.
However, perturbations in the input images at test time can still lead to significant performance degradation.
LaDiNE is a novel and robust probabilistic method that is capable of inferring informative and invariant latent variables from the input images.
arXiv Detail & Related papers (2023-10-24T15:53:07Z) - Calibration of Neural Networks [77.34726150561087]
This paper presents a survey of confidence calibration problems in the context of neural networks.
We analyze problem statement, calibration definitions, and different approaches to evaluation.
Empirical experiments cover various datasets and models, comparing calibration methods according to different criteria.
arXiv Detail & Related papers (2023-03-19T20:27:51Z) - NCTV: Neural Clamping Toolkit and Visualization for Neural Network
Calibration [66.22668336495175]
Neglecting neural network calibration undermines human trust in model predictions.
We introduce the Neural Clamping Toolkit, the first open-source framework designed to help developers employ state-of-the-art model-agnostic calibrated models.
arXiv Detail & Related papers (2022-11-29T15:03:05Z) - A Comparative Study of Confidence Calibration in Deep Learning: From
Computer Vision to Medical Imaging [3.7292013802839152]
Deep learning prediction models can often suffer from poor calibration across challenging domains including healthcare.
We bridge the confidence calibration from computer vision to medical imaging with a comparative study of four high-impact calibration models.
arXiv Detail & Related papers (2022-06-17T15:27:24Z) - Performance or Trust? Why Not Both. Deep AUC Maximization with
Self-Supervised Learning for COVID-19 Chest X-ray Classifications [72.52228843498193]
In training deep learning models, a compromise often must be made between performance and trust.
In this work, we integrate a new surrogate loss with self-supervised learning for computer-aided screening of COVID-19 patients.
arXiv Detail & Related papers (2021-12-14T21:16:52Z) - On the Robustness of Pretraining and Self-Supervision for a Deep
Learning-based Analysis of Diabetic Retinopathy [70.71457102672545]
We compare the impact of different training procedures for diabetic retinopathy grading.
We investigate different aspects such as quantitative performance, statistics of the learned feature representations, interpretability and robustness to image distortions.
Our results indicate that models from ImageNet pretraining report a significant increase in performance, generalization and robustness to image distortions.
arXiv Detail & Related papers (2021-06-25T08:32:45Z) - Advancing diagnostic performance and clinical usability of neural
networks via adversarial training and dual batch normalization [2.1699022621790736]
We let six radiologists rate the interpretability of saliency maps in datasets of X-rays, computed tomography, and magnetic resonance imaging scans.
We found that the accuracy of adversarially trained models was equal to standard models when sufficiently large datasets and dual batch norm training were used.
arXiv Detail & Related papers (2020-11-25T20:41:01Z) - Self-Training with Improved Regularization for Sample-Efficient Chest
X-Ray Classification [80.00316465793702]
We present a deep learning framework that enables robust modeling in challenging scenarios.
Our results show that using 85% lesser labeled data, we can build predictive models that match the performance of classifiers trained in a large-scale data setting.
arXiv Detail & Related papers (2020-05-03T02:36:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.