Failure Detection in Medical Image Classification: A Reality Check and
Benchmarking Testbed
- URL: http://arxiv.org/abs/2205.14094v1
- Date: Fri, 27 May 2022 16:50:48 GMT
- Title: Failure Detection in Medical Image Classification: A Reality Check and
Benchmarking Testbed
- Authors: Melanie Bernhardt, Fabio De Sousa Ribeiro, Ben Glocker
- Abstract summary: Failure detection in automated image classification is a critical safeguard for clinical deployment.
Despite its paramount importance, there is insufficient evidence about the ability of state-of-the-art confidence scoring methods to detect test-time failures.
This paper provides a reality check, establishing the performance of in-domain misclassification detection methods.
- Score: 23.25084022554028
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Failure detection in automated image classification is a critical safeguard
for clinical deployment. Detected failure cases can be referred to human
assessment, ensuring patient safety in computer-aided clinical decision making.
Despite its paramount importance, there is insufficient evidence about the
ability of state-of-the-art confidence scoring methods to detect test-time
failures of classification models in the context of medical imaging. This paper
provides a reality check, establishing the performance of in-domain
misclassification detection methods, benchmarking 9 confidence scores on 6
medical imaging datasets with different imaging modalities, in multiclass and
binary classification settings. Our experiments show that the problem of
failure detection is far from being solved. We found that none of the
benchmarked advanced methods proposed in the computer vision and machine
learning literature can consistently outperform a simple softmax baseline. Our
developed testbed facilitates future work in this important area.
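As a rough illustration of what in-domain misclassification (failure) detection involves, the sketch below computes a few common confidence scores, including the simple maximum-softmax baseline mentioned in the abstract, and measures how well each separates correct from incorrect predictions via AUROC. This is a minimal sketch under assumed array shapes and score choices, not the authors' testbed or benchmarked method set.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

def failure_detection_auroc(logits, labels):
    """AUROC for separating misclassified from correctly classified samples.

    logits: (N, C) raw model outputs; labels: (N,) ground-truth classes.
    Returns one AUROC per confidence score (higher = better failure detection).
    """
    probs = softmax(logits)
    preds = probs.argmax(axis=1)
    errors = (preds != labels).astype(int)          # 1 = failure (the "positive" event)

    scores = {
        "max_softmax": probs.max(axis=1),           # simple softmax (MSP) baseline
        "neg_entropy": (probs * np.log(probs + 1e-12)).sum(axis=1),
        "max_logit": logits.max(axis=1),
    }
    # Higher confidence should mean lower failure probability, so negate each score.
    return {name: roc_auc_score(errors, -s) for name, s in scores.items()}

# Toy usage with random numbers only, to show the expected shapes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 5))
labels = rng.integers(0, 5, size=1000)
print(failure_detection_auroc(logits, labels))
```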
Related papers
- Comparative Benchmarking of Failure Detection Methods in Medical Image Segmentation: Unveiling the Role of Confidence Aggregation [0.723226140060364]
This paper introduces a comprehensive benchmarking framework aimed at evaluating failure detection methodologies within medical image segmentation.
We identify the strengths and limitations of current failure detection metrics, advocating for the risk-coverage analysis as a holistic evaluation approach.
arXiv Detail & Related papers (2024-06-05T14:36:33Z) - Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
- Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z) - Robustness Stress Testing in Medical Image Classification [26.094688963784254]
- Robustness Stress Testing in Medical Image Classification [26.094688963784254]
We employ stress testing to assess model robustness and subgroup performance disparities in disease detection models.
We apply stress tests to measure the robustness of disease detection models for chest X-ray and skin lesion images.
Our experiments indicate that some models may yield more robust and equitable performance than others.
arXiv Detail & Related papers (2023-08-14T02:02:56Z) - Uncertainty-inspired Open Set Learning for Retinal Anomaly
Identification [71.06194656633447]
We establish an uncertainty-inspired open-set (UIOS) model, which was trained with fundus images of 9 retinal conditions.
Our UIOS model with a thresholding strategy achieved F1 scores of 99.55%, 97.01% and 91.91% across the internal and external testing sets.
UIOS correctly assigned high uncertainty scores, prompting a manual check, on datasets of non-target-category retinal diseases, low-quality fundus images, and non-fundus images.
arXiv Detail & Related papers (2023-04-08T10:47:41Z) - Explainable Image Quality Assessment for Medical Imaging [0.0]
- Explainable Image Quality Assessment for Medical Imaging [0.0]
Poor-quality medical images may lead to misdiagnosis.
We propose an explainable image quality assessment system and validate our idea on two different objectives.
We apply a variety of techniques to measure the faithfulness of the saliency detectors.
We show that NormGrad has significant gains over other saliency detectors by reaching a repeated Pointing Game score of 0.853 for Object-CXR and 0.611 for LVOT datasets.
arXiv Detail & Related papers (2023-03-25T14:18:39Z) - Automated SSIM Regression for Detection and Quantification of Motion
- Automated SSIM Regression for Detection and Quantification of Motion Artefacts in Brain MR Images [54.739076152240024]
Motion artefacts in magnetic resonance brain images are a crucial issue.
The assessment of MR image quality is fundamental before proceeding with the clinical diagnosis.
An automated image quality assessment based on structural similarity index (SSIM) regression is proposed here.
arXiv Detail & Related papers (2022-06-14T10:16:54Z) - Improving Clinical Diagnosis Performance with Automated X-ray Scan
- Improving Clinical Diagnosis Performance with Automated X-ray Scan Quality Enhancement Algorithms [0.9137554315375919]
In clinical diagnosis, medical images may contain artifacts introduced by noise, blur and faulty equipment.
In this paper, automated image quality improvement approaches are adapted and benchmarked for the task of medical image super-resolution.
arXiv Detail & Related papers (2022-01-17T07:27:03Z) - Confidence-based Out-of-Distribution Detection: A Comparative Study and
Analysis [17.398553230843717]
We assess the capability of various state-of-the-art approaches for confidence-based OOD detection.
First, we leverage a computer vision benchmark to reproduce and compare multiple OOD detection methods.
We then evaluate their capabilities on the challenging task of disease classification using chest X-rays.
arXiv Detail & Related papers (2021-07-06T12:10:09Z) - Malignancy Prediction and Lesion Identification from Clinical
Dermatological Images [65.1629311281062]
We consider machine-learning-based malignancy prediction and lesion identification from clinical dermatological images.
The system first identifies all lesions present in the image regardless of sub-type or likelihood of malignancy, then estimates each lesion's likelihood of malignancy, and, through aggregation, also generates an image-level likelihood of malignancy.
arXiv Detail & Related papers (2021-04-02T20:52:05Z) - Multi-label Thoracic Disease Image Classification with Cross-Attention
- Multi-label Thoracic Disease Image Classification with Cross-Attention Networks [65.37531731899837]
We propose a novel scheme of Cross-Attention Networks (CAN) for automated thoracic disease classification from chest x-ray images.
We also design a new loss function that goes beyond cross-entropy to support the cross-attention process and overcome both the imbalance between classes and the dominance of easy samples within each class.
arXiv Detail & Related papers (2020-07-21T14:37:00Z) - Semi-supervised Medical Image Classification with Relation-driven
- Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.