Simple and Efficient Confidence Score for Grading Whole Slide Images
- URL: http://arxiv.org/abs/2303.04604v1
- Date: Wed, 8 Mar 2023 14:15:43 GMT
- Title: Simple and Efficient Confidence Score for Grading Whole Slide Images
- Authors: Mélanie Lubrano, Yaëlle Bellahsen-Harrar, Rutger Fick, Cécile Badoual, Thomas Walter
- Abstract summary: We propose a new score to measure the confidence of AI models in grading tasks.
Our confidence score is specifically adapted to ordinal output variables, is versatile and does not require extra training or additional inferences.
We show that the score is capable of accurately identifying mispredicted slides and that accuracy for high-confidence decisions is significantly higher than for low-confidence decisions.
- Score: 0.7349727826230862
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Grading precancerous lesions on whole slide images is a challenging task: the continuous space of morphological phenotypes often makes clear-cut decisions between grades difficult, leading to low inter- and intra-rater agreement. A growing number of Artificial Intelligence (AI) algorithms are being developed to help pathologists perform and standardize their diagnoses. However, these models can render predictions without accounting for the ambiguity between classes and can fail without notice, which prevents their wider acceptance in a clinical context. In this paper, we propose a new score to measure the confidence of AI models in grading tasks. Our confidence score is specifically adapted to ordinal output variables, is versatile, and does not require extra training, additional inferences, or architecture changes. Comparison with other popular techniques such as Monte Carlo Dropout and deep ensembles shows that our method provides state-of-the-art results while being simpler, more versatile, and less computationally intensive. The score is also easily interpretable and consistent with the real-life hesitations of pathologists. We show that the score is capable of accurately identifying mispredicted slides and that accuracy for high-confidence decisions is significantly higher than for low-confidence decisions (gap in AUC of 17.1% on the test set). We believe that the proposed confidence score could be leveraged by pathologists directly in their workflow to assist them with difficult tasks such as grading precancerous lesions.
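The abstract names the score's properties (adapted to ordinal outputs, no extra training, no additional inference) but not its formula. The Python sketch below is therefore a hypothetical stand-in, not the authors' method. It assumes the model emits one probability vector per slide over ordered grades and defines confidence as 1 minus the expected grade distance from the predicted grade, divided by K - 1. It then mimics the abstract's evaluation in simplified form by comparing accuracy above and below a confidence threshold. The paper reports an AUC gap, while this sketch uses plain accuracy. The function names `ordinal_confidence` and `accuracy_by_confidence`, the threshold of 0.8, and the toy probability vectors are all illustrative assumptions.

```python
"""Illustrative ordinal-aware confidence score.

ASSUMPTION: this is a stand-in, not the authors' published formula. A grading
model is assumed to output one probability vector per slide over K ordered
grades. Confidence is 1 minus the expected distance (in grades) between the
predicted grade and a grade drawn from that vector, divided by K - 1 so the
score stays in [0, 1].
"""
from typing import Sequence, Tuple


def ordinal_confidence(probs: Sequence[float]) -> Tuple[int, float]:
    """Return (predicted grade, confidence in [0, 1]) for one slide."""
    k = len(probs)
    if k < 2:
        raise ValueError("need at least two ordered grades")
    total = sum(probs)
    p = [x / total for x in probs]  # renormalize in case of rounding drift
    pred = max(range(k), key=lambda i: p[i])  # argmax; ties go to the lower grade
    # Expected absolute grade distance from the prediction: mass on a
    # neighbouring grade costs 1, mass two grades away costs 2, and so on.
    expected_dist = sum(p[i] * abs(i - pred) for i in range(k))
    return pred, 1.0 - expected_dist / (k - 1)


def accuracy_by_confidence(
    prob_rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    threshold: float,
) -> Tuple[float, float]:
    """Accuracy of slides with confidence >= threshold vs. below it."""
    hits = {True: 0, False: 0}
    counts = {True: 0, False: 0}
    for probs, label in zip(prob_rows, labels):
        pred, conf = ordinal_confidence(probs)
        confident = conf >= threshold
        counts[confident] += 1
        hits[confident] += int(pred == label)

    def acc(group: bool) -> float:
        # An empty group yields NaN instead of a ZeroDivisionError.
        return hits[group] / counts[group] if counts[group] else float("nan")

    return acc(True), acc(False)


if __name__ == "__main__":
    # Toy softmax outputs over 3 ordered grades; invented, not paper data.
    slides = [
        [0.90, 0.08, 0.02],  # confident and correct
        [0.05, 0.90, 0.05],  # confident and correct
        [0.40, 0.20, 0.40],  # mass split between the two extreme grades
        [0.10, 0.45, 0.45],  # hesitation between neighbouring grades
    ]
    labels = [0, 1, 2, 2]
    for probs in slides:
        print(ordinal_confidence(probs))
    print(accuracy_by_confidence(slides, labels, threshold=0.8))
```

Because this stand-in reads only the probability vector of a single forward pass, it adds no training and no extra inference, whereas Monte Carlo Dropout needs several stochastic forward passes and deep ensembles need several trained models. Under this stand-in, the fourth toy slide (hesitation between two neighbouring grades) keeps a higher confidence than the third (mass on the two extreme grades); whether the authors' score behaves the same way is not stated in the abstract.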
Related papers
- Heterogeneous Uncertainty-Guided Composed Image Retrieval with Fine-Grained Probabilistic Learning (2026-01-16)
  Composed Image Retrieval (CIR) enables image search by combining a reference image with modification text. Intrinsic noise in CIR triplets incurs intrinsic uncertainty and threatens the model's robustness. This paper introduces a Heterogeneous Uncertainty-Guided (HUG) paradigm to overcome these limitations.
- Label Uncertainty for Ultrasound Segmentation (2025-08-21)
  In medical imaging, inter-observer variability among radiologists often introduces label uncertainty. We introduce a novel approach to both labeling and training AI models using expert-supplied, per-pixel confidence values.
- GRASP-PsONet: Gradient-based Removal of Spurious Patterns for PsOriasis Severity Classification (2025-06-27)
  We propose a framework to automatically flag problematic training images that introduce spurious correlations. Removing 8.2% of flagged images improves model AUC-ROC by 5 percentage points (85% to 90%) on a held-out test set. When applied to a subset of training data rated by two dermatologists, the method identifies over 90% of cases with inter-rater disagreement.
- Detecting Discrepancies Between AI-Generated and Natural Images Using Uncertainty (2024-12-08)
  We propose a novel approach for detecting AI-generated images by leveraging predictive uncertainty to mitigate misuse and associated risks. The motivation arises from the fundamental assumption of a distributional discrepancy between natural and AI-generated images. We propose to leverage large-scale pre-trained models to calculate the uncertainty as the score for detecting AI-generated images.
- Probably Approximately Precision and Recall Learning (2024-11-20)
  A key challenge in machine learning is the prevalence of one-sided feedback. We introduce a Probably Approximately Correct (PAC) framework in which hypotheses are set functions that map each input to a set of labels. We develop new algorithms that learn from positive data alone, achieving optimal sample complexity in the realizable case.
- Improving Robustness and Reliability in Medical Image Classification with Latent-Guided Diffusion and Nested-Ensembles (2023-10-24)
  Ensemble deep learning has been shown to achieve high predictive accuracy and good uncertainty estimation. Perturbations in the input images at test time can still lead to significant performance degradation. LaDiNE is a novel and robust probabilistic method that is capable of inferring informative and invariant latent variables from the input images.
- Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models (2023-10-04)
  We propose a new paradigm to build robust and interpretable medical image classifiers with natural language concepts. Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model.
- Can ultrasound confidence maps predict sonographers' labeling variability? (2023-08-18)
  This work proposes a novel approach that guides ultrasound segmentation networks to account for sonographers' uncertainties. We show that there is a correlation between low values in the confidence maps and experts' label uncertainty. Our results show that ultrasound confidence maps (CMs) increase the Dice score, improve the Hausdorff and Average Surface Distances, and decrease the number of isolated pixel predictions.
- Paced-Curriculum Distillation with Prediction and Label Uncertainty for Image Segmentation (2023-02-02)
  In curriculum learning, the idea is to train on easier samples first and gradually increase the difficulty. In self-paced learning, a pacing function defines the speed at which training progresses. We develop a novel paced-curriculum distillation (PCD) for medical image segmentation.
- Benchmarking common uncertainty estimation methods with histopathological images under domain shift and label noise (2023-01-03)
  In high-risk environments, deep learning models need to be able to judge their uncertainty and reject inputs when there is a significant chance of misclassification. We conduct a rigorous evaluation of the most commonly used uncertainty and robustness methods for the classification of Whole Slide Images. We observe that ensembles of methods generally lead to better uncertainty estimates as well as increased robustness towards domain shifts and label noise.
- PCA: Semi-supervised Segmentation with Patch Confidence Adversarial Training (2022-07-24)
  We propose a new semi-supervised adversarial method called Patch Confidence Adversarial Training (PCA) for medical image segmentation. PCA learns the pixel structure and context information in each patch to get enough gradient feedback, which helps the discriminator converge to an optimal state. Our method outperforms the state-of-the-art semi-supervised methods, which demonstrates its effectiveness for medical image segmentation.
- Towards Semi-Supervised Deep Facial Expression Recognition with An Adaptive Confidence Margin (2022-03-23)
  We learn an Adaptive Confidence Margin (Ada-CM) to fully leverage all unlabeled data for semi-supervised deep facial expression recognition. All unlabeled samples are partitioned into two subsets by comparing their confidence scores with the adaptively learned confidence margin. Our method achieves state-of-the-art performance, especially surpassing fully-supervised baselines in a semi-supervised manner.
- Performance or Trust? Why Not Both. Deep AUC Maximization with Self-Supervised Learning for COVID-19 Chest X-ray Classifications (2021-12-14)
  In training deep learning models, a compromise often must be made between performance and trust. In this work, we integrate a new surrogate loss with self-supervised learning for computer-aided screening of COVID-19 patients.
- Using Soft Labels to Model Uncertainty in Medical Image Segmentation (2021-09-26)
  We propose a simple method to obtain soft labels from the annotations of multiple physicians. For each image, our method produces a single well-calibrated output that can be thresholded at multiple confidence levels. We evaluated our method on the MICCAI 2021 QUBIQ challenge, showing that it performs well across multiple medical image segmentation tasks.
- Estimating and Improving Fairness with Adversarial Learning (2021-03-07)
  We propose an adversarial multi-task training strategy to simultaneously mitigate and detect bias in deep learning-based medical image analysis systems. Specifically, we propose to add a discrimination module against bias and a critical module that predicts unfairness within the base classification model. We evaluate our framework on a large-scale, publicly available skin lesion dataset.
- Difficulty-aware Glaucoma Classification with Multi-Rater Consensus Modeling (2020-07-29)
  We take advantage of the raw multi-rater gradings to improve deep learning model performance for the glaucoma classification task. A multi-branch model structure is proposed to predict the most sensitive, the most specific, and a balanced fused result for the input images. Compared with models trained only with the final ground-truth labels, the proposed method using multi-rater consensus information achieves superior performance.
This list is automatically generated from the titles and abstracts of the papers in this site.