ValUES: A Framework for Systematic Validation of Uncertainty Estimation in Semantic Segmentation
- URL: http://arxiv.org/abs/2401.08501v2
- Date: Fri, 3 May 2024 09:18:24 GMT
- Title: ValUES: A Framework for Systematic Validation of Uncertainty Estimation in Semantic Segmentation
- Authors: Kim-Celine Kahl, Carsten T. Lüth, Maximilian Zenk, Klaus Maier-Hein, Paul F. Jaeger
- Abstract summary: Uncertainty estimation is an essential and heavily-studied component for semantic segmentation methods.
Can data-related and model-related uncertainty really be separated in practice?
Which components of an uncertainty method are essential for real-world performance?
- Score: 2.1517210693540005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Uncertainty estimation is an essential and heavily-studied component for the reliable application of semantic segmentation methods. While various studies exist claiming methodological advances on the one hand, and successful application on the other hand, the field is currently hampered by a gap between theory and practice leaving fundamental questions unanswered: Can data-related and model-related uncertainty really be separated in practice? Which components of an uncertainty method are essential for real-world performance? Which uncertainty method works well for which application? In this work, we link this research gap to a lack of systematic and comprehensive evaluation of uncertainty methods. Specifically, we identify three key pitfalls in current literature and present an evaluation framework that bridges the research gap by providing 1) a controlled environment for studying data ambiguities as well as distribution shifts, 2) systematic ablations of relevant method components, and 3) test-beds for the five predominant uncertainty applications: OoD-detection, active learning, failure detection, calibration, and ambiguity modeling. Empirical results on simulated as well as real-world data demonstrate how the proposed framework is able to answer the predominant questions in the field, revealing, for instance, that 1) separation of uncertainty types works on simulated data but does not necessarily translate to real-world data, 2) aggregation of scores is a crucial but currently neglected component of uncertainty methods, and 3) while ensembles perform most robustly across the different downstream tasks and settings, test-time augmentation often constitutes a lightweight alternative. Code is at: https://github.com/IML-DKFZ/values
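Two of the ingredients highlighted in the abstract, separating ensemble-based uncertainty into data-related (aleatoric) and model-related (epistemic) parts and aggregating per-pixel scores into an image-level score, can be illustrated with a minimal Python sketch. This is not the reference implementation from the linked repository; all function names, the aggregation strategies, and the toy data are placeholders chosen for illustration.

```python
# Minimal illustrative sketch (not the ValUES reference implementation):
# decompose ensemble-based predictive uncertainty into aleatoric and epistemic
# parts per pixel, then aggregate the pixel map into one image-level score.
import numpy as np

def decompose_uncertainty(ensemble_probs, eps=1e-12):
    """ensemble_probs: (M, H, W, C) softmax outputs of M ensemble members (or TTA runs)."""
    mean_probs = ensemble_probs.mean(axis=0)                          # (H, W, C)
    total = -np.sum(mean_probs * np.log(mean_probs + eps), axis=-1)   # predictive entropy
    aleatoric = -np.sum(ensemble_probs * np.log(ensemble_probs + eps),
                        axis=-1).mean(axis=0)                         # expected entropy
    epistemic = total - aleatoric                                     # mutual information
    return total, aleatoric, epistemic

def aggregate(pixel_uncertainty, strategy="mean"):
    """Collapse a per-pixel uncertainty map (H, W) into a single image-level score."""
    if strategy == "mean":
        return float(pixel_uncertainty.mean())
    if strategy == "top1pct":  # mean over the most uncertain 1% of pixels
        k = max(1, pixel_uncertainty.size // 100)
        return float(np.sort(pixel_uncertainty, axis=None)[-k:].mean())
    raise ValueError(f"unknown aggregation strategy: {strategy}")

# Toy usage: 5 ensemble members, a 64x64 image, 4 classes.
probs = np.random.dirichlet(np.ones(4), size=(5, 64, 64))             # (5, 64, 64, 4)
total, aleatoric, epistemic = decompose_uncertainty(probs)
image_score = aggregate(epistemic, strategy="top1pct")                # e.g. an OoD score
```

The choice of aggregation (mean, maximum, top-percentile, etc.) is exactly the kind of method component the framework ablates, since different downstream tasks can favor different ways of collapsing the pixel map into a single score.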
Related papers
- Perception Matters: Enhancing Embodied AI with Uncertainty-Aware Semantic Segmentation [24.32551050538683]
Embodied AI has made significant progress acting in unexplored environments.
Current practice often relies on dated perception models, neglects temporal aggregation, and transfers from ground truth directly to noisy perception at test time.
We address the identified problems through calibrated perception probabilities and uncertainty across aggregation and found decisions.
arXiv Detail & Related papers (2024-08-05T08:14:28Z) - Uncertainty for Active Learning on Graphs [70.44714133412592]
Uncertainty Sampling is an Active Learning strategy that aims to improve the data efficiency of machine learning models.
We benchmark Uncertainty Sampling beyond predictive uncertainty and highlight a significant performance gap to other Active Learning strategies.
We develop ground-truth Bayesian uncertainty estimates in terms of the data generating process and prove their effectiveness in guiding Uncertainty Sampling toward optimal queries.
arXiv Detail & Related papers (2024-05-02T16:50:47Z) - Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling [69.83976050879318]
In large language models (LLMs), identifying sources of uncertainty is an important step toward improving reliability, trustworthiness, and interpretability.
In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling.
Our approach generates a set of clarifications for the input, feeds them into an LLM, and ensembles the corresponding predictions.
arXiv Detail & Related papers (2023-11-15T05:58:35Z) - How Reliable is Your Regression Model's Uncertainty Under Real-World Distribution Shifts? [46.05502630457458]
We propose a benchmark of 8 image-based regression datasets with different types of challenging distribution shifts.
We find that while methods are well calibrated when there is no distribution shift, they all become highly overconfident on many of the benchmark datasets.
arXiv Detail & Related papers (2023-02-07T18:54:39Z) - A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification [0.491574468325115]
We present the first large-scale empirical study enabling the benchmarking of confidence scoring functions.
The revelation of a simple softmax response baseline as the overall best performing method underlines the drastic shortcomings of current evaluation.
arXiv Detail & Related papers (2022-11-28T12:25:27Z) - Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization [73.04187954213471]
We introduce a unified learning approach to simultaneously model coarse- and fine-grained retrieval.
The proposed method has achieved +4.03%, +3.38%, and +2.40% Recall@50 accuracy over a strong baseline.
arXiv Detail & Related papers (2022-11-14T14:25:40Z) - Reliability-Aware Prediction via Uncertainty Learning for Person Image Retrieval [51.83967175585896]
UAL aims at providing reliability-aware predictions by considering data uncertainty and model uncertainty simultaneously.
Data uncertainty captures the "noise" inherent in the sample, while model uncertainty depicts the model's confidence in the sample's prediction.
arXiv Detail & Related papers (2022-10-24T17:53:20Z) - Identifying Causal-Effect Inference Failure with Uncertainty-Aware Models [41.53326337725239]
We introduce a practical approach for integrating uncertainty estimation into a class of state-of-the-art neural network methods.
We show that our methods enable us to deal gracefully with situations of "no-overlap", common in high-dimensional data.
We show that correctly modeling uncertainty can keep us from giving overconfident and potentially harmful recommendations.
arXiv Detail & Related papers (2020-07-01T00:37:41Z) - Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z) - Pitfalls of In-Domain Uncertainty Estimation and Ensembling in Deep Learning [70.72363097550483]
In this study, we focus on in-domain uncertainty for image classification.
To provide more insight, we introduce the deep ensemble equivalent score (DEE).
arXiv Detail & Related papers (2020-02-15T23:28:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.