A Survey on Uncertainty Toolkits for Deep Learning
- URL: http://arxiv.org/abs/2205.01040v1
- Date: Mon, 2 May 2022 17:23:06 GMT
- Title: A Survey on Uncertainty Toolkits for Deep Learning
- Authors: Maximilian Pintz, Joachim Sicking, Maximilian Poretschkin, Maram Akila
- Abstract summary: We present the first survey on toolkits for uncertainty estimation in deep learning (DL).
We investigate 11 toolkits with respect to modeling and evaluation capabilities.
While Pyro and TensorFlow Probability provide a large degree of flexibility and seamless integration into their respective frameworks, Uncertainty Quantification 360 has the broader methodological scope.
- Score: 3.113304966059062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of deep learning (DL) fostered the creation of unifying
frameworks such as TensorFlow or PyTorch as much as it was driven by their
creation in return. Having common building blocks facilitates the exchange of,
e.g., models or concepts and makes developments more easily replicable. Nonetheless,
robust and reliable evaluation and assessment of DL models has often proven
challenging. This is at odds with their increasing safety relevance, which
recently culminated in the field of "trustworthy ML". We believe that, among
others, further unification of evaluation and safeguarding methodologies in
terms of toolkits, i.e., small and specialized framework derivatives, might
positively impact problems of trustworthiness as well as reproducibility. To
this end, we present the first survey on toolkits for uncertainty estimation
(UE) in DL, as UE forms a cornerstone in assessing model reliability. We
investigate 11 toolkits with respect to modeling and evaluation capabilities,
providing an in-depth comparison for the three most promising ones, namely
Pyro, TensorFlow Probability, and Uncertainty Quantification 360. While the
first two provide a large degree of flexibility and seamless integration into
their respective framework, the last one has the larger methodological scope.
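The common denominator of such toolkits is turning several stochastic forward passes or ensemble members into a predictive mean and an uncertainty estimate. The sketch below is a plain-Python illustration of that idea, not code from any of the surveyed toolkits; the function name and sample values are our own:

```python
from statistics import fmean, pvariance

def ensemble_uncertainty(predictions):
    """Reduce per-member predictions for one input to (mean, variance).

    predictions: scalar outputs, one per ensemble member (or per
    stochastic forward pass, as in MC dropout). The variance across
    members serves as a simple uncertainty estimate.
    """
    mean = fmean(predictions)
    var = pvariance(predictions, mu=mean)
    return mean, var

# Illustrative outputs of a 4-member ensemble for a single input:
mean, var = ensemble_uncertainty([0.9, 1.1, 1.0, 1.0])
```

Real toolkits wrap far richer machinery (variational inference, calibrated posteriors) around this reduction, but the mean/variance pair is the interface most of them ultimately expose.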
Related papers
- From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models [77.04403907729738]
This survey charts the evolution of uncertainty from a passive diagnostic metric to an active control signal guiding real-time model behavior. We demonstrate how uncertainty is leveraged as an active control signal across three frontiers. This survey argues that mastering the new trend of uncertainty is essential for building the next generation of scalable, reliable, and trustworthy AI.
arXiv Detail & Related papers (2026-01-22T06:21:31Z) - Open-World Deepfake Attribution via Confidence-Aware Asymmetric Learning [78.92934995292113]
We propose a Confidence-Aware Asymmetric Learning (CAL) framework, which balances confidence across known and novel forgery types. CAL consistently outperforms previous methods, achieving new state-of-the-art performance on both known and novel forgery attribution.
arXiv Detail & Related papers (2025-12-14T12:31:28Z) - Red Teaming Large Reasoning Models [26.720095252284818]
Large Reasoning Models (LRMs) have emerged as a powerful advancement in multi-step reasoning tasks. LRMs introduce novel safety and reliability risks, such as CoT-hijacking and prompt-induced inefficiencies. We propose RT-LRM, a unified benchmark designed to assess the trustworthiness of LRMs.
arXiv Detail & Related papers (2025-11-29T09:45:03Z) - HARMONY: Hidden Activation Representations and Model Output-Aware Uncertainty Estimation for Vision-Language Models [42.91752946934796]
Uncertainty Estimation plays a central role in quantifying the reliability of model outputs. Most existing probability-based UE approaches rely on output probability distributions, aggregating token probabilities into a single uncertainty score. We propose a novel UE framework, HARMONY, that jointly leverages fused multimodal information in model activations and the output distribution of the VLM.
arXiv Detail & Related papers (2025-10-25T05:45:18Z) - FaithCoT-Bench: Benchmarking Instance-Level Faithfulness of Chain-of-Thought Reasoning [62.452350134196934]
FaithCoT-Bench is a unified benchmark for instance-level CoT unfaithfulness detection. Our framework formulates unfaithfulness detection as a discriminative decision problem. FaithCoT-Bench sets a solid basis for future research toward more interpretable and trustworthy reasoning in LLMs.
arXiv Detail & Related papers (2025-10-05T05:16:54Z) - From Confidence to Collapse in LLM Factual Robustness [21.27503954808115]
We introduce a principled approach to measure factual robustness from the perspective of the generation process. The Factual Robustness Score (FRS) is a novel metric which quantifies the stability of a fact against perturbations in decoding conditions.
arXiv Detail & Related papers (2025-08-22T09:59:23Z) - Unveiling Trust in Multimodal Large Language Models: Evaluation, Analysis, and Mitigation [51.19622266249408]
MultiTrust-X is a benchmark for evaluating, analyzing, and mitigating the trustworthiness issues of MLLMs. Based on the taxonomy, MultiTrust-X includes 32 tasks and 28 curated datasets. Our experiments reveal significant vulnerabilities in current models.
arXiv Detail & Related papers (2025-08-21T09:00:01Z) - A Context-Aware Dual-Metric Framework for Confidence Estimation in Large Language Models [6.62851757612838]
Current confidence estimation methods for large language models (LLMs) neglect the relevance between responses and contextual information. We propose CRUX, which integrates context faithfulness and consistency for confidence estimation via two novel metrics. Experiments across three benchmark datasets demonstrate CRUX's effectiveness, achieving higher AUROC than existing baselines.
arXiv Detail & Related papers (2025-08-01T12:58:34Z) - Trustworthy Reasoning: Evaluating and Enhancing Factual Accuracy in LLM Intermediate Thought Processes [16.451488374845407]
We present a novel framework addressing a critical vulnerability in Large Language Models (LLMs). This phenomenon poses substantial risks in high-stakes domains including healthcare, legal analysis, and scientific research.
arXiv Detail & Related papers (2025-07-25T10:34:51Z) - Understanding and Benchmarking the Trustworthiness in Multimodal LLMs for Video Understanding [59.50808215134678]
This study introduces Trust-videoLLMs, the first comprehensive benchmark evaluating 23 state-of-the-art videoLLMs. Results reveal significant limitations in dynamic scene comprehension, cross-modal resilience, and real-world risk mitigation.
arXiv Detail & Related papers (2025-06-14T04:04:54Z) - Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling [48.15636223774418]
Large language models (LLMs) frequently hallucinate due to misaligned self-awareness.
Existing approaches mitigate hallucinations via uncertainty estimation or query rejection.
We propose the Explicit Knowledge Boundary Modeling framework to integrate fast and slow reasoning systems.
arXiv Detail & Related papers (2025-03-04T03:16:02Z) - On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective [333.9220561243189]
Generative Foundation Models (GenFMs) have emerged as transformative tools.
Their widespread adoption raises critical concerns regarding trustworthiness across dimensions.
This paper presents a comprehensive framework to address these challenges through three key contributions.
arXiv Detail & Related papers (2025-02-20T06:20:36Z) - A Critical Synthesis of Uncertainty Quantification and Foundation Models in Monocular Depth Estimation [13.062551984263031]
Metric depth estimation, which involves predicting absolute distances, poses particular challenges.
We fuse five different uncertainty quantification methods with the current state-of-the-art DepthAnythingV2 foundation model.
Our findings identify fine-tuning with the Gaussian Negative Log-Likelihood Loss (GNLL) as a particularly promising approach.
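The Gaussian Negative Log-Likelihood loss named here has a standard closed form: the network predicts a mean and a variance per sample, and the loss penalizes large errors under small predicted variance hardest. A minimal per-sample sketch (the function name and epsilon guard are our own, and the additive constant 0.5*log(2*pi) is dropped):

```python
import math

def gaussian_nll(y, mu, var, eps=1e-6):
    """Negative log-likelihood of y under N(mu, var), up to a constant.

    Penalizes both the squared error and an overconfidently small
    predicted variance, which is what pushes the model toward
    calibrated uncertainty.
    """
    var = max(var, eps)  # guard against non-positive variance
    return 0.5 * (math.log(var) + (y - mu) ** 2 / var)
```

Fine-tuning with this loss only changes the training objective; the network architecture stays as-is apart from the extra variance output.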
arXiv Detail & Related papers (2025-01-14T15:13:00Z) - On the Fairness, Diversity and Reliability of Text-to-Image Generative Models [68.62012304574012]
Multimodal generative models have sparked critical discussions on their reliability, fairness, and potential for misuse. We propose an evaluation framework to assess model reliability by analyzing responses to global and local perturbations in the embedding space. Our method lays the groundwork for detecting unreliable, bias-injected models and tracing the provenance of embedded biases.
arXiv Detail & Related papers (2024-11-21T09:46:55Z) - Rethinking the Uncertainty: A Critical Review and Analysis in the Era of Large Language Models [42.563558441750224]
Large Language Models (LLMs) have become fundamental to a broad spectrum of artificial intelligence applications.
Current methods often struggle to accurately identify, measure, and address the true uncertainty.
This paper introduces a comprehensive framework specifically designed to identify and understand the types and sources of uncertainty.
arXiv Detail & Related papers (2024-10-26T15:07:15Z) - UBench: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions [10.28688988951815]
We introduce UBench, a new benchmark for evaluating the uncertainty of large language models (LLMs). Unlike other benchmarks, UBench is based on confidence intervals. It encompasses 11,978 multiple-choice questions spanning knowledge, language, understanding, and reasoning capabilities. Our analysis reveals several crucial insights: 1) Our confidence interval-based methods are highly effective for uncertainty quantification; 2) Regarding uncertainty, outstanding open-source models show competitive performance versus closed-source models; 3) CoT and RP prompts present potential ways to improve model reliability, while the influence of temperature changes follows no universal rule.
arXiv Detail & Related papers (2024-06-18T16:50:38Z) - Challenges and Considerations in the Evaluation of Bayesian Causal Discovery [49.0053848090947]
Representing uncertainty in causal discovery is a crucial component for experimental design, and more broadly, for safe and reliable causal decision making.
Unlike non-Bayesian causal discovery, which relies on a single estimated causal graph and model parameters for assessment, Bayesian causal discovery presents evaluation challenges due to the nature of the quantity it infers.
There is no consensus on the most suitable metric for evaluation.
arXiv Detail & Related papers (2024-06-05T12:45:23Z) - Large Language Model Confidence Estimation via Black-Box Access [30.490207799344333]
We propose a simple framework in which we engineer novel features and train an (interpretable) model to estimate the confidence.
We empirically demonstrate that our framework is effective in estimating confidence of Flan-ul2,-13b and Mistral-7b on four benchmark Q&A tasks.
Our interpretable approach provides insight into features that are predictive of confidence, leading to interesting and useful discoveries.
arXiv Detail & Related papers (2024-06-01T02:08:44Z) - Confidence Under the Hood: An Investigation into the Confidence-Probability Alignment in Large Language Models [14.5291643644017]
We introduce the concept of Confidence-Probability Alignment.
We probe the alignment between models' internal and expressed confidence.
Among the models analyzed, OpenAI's GPT-4 showed the strongest confidence-probability alignment.
arXiv Detail & Related papers (2024-05-25T15:42:04Z) - Towards Precise Observations of Neural Model Robustness in Classification [2.127049691404299]
In deep learning applications, robustness measures the ability of neural models to handle slight changes in input data.
Our approach contributes to a deeper understanding of model robustness in safety-critical applications.
arXiv Detail & Related papers (2024-04-25T09:37:44Z) - Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning [76.98542249776257]
Large-scale language models often face the challenge of "hallucination".
We introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.
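The enhance-or-reject behavior described here is, at its core, selective prediction: emit the answer only when an uncertainty score clears a threshold, otherwise abstain. A minimal, hypothetical sketch (the function name and threshold are illustrative, not the paper's actual mechanism):

```python
def selective_answer(answer, uncertainty, threshold=0.3):
    """Return the model's answer when its uncertainty is acceptable,
    otherwise abstain (None) so a fallback, e.g. a human reviewer or a
    retrieval step, can take over."""
    if uncertainty <= threshold:
        return answer
    return None
```

The threshold trades coverage against reliability: lowering it rejects more outputs but makes the answers that do pass through more trustworthy.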
arXiv Detail & Related papers (2023-10-07T12:06:53Z) - Measuring and Modeling Uncertainty Degree for Monocular Depth Estimation [50.920911532133154]
The intrinsic ill-posedness and ordinal-sensitive nature of monocular depth estimation (MDE) models pose major challenges to the estimation of uncertainty degree.
We propose to model the uncertainty of MDE models from the perspective of the inherent probability distributions.
By simply introducing additional training regularization terms, our model, with a surprisingly simple formulation and without requiring extra modules or multiple inferences, can provide uncertainty estimations with state-of-the-art reliability.
arXiv Detail & Related papers (2023-07-19T12:11:15Z) - Calibrating Multimodal Learning [94.65232214643436]
We propose a novel regularization technique, i.e., Calibrating Multimodal Learning (CML) regularization, to calibrate the predictive confidence of previous methods.
This technique could be flexibly equipped by existing models and improve the performance in terms of confidence calibration, classification accuracy, and model robustness.
arXiv Detail & Related papers (2023-06-02T04:29:57Z) - Toward Reliable Human Pose Forecasting with Uncertainty [51.628234388046195]
We develop an open-source library for human pose forecasting, including multiple models, supporting several datasets.
We model two types of uncertainty in the problem to increase performance and convey better trust.
arXiv Detail & Related papers (2023-04-13T17:56:08Z) - Assessing the Reliability of Deep Learning Classifiers Through Robustness Evaluation and Operational Profiles [13.31639740011618]
We present a model-agnostic reliability assessment method for Deep Learning (DL) classifiers.
We partition the input space into small cells and then "assemble" their robustness (to the ground truth) according to the operational profile (OP) of a given application.
Reliability estimates in terms of the probability of misclassification per input (pmi) can be derived together with confidence levels.
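In its simplest reading, the "assemble" step above is an OP-weighted sum: each cell's estimated misclassification probability is weighted by how often the operational profile draws inputs from that cell. The sketch below is an illustrative rendering of that computation with made-up cell values, not the paper's implementation:

```python
def prob_misclassification_per_input(cells):
    """Assemble a pmi reliability estimate from per-cell statistics.

    cells: (op_probability, cell_unrobustness) pairs, where
    op_probability is the chance the operational profile draws an input
    from the cell, and cell_unrobustness is the estimated probability
    of misclassification within it. OP probabilities should sum to 1.
    """
    return sum(p * u for p, u in cells)

# Hypothetical partition of the input space into three cells:
pmi = prob_misclassification_per_input([(0.5, 0.01), (0.3, 0.05), (0.2, 0.20)])
```

Cells that the application visits often thus dominate the estimate, which is why the same model can yield very different reliability figures under different operational profiles.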
arXiv Detail & Related papers (2021-06-02T16:10:46Z) - Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.