Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness
- URL: http://arxiv.org/abs/2407.01942v1
- Date: Tue, 2 Jul 2024 04:23:54 GMT
- Title: Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness
- Authors: Khyathi Raghavi Chandu, Linjie Li, Anas Awadalla, Ximing Lu, Jae Sung Park, Jack Hessel, Lijuan Wang, Yejin Choi,
- Abstract summary: We present a taxonomy of uncertainty specific to vision-language AI systems.
We also introduce a new metric, confidence-weighted accuracy, which is well correlated with both accuracy and calibration error.
- Score: 106.52630978891054
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to acknowledge the inevitable uncertainty in their knowledge and reasoning is a prerequisite for AI systems to be truly truthful and reliable. In this paper, we present a taxonomy of uncertainty specific to vision-language AI systems, distinguishing between epistemic uncertainty (arising from a lack of information) and aleatoric uncertainty (due to inherent unpredictability), and further explore finer categories within. Based on this taxonomy, we synthesize a benchmark dataset, CertainlyUncertain, featuring 178K visual question answering (VQA) samples as contrastive pairs. This is achieved by 1) inpainting images to make previously answerable questions into unanswerable ones; and 2) using image captions to prompt large language models for both answerable and unanswerable questions. Additionally, we introduce a new metric, confidence-weighted accuracy, which is well correlated with both accuracy and calibration error, to address the shortcomings of existing metrics.
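For intuition, the sketch below shows one way a confidence-weighted accuracy score could be computed alongside a standard expected calibration error (ECE). The abstract does not give the paper's exact formulation, so the scoring rule here (rewarding confident correct answers and penalizing confident errors) is an illustrative assumption, not the authors' definition; function names are hypothetical.

```python
import numpy as np

def confidence_weighted_accuracy(correct, confidence):
    """Illustrative confidence-weighted accuracy (assumed form, not the paper's exact metric).

    Each sample contributes +confidence if the answer is correct and
    -confidence if it is wrong; the mean is rescaled to [0, 1].
    """
    correct = np.asarray(correct, dtype=float)        # 1.0 if answer matches reference, else 0.0
    confidence = np.asarray(confidence, dtype=float)  # model-reported confidence in [0, 1]
    signed = np.where(correct == 1.0, confidence, -confidence)
    return float((signed.mean() + 1.0) / 2.0)

def expected_calibration_error(correct, confidence, n_bins=10):
    """Standard ECE: weighted mean |accuracy - confidence| over equal-width confidence bins."""
    correct = np.asarray(correct, dtype=float)
    confidence = np.asarray(confidence, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidence > lo) & (confidence <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidence[mask].mean())
    return float(ece)

# Toy usage on hypothetical VQA predictions.
correct = [1, 1, 0, 1, 0]
confidence = [0.9, 0.7, 0.2, 0.6, 0.8]
print(confidence_weighted_accuracy(correct, confidence))
print(expected_calibration_error(correct, confidence))
```

Under this assumed scoring rule, a model that abstains or reports low confidence on unanswerable questions scores higher than one that answers them confidently, which mirrors the behavior the benchmark's contrastive answerable/unanswerable pairs are designed to test.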
Related papers
- Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z) - Answering from Sure to Uncertain: Uncertainty-Aware Curriculum Learning for Video Question Answering [63.12469700986452]
We introduce the concept of uncertainty-aware curriculum learning (CL).
Here, uncertainty serves as the guiding principle for dynamically adjusting the difficulty.
In practice, we seamlessly integrate the VideoQA model into our framework and conduct comprehensive experiments.
arXiv Detail & Related papers (2024-01-03T02:29:34Z) - Identifying Drivers of Predictive Aleatoric Uncertainty [2.5311562666866494]
We present a simple approach to explain predictive aleatoric uncertainties.
We estimate uncertainty as predictive variance by adapting a neural network with a Gaussian output distribution.
We quantify our findings with a nuanced benchmark analysis that includes real-world datasets.
arXiv Detail & Related papers (2023-12-12T13:28:53Z) - Uncertainty-aware Language Modeling for Selective Question Answering [107.47864420630923]
We present an automatic large language model (LLM) conversion approach that produces uncertainty-aware LLMs.
Our approach is model- and data-agnostic, is computationally-efficient, and does not rely on external models or systems.
arXiv Detail & Related papers (2023-11-26T22:47:54Z) - Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning [76.98542249776257]
Large-scale language models often face the challenge of "hallucination".
We introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.
arXiv Detail & Related papers (2023-10-07T12:06:53Z) - Unified Uncertainty Calibration [43.733911707842005]
We introduce unified uncertainty calibration (U2C), a holistic framework to combine aleatoric and epistemic uncertainties.
U2C enables a clean learning-theoretical analysis of uncertainty estimation, and outperforms reject-or-classify across a variety of ImageNet benchmarks.
arXiv Detail & Related papers (2023-10-02T13:42:36Z) - Flexible Visual Recognition by Evidential Modeling of Confusion and Ignorance [25.675733490127964]
In real-world scenarios, typical visual recognition systems can fail for two major reasons: misclassification between known classes and excusable misbehavior on unknown-class images.
To tackle these deficiencies, flexible visual recognition should dynamically predict multiple classes when the model is unconfident among the choices and reject making a prediction when the input is entirely outside the training distribution.
In this paper, we propose to model these two sources of uncertainty explicitly with the theory of Subjective Logic.
arXiv Detail & Related papers (2023-09-14T03:16:05Z) - Calibrating Ensembles for Scalable Uncertainty Quantification in Deep Learning-based Medical Segmentation [0.42008820076301906]
Uncertainty quantification in automated image analysis is highly desired in many applications.
Current uncertainty quantification approaches do not scale well in high-dimensional real-world problems.
We propose a scalable and intuitive framework to calibrate ensembles of deep learning models to produce uncertainty quantification measurements.
arXiv Detail & Related papers (2022-09-20T09:09:48Z) - Approaching Neural Network Uncertainty Realism [53.308409014122816]
Quantifying or at least upper-bounding uncertainties is vital for safety-critical systems such as autonomous vehicles.
We evaluate uncertainty realism -- a strict quality criterion -- with a Mahalanobis distance-based statistical test.
We adapt it to the automotive domain and show that it significantly improves uncertainty realism compared to a plain encoder-decoder model.
arXiv Detail & Related papers (2021-01-08T11:56:12Z)