Explaining Model Confidence Using Counterfactuals
- URL: http://arxiv.org/abs/2303.05729v1
- Date: Fri, 10 Mar 2023 06:22:13 GMT
- Title: Explaining Model Confidence Using Counterfactuals
- Authors: Thao Le, Tim Miller, Ronal Singh and Liz Sonenberg
- Abstract summary: Displaying confidence scores in human-AI interaction has been shown to help build trust between humans and AI systems.
Most existing research uses only the confidence score as a form of communication.
We show that counterfactual explanations of confidence scores help study participants to better understand and better trust a machine learning model's prediction.
- Score: 4.385390451313721
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Displaying confidence scores in human-AI interaction has been shown to help
build trust between humans and AI systems. However, most existing research uses
only the confidence score as a form of communication. As confidence scores are
just another model output, users may want to understand why the algorithm is
confident to determine whether to accept the confidence score. In this paper,
we show that counterfactual explanations of confidence scores help study
participants to better understand and better trust a machine learning model's
prediction. We present two methods for understanding model confidence using
counterfactual explanation: (1) based on counterfactual examples; and (2) based
on visualisation of the counterfactual space. Both increase understanding and
trust for study participants over a baseline of no explanation, but qualitative
results show that they are used quite differently, leading to recommendations
of when to use each one and directions of designing better explanations.
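As an illustration of method (1), the sketch below searches for a counterfactual example that changes a classifier's confidence ("had feature 0 been lower, confidence would drop from 0.92 to 0.54"). This is not the authors' algorithm: the toy dataset, the greedy single-feature search, and the helper name `confidence_counterfactual` are all illustrative assumptions. Method (2) could be approximated in the same setup by plotting `predict_proba` over a grid of feature values.

```python
# Minimal, hedged sketch of a counterfactual explanation of a confidence
# score. Illustrative only: toy data, a greedy one-feature search, and a
# hypothetical helper name -- not the paper's algorithm.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy two-feature dataset and model.
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
model = LogisticRegression().fit(X, y)

def confidence_counterfactual(model, x, feature, step, target_conf, max_iter=100):
    """Greedily perturb one feature until the model's confidence in its
    original prediction falls below target_conf (hypothetical helper)."""
    x_cf = x.copy()
    label = model.predict([x])[0]
    for _ in range(max_iter):
        conf = model.predict_proba([x_cf])[0][label]
        if conf < target_conf:
            return x_cf, conf
        x_cf[feature] += step          # walk along a single feature axis
    return None, None                  # no counterfactual within the budget

x = X[0]
label = model.predict([x])[0]
orig_conf = model.predict_proba([x])[0][label]
x_cf, cf_conf = confidence_counterfactual(model, x, feature=0, step=-0.1,
                                          target_conf=0.55)
if x_cf is not None:
    print(f"confidence {orig_conf:.2f} -> {cf_conf:.2f} "
          f"if feature 0 were {x_cf[0]:.2f} instead of {x[0]:.2f}")
```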
Related papers
- Fostering Trust and Quantifying Value of AI and ML [0.0]
Much has been discussed about trusting AI and ML inferences, but little has been done to define what that means.
Producing ever more trustworthy machine learning inferences is a path to increasing the value of products.
arXiv Detail & Related papers (2024-07-08T13:25:28Z)
- Automated Trustworthiness Testing for Machine Learning Classifiers [3.3423762257383207]
This paper proposes TOWER, the first technique to automatically create trustworthiness oracles that determine whether text classifier predictions are trustworthy.
Our hypothesis is that a prediction is trustworthy if the words in its explanation are semantically related to the predicted class (a toy sketch of this idea follows below).
The results show that TOWER can detect a decrease in trustworthiness as noise increases, but is not effective when evaluated against the human-labeled dataset.
arXiv Detail & Related papers (2024-06-07T20:25:05Z)
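As a toy illustration of that hypothesis (not TOWER's implementation): score the semantic relatedness between explanation words and the predicted class with embedding cosine similarity. The tiny 3-d vectors and the `trustworthy` helper below are made-up assumptions; a real oracle would use proper word embeddings (e.g. GloVe) and a tuned threshold.

```python
# Toy sketch of an explanation-based trustworthiness check.
# The embeddings below are invented for illustration only.
import numpy as np

TOY_EMBEDDINGS = {          # hypothetical 3-d vectors, not real embeddings
    "sports":   np.array([0.9, 0.1, 0.0]),
    "football": np.array([0.8, 0.2, 0.1]),
    "goal":     np.array([0.7, 0.3, 0.0]),
    "election": np.array([0.0, 0.9, 0.4]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def trustworthy(explanation_words, predicted_class, threshold=0.8):
    """Mean embedding similarity between explanation words and the class."""
    sims = [cosine(TOY_EMBEDDINGS[w], TOY_EMBEDDINGS[predicted_class])
            for w in explanation_words if w in TOY_EMBEDDINGS]
    return bool(sims) and np.mean(sims) >= threshold

print(trustworthy(["football", "goal"], "sports"))   # related words -> True
print(trustworthy(["election"], "sports"))           # unrelated -> False
```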
- A Diachronic Perspective on User Trust in AI under Uncertainty [52.44939679369428]
Modern NLP systems are often uncalibrated, resulting in confidently incorrect predictions that undermine user trust.
We study the evolution of user trust in response to trust-eroding events using a betting game.
arXiv Detail & Related papers (2023-10-20T14:41:46Z)
- Trust, but Verify: Using Self-Supervised Probing to Improve Trustworthiness [29.320691367586004]
We introduce a new approach, self-supervised probing, which enables us to check and mitigate the overconfidence issue for a trained model.
We provide a simple yet effective framework, which can be flexibly applied to existing trustworthiness-related methods in a plug-and-play manner.
arXiv Detail & Related papers (2023-02-06T08:57:20Z)
- Improving the Reliability for Confidence Estimation [16.952133489480776]
Confidence estimation is a task that aims to evaluate the trustworthiness of the model's prediction output during deployment.
Previous works have outlined two important qualities that a reliable confidence estimation model should possess.
We propose a meta-learning framework that can simultaneously improve upon both qualities in a confidence estimation model.
arXiv Detail & Related papers (2022-10-13T06:34:23Z)
- UKP-SQuARE v2 Explainability and Adversarial Attacks for Trustworthy QA [47.8796570442486]
Question Answering systems are increasingly deployed in applications where they support real-world decisions.
Inherently interpretable models or post hoc explainability methods can help users to comprehend how a model arrives at its prediction.
We introduce SQuARE v2, the new version of SQuARE, to provide an explainability infrastructure for comparing models.
arXiv Detail & Related papers (2022-08-19T13:01:01Z)
- Improving Model Understanding and Trust with Counterfactual Explanations of Model Confidence [4.385390451313721]
Showing confidence scores in human-agent interaction systems can help build trust between humans and AI systems.
Most existing research has used only the confidence score as a form of communication.
This paper presents two methods for understanding model confidence using counterfactual explanation.
arXiv Detail & Related papers (2022-06-06T04:04:28Z)
- An evaluation of word-level confidence estimation for end-to-end automatic speech recognition [70.61280174637913]
We investigate confidence estimation for end-to-end automatic speech recognition (ASR).
We provide an extensive benchmark of popular confidence methods on four well-known speech datasets.
Our results suggest a strong baseline can be obtained by scaling the logits by a learnt temperature (sketched below).
arXiv Detail & Related papers (2021-01-14T09:51:59Z)
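That temperature-scaling baseline is simple to reproduce in outline: learn a single scalar T on held-out data by minimising the negative log-likelihood of softmax(logits / T), then take the maximum calibrated probability as the confidence. The sketch below uses made-up toy logits and labels; it is a generic temperature-scaling sketch, not the paper's exact ASR setup.

```python
# Generic temperature scaling for confidence calibration (toy data).
import numpy as np
from scipy.optimize import minimize_scalar

def nll(T, logits, labels):
    """Negative log-likelihood of softmax(logits / T) at the true labels."""
    z = logits / T
    log_probs = z - np.logaddexp.reduce(z, axis=1, keepdims=True)
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(100, 5)) * 4.0                 # toy, overconfident
labels = np.where(rng.random(100) < 0.7,                 # ~70%-accurate toy labels
                  logits.argmax(axis=1), rng.integers(0, 5, 100))

# Fit the single temperature on held-out data by minimising NLL.
T = minimize_scalar(nll, bounds=(0.05, 10.0), args=(logits, labels),
                    method="bounded").x
z = logits / T
probs = np.exp(z - np.logaddexp.reduce(z, axis=1, keepdims=True))
confidence = probs.max(axis=1)                           # calibrated confidence
print(f"learnt T = {T:.2f}, mean confidence = {confidence.mean():.2f}")
```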
- Formalizing Trust in Artificial Intelligence: Prerequisites, Causes and Goals of Human Trust in AI [55.4046755826066]
We discuss a model of trust inspired by, but not identical to, sociology's interpersonal trust (i.e., trust between people).
We incorporate a formalization of 'contractual trust', such that trust between a user and an AI is trust that some implicit or explicit contract will hold.
We discuss how to design trustworthy AI, how to evaluate whether trust has manifested, and whether it is warranted.
arXiv Detail & Related papers (2020-10-15T03:07:23Z)
- How Much Can We Really Trust You? Towards Simple, Interpretable Trust Quantification Metrics for Deep Neural Networks [94.65749466106664]
We conduct a thought experiment and explore two key questions about trust in relation to confidence.
We introduce a suite of metrics for assessing the overall trustworthiness of deep neural networks based on their behaviour when answering a set of questions.
The proposed metrics are by no means perfect, but the hope is to push the conversation towards better metrics.
arXiv Detail & Related papers (2020-09-12T17:37:36Z)
- Binary Classification from Positive Data with Skewed Confidence [85.18941440826309]
Positive-confidence (Pconf) classification is a promising weakly-supervised learning method.
In practice, the confidence may be skewed by bias arising in an annotation process.
We introduce a parameterized model of the skewed confidence and propose a method for selecting its hyperparameter (the underlying Pconf objective is sketched below).
arXiv Detail & Related papers (2020-01-29T00:04:36Z)
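For context, the positive-confidence (Pconf) objective this paper builds on (Ishida et al., 2018) trains a binary classifier g from positive samples x_i and confidences r_i = p(y=+1 | x_i) by minimising (1/n) Σ_i [ℓ(g(x_i)) + ((1−r_i)/r_i) ℓ(−g(x_i))]. The sketch below implements that base objective with logistic loss on made-up data; the paper's skew-correction model and hyperparameter-selection method are not reproduced here.

```python
# Minimal sketch of the Pconf objective with logistic loss (toy data only;
# the skew-correction of the paper itself is not implemented here).
import numpy as np

rng = np.random.default_rng(0)
n = 200
X_pos = rng.normal(loc=1.0, size=(n, 2))            # positive samples only
r = 1.0 / (1.0 + np.exp(-X_pos.sum(axis=1)))        # toy confidences p(y=+1|x)
r = np.clip(r, 0.05, 0.95)                          # avoid division blow-ups

def logistic_loss(z):
    return np.logaddexp(0.0, -z)

# Linear model g(x) = w.x + b trained by plain gradient descent.
w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(500):
    g = X_pos @ w + b
    grad_pos = -1.0 / (1.0 + np.exp(g))                    # d loss(g)/dg
    grad_neg = ((1 - r) / r) * (1.0 / (1.0 + np.exp(-g)))  # d loss(-g)/dg
    coef = (grad_pos + grad_neg) / n
    w -= lr * (X_pos.T @ coef)
    b -= lr * coef.sum()

g = X_pos @ w + b
risk = (logistic_loss(g) + ((1 - r) / r) * logistic_loss(-g)).mean()
print("learnt weights:", w, "bias:", b, f"empirical Pconf risk: {risk:.3f}")
```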
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.