Improving Model Understanding and Trust with Counterfactual Explanations of Model Confidence
- URL: http://arxiv.org/abs/2206.02790v1
- Date: Mon, 6 Jun 2022 04:04:28 GMT
- Title: Improving Model Understanding and Trust with Counterfactual Explanations of Model Confidence
- Authors: Thao Le, Tim Miller, Ronal Singh and Liz Sonenberg
- Abstract summary: Showing confidence scores in human-agent interaction systems can help build trust between humans and AI systems.
Most existing research has used the confidence score only as a form of communication.
This paper presents two methods for understanding model confidence using counterfactual explanations.
- Score: 4.385390451313721
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we show that counterfactual explanations of confidence scores
help users better understand and better trust an AI model's prediction in
human-subject studies. Showing confidence scores in human-agent interaction
systems can help build trust between humans and AI systems. However, most
existing research has used the confidence score only as a form of communication,
and we still lack ways to explain why the algorithm is confident. This paper
also presents two methods for understanding model confidence using
counterfactual explanation: (1) based on counterfactual examples; and (2) based
on visualisation of the counterfactual space.
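To make the first method concrete, below is a minimal sketch (not the authors' implementation) of a counterfactual example for model confidence: a toy classifier is trained, and a single feature of a test instance is nudged until the confidence in the predicted class falls to a target level, showing what would have to change for the model to be less confident. The dataset, greedy search strategy, and thresholds are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): find a small change to one feature
# that lowers the model's confidence in its own prediction, i.e. a
# counterfactual example that explains what the confidence depends on.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
model = LogisticRegression().fit(X, y)

def confidence(x):
    """Confidence score = predicted probability of the model's own prediction."""
    proba = model.predict_proba(x.reshape(1, -1))[0]
    return proba.max(), proba.argmax()

def confidence_counterfactual(x, feature, target_conf=0.55, step=0.05, max_steps=200):
    """Nudge one feature until confidence drops to ~target_conf (or give up).

    Greedy line search in both directions; only an illustration of the idea
    of 'counterfactual examples for confidence', not the paper's algorithm.
    """
    _, base_label = confidence(x)
    for direction in (+1.0, -1.0):
        x_cf = x.copy()
        for _ in range(max_steps):
            x_cf[feature] += direction * step
            conf, label = confidence(x_cf)
            if conf <= target_conf or label != base_label:
                return x_cf, conf
    return None

x0 = X[0]
print("original confidence: %.2f" % confidence(x0)[0])
result = confidence_counterfactual(x0, feature=0)
if result is not None:
    x_cf, conf = result
    print("counterfactual feature 0: %.2f -> %.2f (confidence %.2f)"
          % (x0[0], x_cf[0], conf))
```

The second method, visualising the counterfactual space, would instead plot the confidence surface over a grid of such perturbations; it is not sketched here.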
Related papers
- Automated Trustworthiness Testing for Machine Learning Classifiers [3.3423762257383207]
This paper proposes TOWER, the first technique to automatically create trustworthiness oracles that determine whether text classifier predictions are trustworthy.
Our hypothesis is that a prediction is trustworthy if the words in its explanation are semantically related to the predicted class (a toy sketch of this idea follows this entry).
The results show that TOWER can detect a decrease in trustworthiness as noise increases, but is not effective when evaluated against the human-labeled dataset.
arXiv Detail & Related papers (2024-06-07T20:25:05Z)
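A rough illustration of that hypothesis (not the TOWER implementation): embed the explanation words and the predicted class name with an off-the-shelf sentence encoder and flag predictions whose mean similarity is low. The encoder choice, example words, and threshold below are assumptions.

```python
# Rough sketch of the TOWER hypothesis (not the TOWER tool): score how
# semantically related an explanation's words are to the predicted class
# name, and flag low-scoring predictions as untrustworthy.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative encoder choice

def trustworthiness_score(explanation_words, predicted_class):
    word_emb = encoder.encode(explanation_words, convert_to_tensor=True)
    class_emb = encoder.encode(predicted_class, convert_to_tensor=True)
    # Mean cosine similarity between explanation words and the class label.
    return util.cos_sim(word_emb, class_emb).mean().item()

# Hypothetical example: words an explainer might highlight for a 'sports' prediction.
score = trustworthiness_score(["match", "goal", "league"], "sports")
print("trustworthy" if score > 0.3 else "not trustworthy", round(score, 3))  # 0.3 is an assumed threshold
```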
- Confidence Under the Hood: An Investigation into the Confidence-Probability Alignment in Large Language Models [14.5291643644017]
We introduce the concept of Confidence-Probability Alignment.
We probe the alignment between models' internal and expressed confidence.
Among the models analyzed, OpenAI's GPT-4 showed the strongest confidence-probability alignment.
arXiv Detail & Related papers (2024-05-25T15:42:04Z)
- A Diachronic Perspective on User Trust in AI under Uncertainty [52.44939679369428]
Modern NLP systems are often uncalibrated, resulting in confidently incorrect predictions that undermine user trust.
We study the evolution of user trust in response to trust-eroding events using a betting game.
arXiv Detail & Related papers (2023-10-20T14:41:46Z)
- TrustGuard: GNN-based Robust and Explainable Trust Evaluation with Dynamicity Support [59.41529066449414]
We propose TrustGuard, a GNN-based accurate trust evaluation model that supports trust dynamicity.
TrustGuard is designed with a layered architecture that contains a snapshot input layer, a spatial aggregation layer, a temporal aggregation layer, and a prediction layer.
Experiments show that TrustGuard outperforms state-of-the-art GNN-based trust evaluation models with respect to trust prediction in both single-timeslot and multi-timeslot settings.
arXiv Detail & Related papers (2023-06-23T07:39:12Z)
- Explaining Model Confidence Using Counterfactuals [4.385390451313721]
Displaying confidence scores in human-AI interaction has been shown to help build trust between humans and AI systems.
Most existing research uses only the confidence score as a form of communication.
We show that counterfactual explanations of confidence scores help study participants to better understand and better trust a machine learning model's prediction.
arXiv Detail & Related papers (2023-03-10T06:22:13Z)
- UKP-SQuARE v2: Explainability and Adversarial Attacks for Trustworthy QA [47.8796570442486]
Question Answering systems are increasingly deployed in applications where they support real-world decisions.
Inherently interpretable models or post hoc explainability methods can help users to comprehend how a model arrives at its prediction.
We introduce SQuARE v2, the new version of SQuARE, to provide an explainability infrastructure for comparing models.
arXiv Detail & Related papers (2022-08-19T13:01:01Z)
- Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates if the model's prediction on the counterfactual is consistent with that expressed logic.
arXiv Detail & Related papers (2022-05-25T03:40:59Z)
- Trust in Human-AI Interaction: Scoping Out Models, Measures, and Methods [12.641141743223377]
Trust has emerged as a key factor in people's interactions with AI-infused systems.
Little is known about what models of trust have been used and for what systems.
There is as yet no standard approach to measuring trust in AI.
arXiv Detail & Related papers (2022-04-30T07:34:19Z)
- Trust in AI: Interpretability is not necessary or sufficient, while black-box interaction is necessary and sufficient [0.0]
The problem of human trust in artificial intelligence is one of the most fundamental problems in applied machine learning.
We draw from statistical learning theory and sociological lenses on human-automation trust to motivate an AI-as-tool framework.
We clarify the role of interpretability in trust with a ladder of model access.
arXiv Detail & Related papers (2022-02-10T19:59:23Z)
- Formalizing Trust in Artificial Intelligence: Prerequisites, Causes and Goals of Human Trust in AI [55.4046755826066]
We discuss a model of trust inspired by, but not identical to, sociology's interpersonal trust (i.e., trust between people).
We incorporate a formalization of 'contractual trust', such that trust between a user and an AI is trust that some implicit or explicit contract will hold.
We discuss how to design trustworthy AI, how to evaluate whether trust has manifested, and whether it is warranted.
arXiv Detail & Related papers (2020-10-15T03:07:23Z)
- Binary Classification from Positive Data with Skewed Confidence [85.18941440826309]
Positive-confidence (Pconf) classification is a promising weakly-supervised learning method.
In practice, the confidence may be skewed by bias arising in the annotation process.
We introduce a parameterized model of the skewed confidence and propose a method for selecting its hyperparameter (a toy sketch of the Pconf setup follows this entry).
arXiv Detail & Related papers (2020-01-29T00:04:36Z)
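For the last entry above, here is a hedged toy sketch of positive-confidence learning: only positive examples and their confidences r(x) = p(y = +1 | x) are observed, and a linear classifier is trained by minimising the usual Pconf empirical risk (up to a constant factor). The synthetic data, the power-transform skew correction, and its hyperparameter value are illustrative placeholders, not the paper's parameterized skew model or its selection method.

```python
# Toy sketch of positive-confidence (Pconf) learning with a simple skew
# correction. Only positive examples and confidences r_i ~ p(y=+1|x_i) are
# observed. The r**gamma correction stands in for the paper's parameterized
# skew model and is an assumption, not their method.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, d = 200, 2
x_pos = torch.randn(n, d) + 1.0                          # positive samples only (synthetic)
r = torch.clamp(torch.rand(n) * 0.5 + 0.5, 1e-3, 1.0)    # stand-in (possibly skewed) confidences
gamma = 1.2                                              # assumed skew-correction hyperparameter
r_corrected = torch.clamp(r ** gamma, 1e-3, 1.0)

w = torch.zeros(d, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([w, b], lr=0.05)

def logistic_loss(z):
    return F.softplus(-z)                                # log(1 + exp(-z))

for _ in range(500):
    g = x_pos @ w + b
    # Pconf empirical risk: l(g(x)) + (1 - r)/r * l(-g(x)), averaged over positives.
    risk = (logistic_loss(g) + (1 - r_corrected) / r_corrected * logistic_loss(-g)).mean()
    opt.zero_grad()
    risk.backward()
    opt.step()

print("learned weights:", w.detach().numpy(), "bias: %.3f" % b.item())
```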
This list is automatically generated from the titles and abstracts of the papers on this site.