On the Interaction of Belief Bias and Explanations
- URL: http://arxiv.org/abs/2106.15355v1
- Date: Tue, 29 Jun 2021 12:49:42 GMT
- Title: On the Interaction of Belief Bias and Explanations
- Authors: Ana Valeria Gonzalez, Anna Rogers, Anders Søgaard
- Abstract summary: We provide an overview of belief bias, its role in human evaluation, and ideas for NLP practitioners on how to account for it.
We show that conclusions about the highest performing methods change when introducing such controls, pointing to the importance of accounting for belief bias in evaluation.
- Score: 4.211128681972148
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A myriad of explainability methods have been proposed in recent years, but
there is little consensus on how to evaluate them. While automatic metrics
allow for quick benchmarking, it isn't clear how such metrics reflect human
interaction with explanations. Human evaluation is of paramount importance, but
previous protocols fail to account for belief biases affecting human
performance, which may lead to misleading conclusions. We provide an overview
of belief bias, its role in human evaluation, and ideas for NLP practitioners
on how to account for it. For two experimental paradigms, we present a case
study of gradient-based explainability introducing simple ways to account for
humans' prior beliefs: models of varying quality and adversarial examples. We
show that conclusions about the highest performing methods change when
introducing such controls, pointing to the importance of accounting for belief
bias in evaluation.
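As background for the gradient-based explanations the case study evaluates, the sketch below shows one common way to compute gradient-x-input saliency scores for a toy text classifier. It is a minimal illustration only: the model, the token ids, and the reduction over the embedding dimension are placeholder assumptions, not the authors' experimental configuration.
```python
# Minimal sketch of gradient-x-input saliency for a toy text classifier (PyTorch).
# Illustrative only; the paper's models, data, and explainability variants differ.
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    def __init__(self, vocab_size=1000, dim=32, num_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.out = nn.Linear(dim, num_classes)

    def forward(self, embedded):
        # operate on embeddings directly so their gradient is available
        return self.out(embedded.mean(dim=1))

def saliency(model, token_ids, target_class):
    """One importance score per token: grad * input, summed over the embedding dim."""
    embedded = model.emb(token_ids).detach().requires_grad_(True)
    logits = model(embedded)
    logits[0, target_class].backward()
    return (embedded.grad * embedded).sum(dim=-1).squeeze(0)

model = TinyClassifier()
tokens = torch.tensor([[5, 42, 7, 300]])  # hypothetical token ids for one sentence
print(saliency(model, tokens, target_class=1))
```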
Related papers
- Decoding Susceptibility: Modeling Misbelief to Misinformation Through a Computational Approach [61.04606493712002]
Susceptibility to misinformation describes the degree of belief in unverifiable claims; it is latent and not directly observable.
Existing susceptibility studies heavily rely on self-reported beliefs.
We propose a computational approach to model users' latent susceptibility levels.
arXiv Detail & Related papers (2023-11-16T07:22:56Z) - On the Definition of Appropriate Trust and the Tools that Come with it [0.0]
This paper starts with the definitions of appropriate trust from the literature.
It compares the definitions with model performance evaluation, showing the strong similarities between appropriate trust and model performance evaluation.
The paper offers several straightforward evaluation methods for different aspects of user performance, including suggesting a method for measuring uncertainty and appropriate trust in regression.
arXiv Detail & Related papers (2023-09-21T09:52:06Z) - Provable Benefits of Policy Learning from Human Preferences in
Contextual Bandit Problems [82.92678837778358]
Preference-based methods have demonstrated substantial success in empirical applications such as InstructGPT.
We show how human bias and uncertainty in feedback modeling can affect the theoretical guarantees of these approaches.
arXiv Detail & Related papers (2023-07-24T17:50:24Z) - Using Natural Language Explanations to Rescale Human Judgments [81.66697572357477]
We propose a method to rescale ordinal annotations and explanations using large language models (LLMs).
We feed annotators' Likert ratings and corresponding explanations into an LLM and prompt it to produce a numeric score anchored in a scoring rubric.
Our method rescales the raw judgments without impacting agreement and brings the scores closer to human judgments grounded in the same scoring rubric.
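To make the rescaling idea concrete, here is a minimal sketch of feeding a Likert rating and its explanation to an LLM and asking for a rubric-anchored score. The rubric text, the prompt wording, and the `ask_llm` callable are assumptions, not the paper's exact setup.
```python
# Sketch of rescaling a Likert judgment with an LLM and a scoring rubric.
# The rubric, prompt wording, and `ask_llm` callable are placeholders.
RUBRIC = (
    "Score the answer from 0-100.\n"
    "90-100: fully correct and complete\n"
    "60-89: mostly correct, minor omissions\n"
    "30-59: partially correct\n"
    "0-29: largely incorrect"
)

def build_prompt(likert_rating: int, explanation: str) -> str:
    return (
        f"{RUBRIC}\n\n"
        f"An annotator rated this answer {likert_rating}/5 and explained: \"{explanation}\"\n"
        "Following the rubric and the explanation, reply with a single integer from 0 to 100."
    )

def rescale(likert_rating: int, explanation: str, ask_llm) -> int:
    """`ask_llm` is any callable mapping a prompt string to the model's text reply."""
    reply = ask_llm(build_prompt(likert_rating, explanation))
    digits = "".join(ch for ch in reply if ch.isdigit())
    return int(digits) if digits else 0  # crude parse; real pipelines validate the output
```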
arXiv Detail & Related papers (2023-05-24T06:19:14Z) - Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z) - Counterfactually Evaluating Explanations in Recommender Systems [14.938252589829673]
We propose an offline evaluation method that can be computed without human involvement.
We show that, compared to conventional methods, our method can produce evaluation scores more correlated with the real human judgments.
arXiv Detail & Related papers (2022-03-02T18:55:29Z) - Empirical Estimates on Hand Manipulation are Recoverable: A Step Towards
Individualized and Explainable Robotic Support in Everyday Activities [80.37857025201036]
A key challenge for robotic systems is to figure out the behavior of another agent.
Making correct inferences is especially challenging when (confounding) factors are not controlled experimentally.
We propose equipping robots with the necessary tools to conduct observational studies on people.
arXiv Detail & Related papers (2022-01-27T22:15:56Z) - What I Cannot Predict, I Do Not Understand: A Human-Centered Evaluation
Framework for Explainability Methods [6.232071870655069]
We show that theoretical measures used to score explainability methods poorly reflect the practical usefulness of individual attribution methods in real-world scenarios.
Our results suggest a critical need to develop better explainability methods and to deploy human-centered evaluation approaches.
arXiv Detail & Related papers (2021-12-06T18:36:09Z) - On Quantitative Evaluations of Counterfactuals [88.42660013773647]
This paper consolidates work on evaluating visual counterfactual examples through an analysis and experiments.
We find that while most metrics behave as intended for sufficiently simple datasets, some fail to tell the difference between good and bad counterfactuals when the complexity increases.
We propose two new metrics, the Label Variation Score and the Oracle score, which are both less vulnerable to these issues.
arXiv Detail & Related papers (2021-10-30T05:00:36Z) - Counterfactual Evaluation for Explainable AI [21.055319253405603]
We propose a new methodology to evaluate the faithfulness of explanations from the counterfactual reasoning perspective.
We introduce two algorithms to find the proper counterfactuals in both discrete and continuous scenarios and then use the acquired counterfactuals to measure faithfulness.
arXiv Detail & Related papers (2021-09-05T01:38:49Z)
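For the counterfactual-evaluation entries above, the following is a generic sketch of finding a continuous counterfactual by gradient descent on the input; neither the paper's two algorithms nor its exact faithfulness measure is reproduced, and the distance penalty weight is an arbitrary choice.
```python
# Generic sketch: search for a continuous counterfactual by gradient descent on the input.
# Not the paper's algorithms; the distance penalty and step counts are assumptions.
import torch
import torch.nn.functional as F

def continuous_counterfactual(model, x, target_class, steps=200, lr=0.05, dist_weight=0.1):
    """Return an input x' close to x that the model assigns to `target_class`."""
    x_cf = x.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x_cf], lr=lr)
    target = torch.tensor([target_class])
    for _ in range(steps):
        logits = model(x_cf)
        if logits.argmax(dim=-1).item() == target_class:
            break  # prediction flipped; x_cf is a valid counterfactual
        loss = F.cross_entropy(logits, target) + dist_weight * (x_cf - x).norm()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x_cf.detach()
```
One way such a counterfactual can feed an evaluation is to compare how much each feature changed between x and x' against the importance the explanation assigns to that feature.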