Deceptive AI Explanations: Creation and Detection
- URL: http://arxiv.org/abs/2001.07641v3
- Date: Thu, 2 Dec 2021 13:11:55 GMT
- Title: Deceptive AI Explanations: Creation and Detection
- Authors: Johannes Schneider, Christian Meske and Michalis Vlachos
- Abstract summary: We investigate how AI models can be used to create and detect deceptive explanations.
As an empirical evaluation, we focus on text classification and alter the explanations generated by GradCAM.
We evaluate the effect of deceptive explanations on users in an experiment with 200 participants.
- Score: 3.197020142231916
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artificial intelligence (AI) comes with great opportunities but can also pose
significant risks. Automatically generated explanations for decisions can
increase transparency and foster trust, especially for systems based on
automated predictions by AI models. However, given, e.g., economic incentives
to create dishonest AI, to what extent can we trust explanations? To address
this issue, our work investigates how AI models (i.e., deep learning, and
existing instruments to increase transparency regarding AI decisions) can be
used to create and detect deceptive explanations. As an empirical evaluation,
we focus on text classification and alter the explanations generated by
GradCAM, a well-established explanation technique in neural networks. Then, we
evaluate the effect of deceptive explanations on users in an experiment with
200 participants. Our findings confirm that deceptive explanations can indeed
fool humans. However, one can deploy machine learning (ML) methods to detect
seemingly minor deception attempts with accuracy exceeding 80% given sufficient
domain knowledge. Without domain knowledge, one can still infer inconsistencies
in the explanations in an unsupervised manner, given basic knowledge of the
predictive model under scrutiny.
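The abstract only outlines the pipeline at a high level (perturb Grad-CAM-style token attributions, then train a classifier to flag the manipulation), so the following is a minimal sketch of that idea on synthetic attribution vectors using scikit-learn. It is not the authors' code: the perturbation scheme, the hand-crafted features, and all parameter choices are illustrative assumptions.

```python
# Minimal sketch (not the paper's implementation): (1) create a "deceptive" explanation
# by shifting attribution mass away from the truly salient tokens, and (2) train a
# simple supervised detector on summary features of the attribution vector.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
N_TOKENS = 50   # tokens per synthetic document
N_DOCS = 2000

def honest_attribution(n_tokens):
    """Saliency vector concentrated on a few predictive tokens (stand-in for Grad-CAM)."""
    a = rng.exponential(0.1, n_tokens)
    hot = rng.choice(n_tokens, size=3, replace=False)
    a[hot] += rng.uniform(1.0, 2.0, size=3)
    return a / a.sum()

def deceive(a, strength=0.8):
    """Move a fraction of the attribution mass from the top tokens onto innocuous ones."""
    a = a.copy()
    top = np.argsort(a)[-3:]
    moved = a[top] * strength
    a[top] -= moved
    targets = rng.choice(np.setdiff1d(np.arange(a.size), top), size=3, replace=False)
    a[targets] += moved
    return a / a.sum()

def features(a):
    """Hand-crafted features standing in for 'domain knowledge' about plausible explanations."""
    s = np.sort(a)[::-1]
    entropy = -(a * np.log(a + 1e-12)).sum()
    return [s[:3].sum(), s[0] / (s[1] + 1e-12), entropy]

honest = [honest_attribution(N_TOKENS) for _ in range(N_DOCS)]
X = np.array([features(a) for a in honest] + [features(deceive(a)) for a in honest])
y = np.array([0] * N_DOCS + [1] * N_DOCS)   # 1 = deceptive explanation

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("toy detector accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```

The hand-crafted features here merely stand in for the domain knowledge that, per the abstract, lets a detector exceed 80% accuracy; the unsupervised variant mentioned there would instead flag attributions that are inconsistent with the behaviour of the predictive model under scrutiny.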
Related papers
- Towards Human Cognition Level-based Experiment Design for Counterfactual Explanations (XAI) [68.8204255655161]
The emphasis of XAI research appears to have turned to a more pragmatic explanation approach for better understanding.
An extensive area where cognitive science research may substantially influence XAI advancements is evaluating user knowledge and feedback.
We propose a framework to experiment with generating and evaluating the explanations on the grounds of different cognitive levels of understanding.
arXiv Detail & Related papers (2022-10-31T19:20:22Z)
- Evaluating Human-like Explanations for Robot Actions in Reinforcement Learning Scenarios [1.671353192305391]
We make use of human-like explanations built from the probability of successfully completing the goal that an autonomous robot exhibits after performing an action.
These explanations are intended to be understood by people who have no or very little experience with artificial intelligence methods.
arXiv Detail & Related papers (2022-07-07T10:40:24Z)
- Cybertrust: From Explainable to Actionable and Interpretable AI (AI2) [58.981120701284816]
Actionable and Interpretable AI (AI2) will incorporate explicit quantifications and visualizations of user confidence in AI recommendations.
It will allow examining and testing of AI system predictions to establish a basis for trust in the systems' decision making.
arXiv Detail & Related papers (2022-01-26T18:53:09Z)
- The Who in XAI: How AI Background Shapes Perceptions of AI Explanations [61.49776160925216]
We conduct a mixed-methods study of how two different groups--people with and without AI background--perceive different types of AI explanations.
We find that (1) both groups showed unwarranted faith in numbers for different reasons and (2) each group found value in different explanations beyond their intended design.
arXiv Detail & Related papers (2021-07-28T17:32:04Z)
- A Turing Test for Transparency [0.0]
A central goal of explainable artificial intelligence (XAI) is to improve the trust relationship in human-AI interaction.
Recent empirical evidence shows that explanations can have the opposite effect.
This effect challenges the very goal of XAI and implies that responsible usage of transparent AI methods has to consider the ability of humans to distinguish machine generated from human explanations.
arXiv Detail & Related papers (2021-06-21T20:09:40Z)
- Teaching the Machine to Explain Itself using Domain Knowledge [4.462334751640166]
Non-technical humans-in-the-loop struggle to comprehend the rationale behind model predictions.
We present JOEL, a neural network-based framework to jointly learn a decision-making task and associated explanations.
We collect domain feedback from a pool of certified experts and use it to ameliorate the model (human teaching).
arXiv Detail & Related papers (2020-11-27T18:46:34Z)
- Explainability in Deep Reinforcement Learning [68.8204255655161]
We review recent works aimed at attaining Explainable Reinforcement Learning (XRL).
In critical situations where it is essential to justify and explain the agent's behaviour, better explainability and interpretability of RL models could help gain scientific insight on the inner workings of what is still considered a black box.
arXiv Detail & Related papers (2020-08-15T10:11:42Z)
- Does Explainable Artificial Intelligence Improve Human Decision-Making? [17.18994675838646]
We compare and evaluate objective human decision accuracy without AI (control), with an AI prediction (no explanation), and with an AI prediction accompanied by an explanation.
We find that any kind of AI prediction tends to improve user decision accuracy, but find no conclusive evidence that explainable AI has a meaningful impact.
Our results indicate that, at least in some situations, the "why" information provided in explainable AI may not enhance user decision-making.
arXiv Detail & Related papers (2020-06-19T15:46:13Z)
- A general framework for scientifically inspired explanations in AI [76.48625630211943]
We instantiate the concept of structure of scientific explanation as the theoretical underpinning for a general framework in which explanations for AI systems can be implemented.
This framework aims to provide the tools to build a "mental-model" of any AI system so that the interaction with the user can provide information on demand and be closer to the nature of human-made explanations.
arXiv Detail & Related papers (2020-03-02T10:32:21Z)
- Self-explaining AI as an alternative to interpretable AI [0.0]
Double descent indicates that deep neural networks operate by smoothly interpolating between data points.
Neural networks trained on complex real world data are inherently hard to interpret and prone to failure if asked to extrapolate.
Self-explaining AIs are capable of providing a human-understandable explanation along with confidence levels for both the decision and explanation.
arXiv Detail & Related papers (2020-02-12T18:50:11Z)
- Explainable Active Learning (XAL): An Empirical Study of How Local Explanations Impact Annotator Experience [76.9910678786031]
We propose a novel paradigm of explainable active learning (XAL), by introducing techniques from the recently surging field of explainable AI (XAI) into an Active Learning setting.
Our study shows benefits of AI explanations as interfaces for machine teaching, namely supporting trust calibration and enabling rich forms of teaching feedback, as well as potential drawbacks: an anchoring effect on the model's judgment and added cognitive workload.
arXiv Detail & Related papers (2020-01-24T22:52:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.