What Do You See? Evaluation of Explainable Artificial Intelligence (XAI)
Interpretability through Neural Backdoors
- URL: http://arxiv.org/abs/2009.10639v1
- Date: Tue, 22 Sep 2020 15:53:19 GMT
- Title: What Do You See? Evaluation of Explainable Artificial Intelligence (XAI)
Interpretability through Neural Backdoors
- Authors: Yi-Shan Lin, Wen-Chuan Lee, Z. Berkay Celik
- Abstract summary: EXplainable AI (XAI) methods have been proposed to interpret how a deep neural network predicts inputs.
Current evaluation approaches either require subjective input from humans or incur high computation cost with automated evaluation.
We propose backdoor trigger patterns--hidden malicious functionalities that cause misclassification--to automate the evaluation of saliency explanations.
- Score: 15.211935029680879
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: EXplainable AI (XAI) methods have been proposed to interpret how a deep
neural network predicts inputs through model saliency explanations that
highlight the parts of the inputs deemed important to arrive at a decision for a
specific target. However, it remains challenging to quantify the correctness of
their interpretability, as current evaluation approaches either require
subjective input from humans or incur high computation cost with automated
evaluation. In this paper, we propose backdoor trigger patterns--hidden
malicious functionalities that cause misclassification--to automate the
evaluation of saliency explanations. Our key observation is that triggers
provide ground truth for inputs to evaluate whether the regions identified by
an XAI method are truly relevant to its output. Since backdoor triggers are the
most important features that cause deliberate misclassification, a robust XAI
method should reveal their presence at inference time. We introduce three
complementary metrics for systematic evaluation of explanations that an XAI
method generates and evaluate seven state-of-the-art model-free and
model-specific posthoc methods through 36 models trojaned with specifically
crafted triggers using color, shape, texture, location, and size. We discovered
that six methods that use local explanation and feature relevance fail to
completely highlight the trigger regions, and that only a model-free approach
can uncover the entire trigger region.
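The trigger-as-ground-truth idea lends itself to a simple overlap check. The sketch below is a minimal NumPy illustration, not the paper's three metrics: the function name, the top-k thresholding, and the recall/IoU scores are stand-ins for how a saliency map can be compared against a known trigger mask.

```python
# Illustrative sketch: score a saliency explanation against the known
# backdoor-trigger mask that serves as ground truth (not the paper's metrics).
import numpy as np

def trigger_overlap_scores(saliency, trigger_mask, top_k=0.05):
    """saliency     : (H, W) attribution magnitudes from an XAI method.
       trigger_mask : (H, W) binary array, 1 where the trigger was stamped.
       top_k        : fraction of pixels treated as the highlighted region."""
    flat = np.abs(saliency).ravel()
    n_top = max(1, int(top_k * flat.size))
    threshold = np.partition(flat, -n_top)[-n_top]   # k-th largest attribution
    highlighted = np.abs(saliency) >= threshold      # top-k most salient pixels
    trigger = trigger_mask.astype(bool)

    recall = (highlighted & trigger).sum() / trigger.sum()  # trigger coverage
    iou = (highlighted & trigger).sum() / (highlighted | trigger).sum()
    return {"trigger_recall": float(recall), "iou": float(iou)}

# Hypothetical usage: a random attribution map and a 16x16 trigger patch.
saliency = np.random.rand(224, 224)
mask = np.zeros((224, 224))
mask[200:216, 200:216] = 1
print(trigger_overlap_scores(saliency, mask))
```

On a trojaned input stamped with the trigger, a robust XAI method should drive trigger_recall toward 1; a score near the random baseline indicates the explanation misses the feature that actually caused the misclassification.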
Related papers
- Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance.
Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z)
- Explainable AI needs formal notions of explanation correctness [2.1309989863595677]
Machine learning in critical domains such as medicine poses risks and requires regulation.
One requirement is that decisions of ML systems in high-risk applications should be human-understandable.
In its current form, XAI is unfit to provide quality control for ML; it itself needs scrutiny.
arXiv Detail & Related papers (2024-09-22T20:47:04Z)
- T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models [70.03122709795122]
We propose a comprehensive defense method named T2IShield to detect, localize, and mitigate backdoor attacks.
We find the "Assimilation Phenomenon" on the cross-attention maps caused by the backdoor trigger.
For backdoor sample detection, T2IShield achieves a detection F1 score of 88.9% with low computational cost.
arXiv Detail & Related papers (2024-07-05T01:53:21Z)
- Model X-ray: Detecting Backdoored Models via Decision Boundary [62.675297418960355]
Backdoor attacks pose a significant security vulnerability for deep neural networks (DNNs).
We propose Model X-ray, a novel backdoor detection approach based on the analysis of illustrated two-dimensional (2D) decision boundaries.
Our approach includes two strategies focused on the decision areas dominated by clean samples and the concentration of label distribution.
arXiv Detail & Related papers (2024-02-27T12:42:07Z)
- FunnyBirds: A Synthetic Vision Dataset for a Part-Based Analysis of Explainable AI Methods [15.073405675079558]
XAI inherently lacks ground-truth explanations, making its automatic evaluation an unsolved problem.
We propose a novel synthetic vision dataset, named FunnyBirds, and accompanying automatic evaluation protocols.
Using our tools, we report results for 24 different combinations of neural models and XAI methods.
arXiv Detail & Related papers (2023-08-11T17:29:02Z)
- Using Kernel SHAP XAI Method to optimize the Network Anomaly Detection Model [0.0]
Anomaly detection and its explanation are important in many research areas, such as intrusion detection, fraud detection, and unknown attack detection in network traffic and logs.
It is challenging to identify the cause or explanation of why a particular instance is an anomaly.
XAI provides tools and techniques to interpret and explain the output and workings of complex models, such as deep learning (DL) models.
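As a concrete illustration of the Kernel SHAP workflow summarized above, here is a minimal sketch; the random-forest detector, the synthetic traffic features, and all variable names are placeholders rather than that paper's actual setup.

```python
# Hypothetical sketch of explaining an anomaly detector with Kernel SHAP.
# The detector and data below are toy placeholders, not the paper's model.
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

np.random.seed(0)
X_train = np.random.rand(200, 6)              # stand-in traffic features
y_train = (X_train[:, 1] > 0.8).astype(int)   # toy labelling rule: class 1 = anomaly
model = RandomForestClassifier(n_estimators=50).fit(X_train, y_train)

# Kernel SHAP is model-agnostic: it only needs a prediction function and a
# background sample that defines what "feature absence" means.
background = shap.sample(X_train, 50)
explainer = shap.KernelExplainer(model.predict_proba, background)

# Explain why one flagged instance was scored as anomalous.
x_anomalous = X_train[y_train == 1][:1]
shap_values = explainer.shap_values(x_anomalous)
print(shap_values)  # per-feature contributions toward each class probability
```

Because Kernel SHAP only wraps a prediction function, the same pattern applies to a deep learning detector by swapping in its probability output.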
arXiv Detail & Related papers (2023-07-31T18:47:45Z)
- Neural Causal Models for Counterfactual Identification and Estimation [62.30444687707919]
We study the evaluation of counterfactual statements through neural models.
First, we show that neural causal models (NCMs) are expressive enough to perform counterfactual reasoning.
Second, we develop an algorithm for simultaneously identifying and estimating counterfactual distributions.
arXiv Detail & Related papers (2022-09-30T18:29:09Z)
- Feature Visualization within an Automated Design Assessment leveraging Explainable Artificial Intelligence Methods [0.0]
Automated capability assessment, mainly leveraged by deep learning systems driven by 3D CAD data, has been presented.
Current assessment systems may be able to assess CAD data with regard to abstract features, but without any geometrical indicator of the reasons for the system's decision.
Within the NeuroCAD Project, xAI methods are used to identify geometrical features which are associated with a certain abstract feature.
arXiv Detail & Related papers (2022-01-28T13:31:42Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- Towards Better Model Understanding with Path-Sufficient Explanations [11.517059323883444]
The Path-Sufficient Explanations Method (PSEM) provides a sequence of sufficient explanations of strictly decreasing size for a given input.
PSEM can be thought of as tracing the local boundary of the model in a smooth manner, thus providing better intuition about the local model behavior for the specific input.
A user study depicts the strength of the method in communicating the local behavior, where (many) users are able to correctly determine the prediction made by a model.
arXiv Detail & Related papers (2021-09-13T16:06:10Z)
- Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.