What Do You See? Evaluation of Explainable Artificial Intelligence (XAI)
Interpretability through Neural Backdoors
- URL: http://arxiv.org/abs/2009.10639v1
- Date: Tue, 22 Sep 2020 15:53:19 GMT
- Title: What Do You See? Evaluation of Explainable Artificial Intelligence (XAI)
Interpretability through Neural Backdoors
- Authors: Yi-Shan Lin, Wen-Chuan Lee, Z. Berkay Celik
- Abstract summary: EXplainable AI (XAI) methods have been proposed to interpret how a deep neural network predicts inputs.
Current evaluation approaches either require subjective input from humans or incur high computation cost with automated evaluation.
We propose backdoor trigger patterns--hidden malicious functionalities that cause misclassification--to automate the evaluation of saliency explanations.
- Score: 15.211935029680879
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: EXplainable AI (XAI) methods have been proposed to interpret how a deep
neural network predicts inputs through model saliency explanations that
highlight the parts of the inputs deemed important to arrive at a decision for a
specific target. However, it remains challenging to quantify the correctness of
their interpretability, as current evaluation approaches either require
subjective input from humans or incur high computation cost with automated
evaluation. In this paper, we propose backdoor trigger patterns--hidden
malicious functionalities that cause misclassification--to automate the
evaluation of saliency explanations. Our key observation is that triggers
provide ground truth for inputs to evaluate whether the regions identified by
an XAI method are truly relevant to its output. Since backdoor triggers are the
most important features that cause deliberate misclassification, a robust XAI
method should reveal their presence at inference time. We introduce three
complementary metrics for systematic evaluation of explanations that an XAI
method generates and evaluate seven state-of-the-art model-free and
model-specific posthoc methods through 36 models trojaned with specifically
crafted triggers using color, shape, texture, location, and size. We discovered
that six methods that use local explanation and feature relevance fail to
completely highlight the trigger regions, and that only a model-free approach
can uncover the entire trigger region.
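The trigger-as-ground-truth idea lends itself to a simple overlap check. The sketch below is a minimal NumPy illustration, not the paper's three metrics: the function name, the top-k thresholding, and the recall/IoU scores are stand-ins for how a saliency map can be compared against a known trigger mask.

```python
# Illustrative sketch: score a saliency explanation against the known
# backdoor-trigger mask that serves as ground truth (not the paper's metrics).
import numpy as np

def trigger_overlap_scores(saliency, trigger_mask, top_k=0.05):
    """saliency     : (H, W) attribution magnitudes from an XAI method.
       trigger_mask : (H, W) binary array, 1 where the trigger was stamped.
       top_k        : fraction of pixels treated as the highlighted region."""
    flat = np.abs(saliency).ravel()
    n_top = max(1, int(top_k * flat.size))
    threshold = np.partition(flat, -n_top)[-n_top]   # k-th largest attribution
    highlighted = np.abs(saliency) >= threshold      # top-k most salient pixels
    trigger = trigger_mask.astype(bool)

    recall = (highlighted & trigger).sum() / trigger.sum()  # trigger coverage
    iou = (highlighted & trigger).sum() / (highlighted | trigger).sum()
    return {"trigger_recall": float(recall), "iou": float(iou)}

# Hypothetical usage: a random attribution map and a 16x16 trigger patch.
saliency = np.random.rand(224, 224)
mask = np.zeros((224, 224))
mask[200:216, 200:216] = 1
print(trigger_overlap_scores(saliency, mask))
```

On a trojaned input stamped with the trigger, a robust XAI method should drive trigger_recall toward 1; a score near the random baseline indicates the explanation misses the feature that actually caused the misclassification.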
Related papers
- Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance.
Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z)
- Explainable AI needs formal notions of explanation correctness [2.1309989863595677]
Machine learning in critical domains such as medicine poses risks and requires regulation.
One requirement is that decisions of ML systems in high-risk applications should be human-understandable.
In its current form, XAI is unfit to provide quality control for ML; it itself needs scrutiny.
arXiv Detail & Related papers (2024-09-22T20:47:04Z)
- T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models [70.03122709795122]
We propose a comprehensive defense method named T2IShield to detect, localize, and mitigate backdoor attacks.
We find the "Assimilation Phenomenon" on the cross-attention maps caused by the backdoor trigger.
For backdoor sample detection, T2IShield achieves a detection F1 score of 88.9% with low computational cost.
arXiv Detail & Related papers (2024-07-05T01:53:21Z)
- Model X-ray: Detecting Backdoored Models via Decision Boundary [62.675297418960355]
Backdoor attacks pose a significant security vulnerability for deep neural networks (DNNs).
We propose Model X-ray, a novel backdoor detection approach based on the analysis of illustrated two-dimensional (2D) decision boundaries.
Our approach includes two strategies focused on the decision areas dominated by clean samples and the concentration of label distribution.
arXiv Detail & Related papers (2024-02-27T12:42:07Z)
- FunnyBirds: A Synthetic Vision Dataset for a Part-Based Analysis of Explainable AI Methods [15.073405675079558]
XAI inherently lacks ground-truth explanations, making its automatic evaluation an unsolved problem.
We propose a novel synthetic vision dataset, named FunnyBirds, and accompanying automatic evaluation protocols.
Using our tools, we report results for 24 different combinations of neural models and XAI methods.
arXiv Detail & Related papers (2023-08-11T17:29:02Z)
- Using Kernel SHAP XAI Method to optimize the Network Anomaly Detection Model [0.0]
Anomaly detection and its explanation are important in many research areas, such as intrusion detection, fraud detection, and unknown attack detection in network traffic and logs.
It is challenging to identify the cause or explanation of why a particular instance is an anomaly.
XAI provides tools and techniques to interpret and explain the output and workings of complex models, such as deep learning (DL) models.
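As a concrete illustration of the Kernel SHAP workflow summarized above, here is a minimal sketch; the random-forest detector, the synthetic traffic features, and all variable names are placeholders rather than that paper's actual setup.

```python
# Hypothetical sketch of explaining an anomaly detector with Kernel SHAP.
# The detector and data below are toy placeholders, not the paper's model.
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

np.random.seed(0)
X_train = np.random.rand(200, 6)              # stand-in traffic features
y_train = (X_train[:, 1] > 0.8).astype(int)   # toy labelling rule: class 1 = anomaly
model = RandomForestClassifier(n_estimators=50).fit(X_train, y_train)

# Kernel SHAP is model-agnostic: it only needs a prediction function and a
# background sample that defines what "feature absence" means.
background = shap.sample(X_train, 50)
explainer = shap.KernelExplainer(model.predict_proba, background)

# Explain why one flagged instance was scored as anomalous.
x_anomalous = X_train[y_train == 1][:1]
shap_values = explainer.shap_values(x_anomalous)
print(shap_values)  # per-feature contributions toward each class probability
```

Because Kernel SHAP only wraps a prediction function, the same pattern applies to a deep learning detector by swapping in its probability output.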
arXiv Detail & Related papers (2023-07-31T18:47:45Z)
- Neural Causal Models for Counterfactual Identification and Estimation [62.30444687707919]
We study the evaluation of counterfactual statements through neural models.
First, we show that neural causal models (NCMs) are expressive enough to perform counterfactual reasoning.
Second, we develop an algorithm for simultaneously identifying and estimating counterfactual distributions.
arXiv Detail & Related papers (2022-09-30T18:29:09Z)
- Feature Visualization within an Automated Design Assessment leveraging Explainable Artificial Intelligence Methods [0.0]
Automated capability assessment, mainly leveraged by deep learning systems driven by 3D CAD data, has been presented.
Current assessment systems may be able to assess CAD data with regard to abstract features, but without any geometrical indicator of the reasons for the system's decision.
Within the NeuroCAD Project, xAI methods are used to identify geometrical features which are associated with a certain abstract feature.
arXiv Detail & Related papers (2022-01-28T13:31:42Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- Towards Better Model Understanding with Path-Sufficient Explanations [11.517059323883444]
The Path-Sufficient Explanations Method (PSEM) provides a sequence of sufficient explanations of strictly decreasing size for a given input.
PSEM can be thought of as tracing the local boundary of the model in a smooth manner, thus providing better intuition about the local model behavior for the specific input.
A user study depicts the strength of the method in communicating the local behavior, where (many) users are able to correctly determine the prediction made by a model.
arXiv Detail & Related papers (2021-09-13T16:06:10Z)
- Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.