The Effect of Similarity Measures on Accurate Stability Estimates for Local Surrogate Models in Text-based Explainable AI
- URL: http://arxiv.org/abs/2406.15839v2
- Date: Fri, 17 Jan 2025 16:49:25 GMT
- Title: The Effect of Similarity Measures on Accurate Stability Estimates for Local Surrogate Models in Text-based Explainable AI
- Authors: Christopher Burger, Charles Walter, Thai Le
- Abstract summary: A poor choice of similarity measure can lead to erroneous conclusions on the efficacy of an XAI method.
We investigate a variety of similarity measures designed for text-based ranked lists, including Kendall's Tau, Spearman's Footrule, and Rank-biased Overlap.
- Abstract: Recent work has investigated the vulnerability of local surrogate methods to adversarial perturbations of a machine learning (ML) model's inputs, where the explanation is manipulated while the meaning and structure of the original input remain similar under the complex model. Although weaknesses have been shown to exist across many methods, the reasons behind them remain largely unexplored. Central to the concept of adversarial attacks on explainable AI (XAI) is the similarity measure used to calculate how one explanation differs from another. A poor choice of similarity measure can lead to erroneous conclusions about the efficacy of an XAI method: too sensitive a measure exaggerates vulnerability, while too coarse a measure understates it. We investigate a variety of similarity measures designed for text-based ranked lists, including Kendall's Tau, Spearman's Footrule, and Rank-biased Overlap, to determine how substantial changes in the type of measure or the threshold of success affect the conclusions drawn from common adversarial attack processes. Certain measures are found to be overly sensitive, resulting in erroneous estimates of stability.
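As an illustration of the three measures named in the abstract, below is a minimal pure-Python sketch comparing two ranked token lists. The token lists and the RBO persistence parameter `p` are invented for the example; real implementations must additionally handle ties and lists that do not share the same items (the Kendall's Tau and Footrule sketches assume both lists are permutations of one another), and the RBO shown is the truncated prefix sum rather than the full extrapolated form.

```python
from itertools import combinations

def kendall_tau(a, b):
    """Kendall's Tau between two rankings of the same items."""
    pos_b = {item: i for i, item in enumerate(b)}
    concordant = discordant = 0
    for (i, x), (j, y) in combinations(enumerate(a), 2):
        # A pair is concordant if the two items appear in the same
        # relative order in both rankings, discordant otherwise.
        if (i - j) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

def spearman_footrule(a, b):
    """Spearman's Footrule: total displacement of items between rankings."""
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(abs(i - pos_b[x]) for i, x in enumerate(a))

def rank_biased_overlap(a, b, p=0.9):
    """Truncated rank-biased overlap (RBO) with persistence p.

    Top-weighted: agreement at shallow depths counts more, with
    depth d weighted by p**(d-1).
    """
    depth = min(len(a), len(b))
    score = 0.0
    for d in range(1, depth + 1):
        overlap = len(set(a[:d]) & set(b[:d]))
        score += (p ** (d - 1)) * overlap / d
    return (1 - p) * score

# Two hypothetical top-5 token rankings from surrogate explanations.
exp1 = ["terrible", "awful", "boring", "plot", "film"]
exp2 = ["awful", "terrible", "boring", "film", "plot"]
print(kendall_tau(exp1, exp2))                     # 0.6
print(spearman_footrule(exp1, exp2))               # 4
print(round(rank_biased_overlap(exp1, exp2), 4))   # 0.2913
```

Note how the unweighted measures report high agreement for these two lists, while the top-weighted truncated RBO is far lower because the lists disagree at depth 1; this is exactly the kind of sensitivity difference the paper examines.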
Related papers
- Improving Stability Estimates in Adversarial Explainable AI through Alternate Search Methods [0.0]
Local surrogate methods have been used to approximate the workings of complex machine learning models.
Recent work has revealed their vulnerability to adversarial attacks where the explanation produced is appreciably different.
Here we explore using an alternate search method with the goal of finding minimum viable perturbations.
arXiv Detail & Related papers (2025-01-15T18:45:05Z) - Towards Robust and Accurate Stability Estimation of Local Surrogate Models in Text-based Explainable AI [9.31572645030282]
In adversarial attacks on explainable AI (XAI) in the NLP domain, the generated explanation is manipulated.
Central to this XAI manipulation is the similarity measure used to calculate how one explanation differs from another.
This work investigates a variety of similarity measures designed for text-based ranked lists to determine their comparative suitability for use.
arXiv Detail & Related papers (2025-01-03T17:44:57Z) - Improving Robustness Estimates in Natural Language Explainable AI though Synonymity Weighted Similarity Measures [0.0]
Adversarial examples have been prominent in the literature on the effectiveness of XAI.
For explanations in natural language, it is natural to adopt measures from the domain of information retrieval designed for ranked lists.
We show that the standard implementations of these measures are poorly suited to comparing explanations in adversarial XAI.
arXiv Detail & Related papers (2025-01-02T19:49:04Z) - Uncertainty in Additive Feature Attribution methods [34.80932512496311]
We focus on the class of additive feature attribution explanation methods.
We study the relationship between a feature's attribution and its uncertainty and observe little correlation.
We coin the term "stable instances" for such instances and diagnose factors that make an instance stable.
arXiv Detail & Related papers (2023-11-29T08:40:46Z) - Causal Fair Metric: Bridging Causality, Individual Fairness, and Adversarial Robustness [7.246701762489971]
Adversarial perturbation, used to identify vulnerabilities in models, and individual fairness, aiming for equitable treatment of similar individuals, both depend on metrics to generate comparable input data instances.
Previous attempts to define such joint metrics often lacked general assumptions about data or structural causal models and failed to reflect counterfactual proximity.
This paper introduces a causal fair metric formulated based on causal structures encompassing sensitive attributes and protected causal perturbation.
arXiv Detail & Related papers (2023-10-30T09:53:42Z) - An Experimental Investigation into the Evaluation of Explainability Methods [60.54170260771932]
This work compares 14 different metrics when applied to nine state-of-the-art XAI methods and three dummy methods (e.g., random saliency maps) used as references.
Experimental results show which of these metrics produces highly correlated results, indicating potential redundancy.
arXiv Detail & Related papers (2023-05-25T08:07:07Z) - In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
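Label smoothing itself is a simple transformation of the one-hot training targets; the sketch below illustrates the idea (the smoothing factor `eps = 0.1` is an illustrative choice, not a value taken from the paper):

```python
def smooth_labels(one_hot, eps=0.1):
    """Mix a one-hot target with the uniform distribution.

    Each of the K classes receives eps / K probability mass, and the
    true class keeps 1 - eps of its original weight, discouraging the
    over-confident predictions that make adversarial errors worse.
    """
    k = len(one_hot)
    return [(1.0 - eps) * y + eps / k for y in one_hot]

# A 4-class one-hot target before and after smoothing.
print(smooth_labels([1.0, 0.0, 0.0, 0.0]))  # ~[0.925, 0.025, 0.025, 0.025]
```

The smoothed targets still sum to 1, so they can be used directly with a standard cross-entropy loss.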
arXiv Detail & Related papers (2022-12-20T14:06:50Z) - Improving Adversarial Robustness to Sensitivity and Invariance Attacks with Deep Metric Learning [80.21709045433096]
A standard approach to adversarial robustness assumes a framework for defending against samples crafted by minimally perturbing a clean input.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves both invariant and sensitivity defense.
arXiv Detail & Related papers (2022-11-04T13:54:02Z) - Residual Error: a New Performance Measure for Adversarial Robustness [85.0371352689919]
A major challenge limiting the widespread adoption of deep learning has been its fragility to adversarial attacks.
This study presents the concept of residual error, a new performance measure for assessing the adversarial robustness of a deep neural network.
Experimental results using the case of image classification demonstrate the effectiveness and efficacy of the proposed residual error metric.
arXiv Detail & Related papers (2021-06-18T16:34:23Z) - Detecting Word Sense Disambiguation Biases in Machine Translation for Model-Agnostic Adversarial Attacks [84.61578555312288]
We introduce a method for the prediction of disambiguation errors based on statistical data properties.
We develop a simple adversarial attack strategy that minimally perturbs sentences in order to elicit disambiguation errors.
Our findings indicate that disambiguation robustness varies substantially between domains and that different models trained on the same data are vulnerable to different attacks.
arXiv Detail & Related papers (2020-11-03T17:01:44Z) - Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations [65.05561023880351]
Adversarial examples are malicious inputs crafted to induce misclassification.
This paper studies a complementary failure mode, invariance-based adversarial examples.
We show that defenses against sensitivity-based attacks actively harm a model's accuracy on invariance-based attacks.
arXiv Detail & Related papers (2020-02-11T18:50:23Z)