Improving Robustness Estimates in Natural Language Explainable AI through Synonymity Weighted Similarity Measures
- URL: http://arxiv.org/abs/2501.01516v1
- Date: Thu, 02 Jan 2025 19:49:04 GMT
- Title: Improving Robustness Estimates in Natural Language Explainable AI through Synonymity Weighted Similarity Measures
- Authors: Christopher Burger,
- Abstract summary: Adversarial examples feature prominently in the literature on the effectiveness of XAI.
For explanations in natural language, it is natural to adopt measures from information retrieval designed for ranked lists.
We show that the standard implementations of these measures are poorly suited to comparing explanations in adversarial XAI.
- Abstract: Explainable AI (XAI) has seen a surge of interest with the proliferation of powerful but intractable black-box models. At the same time, XAI has come under fire for techniques that may not offer reliable explanations. Because many XAI methods are themselves models, adversarial examples are prominent in the literature on the effectiveness of XAI; the objective of these examples is to alter the explanation while maintaining the output of the original model. For explanations in natural language, it is natural to guide the adversarial XAI process with measures from information retrieval designed for ranked lists. We show that the standard implementations of these measures are poorly suited to comparing explanations in adversarial XAI, and we amend them using information that is otherwise discarded: the synonymity of the perturbed words. This synonymity weighting produces more accurate estimates of the actual vulnerability of XAI methods to adversarial examples.
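To make the idea concrete, the sketch below is a minimal, hypothetical rendering of synonymity weighting applied to a simple top-k overlap between two ranked explanation word lists: a perturbed word that replaced an original word earns partial credit proportional to its synonymity score rather than counting as a complete mismatch. The function names, the partial-credit scheme, and the toy synonymity table are illustrative assumptions, not the exact measures amended in the paper.

```python
# Illustrative sketch (not the paper's formulation): a top-k overlap between
# two ranked explanation word lists in which a perturbed word that replaced an
# original word contributes partial credit equal to its synonymity score
# instead of counting as a complete mismatch.

def synonymity_weighted_overlap(original, perturbed, substitutions, synonymity, k=5):
    """original, perturbed: ranked lists of words (most important first).
    substitutions: dict mapping perturbed word -> original word it replaced.
    synonymity: callable (word_a, word_b) -> score in [0, 1].
    """
    top_orig = original[:k]
    top_pert = perturbed[:k]
    credit = 0.0
    for w in top_pert:
        if w in top_orig:
            credit += 1.0                                 # exact match: full credit
        elif w in substitutions and substitutions[w] in top_orig:
            credit += synonymity(w, substitutions[w])     # synonym swap: partial credit
    return credit / k


# Toy usage with a hand-written synonymity table standing in for
# embedding-based similarity (an assumption for illustration only).
toy_scores = {("terrible", "awful"): 0.9, ("movie", "film"): 0.95}

def toy_synonymity(a, b):
    return toy_scores.get((a, b), toy_scores.get((b, a), 0.0))

orig = ["awful", "film", "plot", "acting", "boring"]
pert = ["terrible", "movie", "plot", "acting", "boring"]
subs = {"terrible": "awful", "movie": "film"}

print(synonymity_weighted_overlap(orig, pert, subs, toy_synonymity, k=5))  # ~0.97
```

In practice the synonymity score would presumably come from the same resource the attack uses to select substitutions (e.g., embedding similarity), so the weighting reuses information the adversarial process already computes but standard rank-list measures discard.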
Related papers
- Towards Robust and Accurate Stability Estimation of Local Surrogate Models in Text-based Explainable AI [9.31572645030282]
In adversarial attacks on explainable AI (XAI) in the NLP domain, the generated explanation is manipulated.
Central to this XAI manipulation is the similarity measure used to calculate how one explanation differs from another.
This work investigates a variety of similarity measures designed for text-based ranked lists to determine their comparative suitability for this task.
arXiv Detail & Related papers (2025-01-03T17:44:57Z)
- The Effect of Similarity Measures on Accurate Stability Estimates for Local Surrogate Models in Text-based Explainable AI [8.23094630594374]
A poor choice of similarity measure can lead to erroneous conclusions on the efficacy of an XAI method.
We investigate a variety of similarity measures designed for text-based ranked lists, including Kendall's Tau, Spearman's Footrule, and Rank-biased Overlap (a minimal sketch of Rank-biased Overlap appears after this list).
arXiv Detail & Related papers (2024-06-22T12:59:12Z)
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- Assessing Fidelity in XAI post-hoc techniques: A Comparative Study with Ground Truth Explanations Datasets [0.0]
XAI methods based on the backpropagation of output information to input yield higher accuracy and reliability.
Backpropagation methods tend to generate noisier saliency maps.
These findings have significant implications for the advancement of XAI methods.
arXiv Detail & Related papers (2023-11-03T14:57:24Z)
- Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z)
- Theoretical Behavior of XAI Methods in the Presence of Suppressor Variables [0.8602553195689513]
In recent years, the community of 'explainable artificial intelligence' (XAI) has created a vast body of methods to bridge a perceived gap between model 'complexity' and 'interpretability'.
We show that the majority of the studied approaches will attribute non-zero importance to a non-class-related suppressor feature in the presence of correlated noise.
arXiv Detail & Related papers (2023-06-02T11:41:19Z)
- An Experimental Investigation into the Evaluation of Explainability Methods [60.54170260771932]
This work compares 14 different metrics when applied to nine state-of-the-art XAI methods and three dummy methods (e.g., random saliency maps) used as references.
Experimental results show which of these metrics produces highly correlated results, indicating potential redundancy.
arXiv Detail & Related papers (2023-05-25T08:07:07Z)
- Connecting Algorithmic Research and Usage Contexts: A Perspective of Contextualized Evaluation for Explainable AI [65.44737844681256]
A lack of consensus on how to evaluate explainable AI (XAI) hinders the advancement of the field.
We argue that one way to close the gap is to develop evaluation methods that account for different user requirements.
arXiv Detail & Related papers (2022-06-22T05:17:33Z)
- Beyond Explaining: Opportunities and Challenges of XAI-Based Model Improvement [75.00655434905417]
Explainable Artificial Intelligence (XAI) is an emerging research field bringing transparency to highly complex machine learning (ML) models.
This paper offers a comprehensive overview of techniques that apply XAI in practice to improve various properties of ML models.
We show empirically through experiments on toy and realistic settings how explanations can help improve properties such as model generalization ability or reasoning.
arXiv Detail & Related papers (2022-03-15T15:44:28Z)
- Data Representing Ground-Truth Explanations to Evaluate XAI Methods [0.0]
Explainable artificial intelligence (XAI) methods are currently evaluated with approaches that mostly originated in interpretable machine learning (IML) research.
We propose to represent explanations with canonical equations that can be used to evaluate the accuracy of XAI methods.
arXiv Detail & Related papers (2020-11-18T16:54:53Z)
- Contextualized Perturbation for Textual Adversarial Attack [56.370304308573274]
Adversarial examples expose the vulnerabilities of natural language processing (NLP) models.
This paper presents CLARE, a ContextuaLized AdversaRial Example generation model that produces fluent and grammatical outputs.
arXiv Detail & Related papers (2020-09-16T06:53:15Z)
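For reference, Rank-biased Overlap, one of the ranked-list measures named in the related papers above, can be computed in truncated form as sketched below. This is a minimal sketch following the standard definition RBO = (1 - p) * sum_d p^(d-1) * A_d, where A_d is the proportion of agreement between the two top-d prefixes; it stops at the shared list length and omits the extrapolated tail term, which is a simplification.

```python
# Minimal truncated Rank-biased Overlap (RBO) sketch. It accumulates the
# prefix-agreement terms up to the shorter list's length and omits the
# extrapolated tail, so it lower-bounds the full RBO score.

def rbo_truncated(list_a, list_b, p=0.9):
    depth = min(len(list_a), len(list_b))
    seen_a, seen_b = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        seen_a.add(list_a[d - 1])
        seen_b.add(list_b[d - 1])
        agreement = len(seen_a & seen_b) / d   # A_d: overlap of the top-d prefixes
        score += (p ** (d - 1)) * agreement
    return (1 - p) * score

# Identical lists approach 1 as depth grows; disjoint lists give 0.
print(rbo_truncated(["a", "b", "c", "d"], ["a", "c", "b", "e"]))
```

The persistence parameter p controls how heavily the comparison weights the top of the rankings; values closer to 1 give more influence to items deeper in the lists.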
This list is automatically generated from the titles and abstracts of the papers on this site.