CHILLI: A data context-aware perturbation method for XAI
- URL: http://arxiv.org/abs/2407.07521v1
- Date: Wed, 10 Jul 2024 10:18:07 GMT
- Title: CHILLI: A data context-aware perturbation method for XAI
- Authors: Saif Anwar, Nathan Griffiths, Abhir Bhalerao, Thomas Popham
- Abstract summary: The trustworthiness of Machine Learning (ML) models can be difficult to assess, but is critical in high-risk or ethically sensitive applications.
We propose a novel framework, CHILLI, for incorporating data context into XAI by generating contextually aware perturbations.
This is shown to improve both the soundness and accuracy of the explanations.
- Score: 3.587367153279351
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The trustworthiness of Machine Learning (ML) models can be difficult to assess, but is critical in high-risk or ethically sensitive applications. Many models are treated as a 'black-box' where the reasoning or criteria behind a final decision are opaque to the user. To address this, some existing Explainable AI (XAI) approaches approximate model behaviour using perturbed data. However, such methods have been criticised for ignoring feature dependencies, with explanations being based on potentially unrealistic data. We propose a novel framework, CHILLI, for incorporating data context into XAI by generating contextually aware perturbations, which are faithful to the training data of the base model being explained. This is shown to improve both the soundness and accuracy of the explanations.
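For intuition, the following is a minimal, hypothetical sketch (Python/NumPy, not the authors' code) of the kind of pipeline the abstract describes: a standard scheme that perturbs each feature independently, a data-context-aware alternative that keeps perturbations close to the observed training data so feature dependencies are respected, and a weighted local surrogate fitted on the perturbed points. The function names, the neighbour-interpolation scheme, and the kernel weighting are illustrative assumptions, not CHILLI's actual mechanism.

```python
# Hypothetical sketch only: contrasts independent per-feature perturbation
# with a data-context-aware alternative, then fits a weighted local
# surrogate whose coefficients act as the explanation.
import numpy as np


def independent_perturbations(x, feature_std, n_samples=500, seed=0):
    """LIME-style scheme: Gaussian noise per feature, ignoring feature
    dependencies, so samples may be unrealistic."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, feature_std, size=(n_samples, x.shape[0]))
    return x + noise


def context_aware_perturbations(x, X_train, n_samples=500, k=50,
                                jitter=0.05, seed=0):
    """Assumed context-aware scheme: interpolate between x and its nearest
    training neighbours so perturbations stay near the observed data."""
    rng = np.random.default_rng(seed)
    dists = np.linalg.norm(X_train - x, axis=1)
    neighbours = X_train[np.argsort(dists)[:k]]           # local data context
    idx = rng.integers(0, len(neighbours), size=n_samples)
    alpha = rng.uniform(0.0, 1.0, size=(n_samples, 1))
    samples = alpha * x + (1.0 - alpha) * neighbours[idx]
    # Small jitter scaled by per-feature spread keeps samples diverse.
    samples += rng.normal(0.0, jitter * X_train.std(axis=0), samples.shape)
    return samples


def local_explanation(predict_fn, x, samples, kernel_width=1.0):
    """Weighted least-squares surrogate around x; returns per-feature
    coefficients, i.e. the local explanation."""
    y = predict_fn(samples)
    w = np.exp(-np.linalg.norm(samples - x, axis=1) ** 2 / kernel_width ** 2)
    A = np.hstack([samples, np.ones((len(samples), 1))])  # intercept column
    Aw = A * w[:, None]                                    # apply proximity weights
    coef, *_ = np.linalg.lstsq(Aw.T @ A, Aw.T @ y, rcond=None)
    return coef[:-1]
```

The contrast between the two sampling functions mirrors the criticism quoted in the abstract: independent per-feature noise can produce points that never occur in the data, whereas sampling around real training points keeps the surrogate's evidence consistent with the data context.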
Related papers
- F-Fidelity: A Robust Framework for Faithfulness Evaluation of Explainable AI [15.314388210699443]
Fine-tuned Fidelity (F-Fidelity) is a robust evaluation framework for XAI.
We show that F-Fidelity significantly improves upon prior evaluation metrics in recovering the ground-truth ranking of explainers.
We also show that given a faithful explainer, F-Fidelity metric can be used to compute the sparsity of influential input components.
arXiv Detail & Related papers (2024-10-03T20:23:06Z)
- Robustness of Explainable Artificial Intelligence in Industrial Process Modelling [43.388607981317016]
We evaluate current XAI methods by scoring them based on ground truth simulations and sensitivity analysis.
We show the differences between XAI methods in their ability to correctly predict the true sensitivity of the modeled industrial process.
arXiv Detail & Related papers (2024-07-12T09:46:26Z)
- The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z)
- R-Tuning: Instructing Large Language Models to Say 'I Don't Know' [66.11375475253007]
Large language models (LLMs) have revolutionized numerous domains with their impressive performance, but they still face challenges.
Previous instruction tuning methods force the model to complete a sentence regardless of whether it actually has the relevant knowledge.
We present a new approach called Refusal-Aware Instruction Tuning (R-Tuning).
Experimental results demonstrate R-Tuning effectively improves a model's ability to answer known questions and refrain from answering unknown questions.
arXiv Detail & Related papers (2023-11-16T08:45:44Z)
- CLIMAX: An exploration of Classifier-Based Contrastive Explanations [5.381004207943597]
We propose a novel post-hoc XAI technique that provides contrastive explanations justifying the classification of a black box.
Our method, which we refer to as CLIMAX, is based on local classifiers.
We show that we achieve better consistency as compared to baselines such as LIME, BayLIME, and SLIME.
arXiv Detail & Related papers (2023-07-02T22:52:58Z)
- Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z)
- AUTOLYCUS: Exploiting Explainable AI (XAI) for Model Extraction Attacks against Interpretable Models [1.8752655643513647]
XAI tools can increase models' vulnerability to model extraction attacks, which is a concern when model owners prefer black-box access.
We propose a novel retraining (learning) based model extraction attack framework against interpretable models under black-box settings.
We show that AUTOLYCUS is highly effective, requiring significantly fewer queries compared to state-of-the-art attacks.
arXiv Detail & Related papers (2023-02-04T13:23:39Z)
- Optimizing Explanations by Network Canonization and Hyperparameter Search [74.76732413972005]
Rule-based and modified backpropagation XAI approaches often face challenges when applied to modern model architectures.
Model canonization is the process of re-structuring the model to disregard problematic components without changing the underlying function.
In this work, we propose canonizations for currently relevant model blocks applicable to popular deep neural network architectures.
arXiv Detail & Related papers (2022-11-30T17:17:55Z)
- Greybox XAI: a Neural-Symbolic learning framework to produce interpretable predictions for image classification [6.940242990198]
Greybox XAI is a framework that composes a DNN and a transparent model through the use of a symbolic Knowledge Base (KB).
We address the problem of the lack of universal criteria for XAI by formalizing what an explanation is.
We show how this new architecture is accurate and explainable in several datasets.
arXiv Detail & Related papers (2022-09-26T08:55:31Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Understanding and Improving Lexical Choice in Non-Autoregressive Translation [98.11249019844281]
We propose to expose the raw data to NAT models to restore the useful information of low-frequency words.
Our approach pushes the SOTA NAT performance on the WMT14 English-German and WMT16 Romanian-English datasets up to 27.8 and 33.8 BLEU points, respectively.
arXiv Detail & Related papers (2020-12-29T03:18:50Z)