Can we Agree? On the Rashōmon Effect and the Reliability of Post-Hoc Explainable AI
- URL: http://arxiv.org/abs/2308.07247v1
- Date: Mon, 14 Aug 2023 16:32:24 GMT
- Title: Can we Agree? On the Rashōmon Effect and the Reliability of Post-Hoc Explainable AI
- Authors: Clement Poiret, Antoine Grigis, Justin Thomas, Marion Noulhiane
- Abstract summary: The Rashōmon effect poses challenges for deriving reliable knowledge from machine learning models.
This study examined the influence of sample size on explanations from models in a Rashōmon set using SHAP.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The Rashōmon effect poses challenges for deriving reliable knowledge from
machine learning models. This study examined the influence of sample size on
explanations from models in a Rashōmon set using SHAP. Experiments on 5
public datasets showed that explanations gradually converged as the sample size
increased. Explanations from <128 samples exhibited high variability, limiting
reliable knowledge extraction. However, agreement between models improved with
more data, allowing for consensus. Bagging ensembles often had higher
agreement. The results provide guidance on sufficient data to trust
explanations. Variability at low samples suggests that conclusions may be
unreliable without validation. Further work is needed with more model types,
data domains, and explanation methods. Testing convergence in neural networks
and with model-specific explanation methods would be impactful. The approaches
explored here point towards principled techniques for eliciting knowledge from
ambiguous models.
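A minimal sketch of the kind of experiment the abstract describes: fit several comparably specified models on subsamples of increasing size, compute SHAP importances on a fixed evaluation set, and measure how strongly the models' explanations agree. This is not the authors' pipeline; the dataset, the model choices, the background data, and the Spearman-based agreement score are illustrative assumptions.

```python
# Sketch only: dataset, models, and agreement metric are assumptions,
# not the setup used in the paper.
import numpy as np
import shap
from scipy.stats import spearmanr
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)
X_eval = X[:100]  # fixed evaluation set, explained by every model

def mean_abs_shap(model, background, X_eval):
    """Global importance per feature as the mean |SHAP value| over X_eval."""
    explainer = shap.Explainer(model.predict, background)
    return np.abs(explainer(X_eval).values).mean(axis=0)

for n in (64, 128, 256, 512):
    idx = rng.choice(len(X), size=n, replace=False)
    # A small "Rashomon set": similarly accurate models with different kinds/seeds.
    models = [RandomForestClassifier(n_estimators=100, random_state=s).fit(X[idx], y[idx])
              for s in range(3)]
    models.append(GradientBoostingClassifier(random_state=0).fit(X[idx], y[idx]))
    importances = [mean_abs_shap(m, X[idx], X_eval) for m in models]
    # Agreement: mean pairwise Spearman correlation of the importance rankings.
    pairs = [(i, j) for i in range(len(models)) for j in range(i + 1, len(models))]
    agreement = np.mean([spearmanr(importances[i], importances[j]).correlation
                         for i, j in pairs])
    print(f"n={n:4d}: mean pairwise explanation agreement = {agreement:.2f}")
```

Under such a setup, the printed agreement would be expected to rise with n, mirroring the convergence the abstract reports, while rankings at small sample sizes typically disagree noticeably.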
Related papers
- Understanding Disparities in Post Hoc Machine Learning Explanation [2.965442487094603]
Previous work has highlighted that existing post-hoc explanation methods exhibit disparities in explanation fidelity across 'race' and 'gender' as sensitive attributes.
We specifically assess challenges to explanation disparities that originate from properties of the data.
Results indicate that disparities in model explanations can also depend on data and model properties.
arXiv Detail & Related papers (2024-01-25T22:09:28Z)
- Are Data-driven Explanations Robust against Out-of-distribution Data? [18.760475318852375]
We propose an end-to-end, model-agnostic learning framework, Distributionally Robust Explanations (DRE).
The key idea is to fully utilize the inter-distribution information to provide supervisory signals for learning explanations without human annotation.
Our results demonstrate that the proposed method significantly improves the model's performance in terms of explanation and prediction robustness against distributional shifts.
arXiv Detail & Related papers (2023-03-29T02:02:08Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- Agree to Disagree: When Deep Learning Models With Identical Architectures Produce Distinct Explanations [0.0]
We introduce a measure of explanation consistency which we use to highlight the identified problems on the MIMIC-CXR dataset.
We find that explanations from models with identical architectures but different training setups have low consistency: approximately 33% on average (a simple consistency score of this kind is sketched after this list).
We conclude that current trends in model explanation are not sufficient to mitigate the risks of deploying models in real-life healthcare applications.
arXiv Detail & Related papers (2021-05-14T12:16:47Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Hard-label Manifolds: Unexpected Advantages of Query Efficiency for Finding On-manifold Adversarial Examples [67.23103682776049]
Recent zeroth order hard-label attacks on image classification models have shown comparable performance to their first-order, gradient-level alternatives.
It was recently shown in the gradient-level setting that regular adversarial examples leave the data manifold, while their on-manifold counterparts are in fact generalization errors.
We propose an information-theoretic argument based on a noisy manifold distance oracle, which leaks manifold information through the adversary's gradient estimate.
arXiv Detail & Related papers (2021-03-04T20:53:06Z)
- Debugging Tests for Model Explanations [18.073554618753395]
The methods tested are able to diagnose a spurious background bug, but cannot conclusively identify mislabeled training examples.
We complement our analysis with a human subject study, and find that subjects fail to identify defective models using attributions, but instead rely, primarily, on model predictions.
arXiv Detail & Related papers (2020-11-10T22:23:25Z)
- Deducing neighborhoods of classes from a fitted model [68.8204255655161]
In this article, a new kind of interpretable machine learning method is presented.
It helps to understand how a classification model partitions the feature space into predicted classes, using quantile shifts.
Real data points (or specific points of interest) are used, and the changes in the prediction after slightly raising or lowering specific features are observed.
arXiv Detail & Related papers (2020-09-11T16:35:53Z)
- Model extraction from counterfactual explanations [68.8204255655161]
We show how an adversary can leverage the information provided by counterfactual explanations to build high-fidelity and high-accuracy model extraction attacks.
Our attack enables the adversary to build a faithful copy of a target model by accessing its counterfactual explanations.
arXiv Detail & Related papers (2020-09-03T19:02:55Z)
- Are Visual Explanations Useful? A Case Study in Model-in-the-Loop Prediction [49.254162397086006]
We study explanations based on visual saliency in an image-based age prediction task.
We find that presenting model predictions improves human accuracy.
However, explanations of various kinds fail to significantly alter human accuracy or trust in the model.
arXiv Detail & Related papers (2020-07-23T20:39:40Z)
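The "Agree to Disagree" entry above quantifies how often explanations from identically specified models agree. One simple way to score such consistency (an illustrative assumption, not that paper's metric or its MIMIC-CXR setup) is the overlap between the top-k features of two attribution vectors:

```python
# Hedged sketch of a consistency score between two explanations; the measure
# used in "Agree to Disagree" is not reproduced here.
import numpy as np

def topk_consistency(attr_a, attr_b, k=10):
    """Overlap (0..1) between the k most important features of two attribution
    vectors, e.g. per-feature SHAP values or flattened saliency maps."""
    top_a = set(np.argsort(-np.abs(attr_a))[:k].tolist())
    top_b = set(np.argsort(-np.abs(attr_b))[:k].tolist())
    return len(top_a & top_b) / k

# Toy usage: real inputs would be attributions computed for the same example
# by two models trained with different setups (seed, preprocessing, etc.).
rng = np.random.default_rng(0)
a, b = rng.normal(size=100), rng.normal(size=100)
print(f"top-10 consistency: {topk_consistency(a, b):.2f}")
```

A score near 1 means both models highlight the same features; under such a measure, values around 0.3 would echo the roughly 33% consistency reported above.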