BELLA: Black box model Explanations by Local Linear Approximations
- URL: http://arxiv.org/abs/2305.11311v1
- Date: Thu, 18 May 2023 21:22:23 GMT
- Title: BELLA: Black box model Explanations by Local Linear Approximations
- Authors: Nedeljko Radulovic, Albert Bifet, Fabian Suchanek
- Abstract summary: We present BELLA, a deterministic model-agnostic post-hoc approach for explaining the individual predictions of regression black-box models.
BELLA provides explanations in the form of a linear model trained in the feature space.
BELLA can produce both factual and counterfactual explanations.
- Score: 10.05944106581306
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, understanding the decision-making process of black-box
models has become not only a legal requirement but also an additional way to
assess their performance. However, state-of-the-art post-hoc interpretation
approaches rely on synthetic data generation. This introduces uncertainty and
can hurt the reliability of the interpretations. Furthermore, they tend to
produce explanations that apply to only very few data points. This makes the
explanations brittle and limited in scope. Finally, they provide scores that
have no direct verifiable meaning. In this paper, we present BELLA, a
deterministic model-agnostic post-hoc approach for explaining the individual
predictions of regression black-box models. BELLA provides explanations in the
form of a linear model trained in the feature space. Thus, its coefficients can
be used directly to compute the predicted value from the feature values.
Furthermore, BELLA maximizes the size of the neighborhood to which the linear
model applies, so that the explanations are accurate, simple, general, and
robust. BELLA can produce both factual and counterfactual explanations. Our
user study confirms the importance of the desiderata we optimize, and our
experiments show that BELLA outperforms the state-of-the-art approaches on
these desiderata.
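The abstract gives no pseudocode; the snippet below is only a minimal sketch of the general idea, fitting a linear surrogate to the black box on a growing neighborhood of real data points around the instance to be explained and keeping the largest neighborhood on which the fit remains accurate. The `local_linear_explanation` helper, the expansion step, and the R² threshold are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of a BELLA-style local linear surrogate. The neighborhood
# expansion rule, the step size, and the accuracy threshold are illustrative
# assumptions, not the authors' algorithm.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def local_linear_explanation(black_box, X, x0, min_r2=0.9, step=10):
    """Fit a linear surrogate to the black box on a growing neighborhood of x0,
    using only real data points from X (no synthetic data generation)."""
    order = np.argsort(np.linalg.norm(X - x0, axis=1))   # nearest real points first
    best_model, best_size = None, 0
    for k in range(step, len(X) + 1, step):              # grow the neighborhood
        neighbors = X[order[:k]]
        targets = black_box.predict(neighbors)           # explain the model, not the labels
        surrogate = LinearRegression().fit(neighbors, targets)
        if r2_score(targets, surrogate.predict(neighbors)) >= min_r2:
            best_model, best_size = surrogate, k         # keep the largest accurate fit
        else:
            break
    return best_model, best_size

# The surrogate's coefficients are the explanation: the local prediction is
# approximately surrogate.intercept_ + surrogate.coef_ @ x0.
```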
Related papers
- DISCRET: Synthesizing Faithful Explanations For Treatment Effect Estimation [21.172795461188578]
We propose DISCRET, a self-interpretable ITE framework that synthesizes faithful, rule-based explanations for each sample.
A key insight behind DISCRET is that explanations can serve dually as database queries to identify similar subgroups of samples.
We provide a novel RL algorithm to efficiently synthesize these explanations from a large search space.
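As a hedged illustration of the "explanations double as database queries" insight, the snippet below applies a hypothetical rule-based explanation as a filter that retrieves a similar subgroup; the column names, the rule, and the outcome values are made up for illustration, not taken from the paper.

```python
# Sketch: a rule-based explanation reused as a query over a pandas DataFrame.
# Column names, the rule, and the numbers are hypothetical.
import pandas as pd

data = pd.DataFrame({
    "age":     [34, 61, 47, 29, 55],
    "dose":    [10, 20, 20, 10, 20],
    "outcome": [0.4, 0.9, 0.7, 0.3, 0.8],
})

# An explanation such as "age > 40 AND dose == 20" is itself a query:
rule = "(age > 40) & (dose == 20)"
subgroup = data.query(rule)

# The rule that explains one sample also retrieves its similar subgroup,
# whose outcomes can then back an estimate for that sample.
print(subgroup["outcome"].mean())
```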
arXiv Detail & Related papers (2024-06-02T04:01:08Z) - Discriminative Feature Attributions: Bridging Post Hoc Explainability and Inherent Interpretability [29.459228981179674]
Post hoc explanations incorrectly attribute high importance to features that are unimportant or non-discriminative for the underlying task.
Inherently interpretable models, on the other hand, circumvent these issues by explicitly encoding explanations into model architecture.
We propose Distractor Erasure Tuning (DiET), a method that adapts black-box models to be robust to distractor erasure.
arXiv Detail & Related papers (2023-07-27T17:06:02Z) - Black-Box Anomaly Attribution [13.455748795087493]
When a black-box machine learning model deviates from the true observation, what can be said about the reason behind that deviation?
This is a fundamental and ubiquitous question that the end user in a business or industrial AI application often asks.
We propose a novel likelihood-based attribution framework that we call "likelihood compensation".
arXiv Detail & Related papers (2023-05-29T01:42:32Z) - Learning with Explanation Constraints [91.23736536228485]
We provide a learning theoretic framework to analyze how explanations can improve the learning of our models.
We demonstrate the benefits of our approach over a large array of synthetic and real-world experiments.
arXiv Detail & Related papers (2023-03-25T15:06:47Z) - Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
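As a rough sketch of what a diversity-enforcing term over several candidate latent perturbations could look like (the paper's actual loss and latent model are not reproduced here; this is one common choice, penalizing small pairwise distances):

```python
# Sketch of a diversity-enforcing penalty over candidate latent perturbations
# (one common choice, not the paper's exact loss): close pairs are penalized,
# so minimizing the penalty pushes the perturbations apart.
import numpy as np

def diversity_penalty(perturbations):
    """perturbations: array of shape (n_candidates, latent_dim)."""
    n = len(perturbations)
    penalty, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            dist = np.linalg.norm(perturbations[i] - perturbations[j])
            penalty += 1.0 / (1.0 + dist)   # near-duplicates contribute most
            pairs += 1
    return penalty / max(pairs, 1)

# Example: three candidate perturbations in a 4-dimensional latent space.
z = np.random.default_rng(0).normal(size=(3, 4))
print(diversity_penalty(z))
```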
arXiv Detail & Related papers (2021-03-18T12:57:34Z) - Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting model representation to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
arXiv Detail & Related papers (2021-03-02T00:36:45Z) - Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language? [86.60613602337246]
We introduce a leakage-adjusted simulatability (LAS) metric for evaluating NL explanations.
LAS measures how well explanations help an observer predict a model's output, while controlling for how explanations can directly leak the output.
We frame explanation generation as a multi-agent game and optimize explanations for simulatability while penalizing label leakage.
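A simplified sketch of that idea, assuming simulator predictions are already collected: compare simulator accuracy with and without the explanation, averaged separately over explanations that do and do not leak the output. The equal weighting of the two groups is an assumption here; the paper's exact definition may differ.

```python
# Simplified sketch of leakage-adjusted simulatability. The grouping by leakage
# follows the summary above; the equal weighting of the groups is an assumption.
import numpy as np

def las(sim_with_expl, sim_without_expl, leaked, model_outputs):
    """sim_with_expl / sim_without_expl: a simulator's guesses of the model's
    output, with and without seeing the explanation.
    leaked: True where the explanation alone already reveals the output."""
    correct_with = sim_with_expl == model_outputs
    correct_without = sim_without_expl == model_outputs
    effects = []
    for group in (leaked, ~leaked):                 # average the two groups
        if group.any():
            effects.append(correct_with[group].mean()
                           - correct_without[group].mean())
    return float(np.mean(effects))

# Toy example: six instances, explanations leak the output for the first three.
outputs = np.array([1, 0, 1, 1, 0, 0])
with_e  = np.array([1, 0, 1, 1, 0, 1])
without = np.array([1, 1, 0, 1, 0, 1])
leak    = np.array([True, True, True, False, False, False])
print(las(with_e, without, leak, outputs))          # positive: explanations help
```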
arXiv Detail & Related papers (2020-10-08T16:59:07Z) - The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets [61.66584140190247]
We show that feature-based explanations pose problems even for explaining trivial models.
We show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations.
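The contrast already shows up on a trivial two-feature OR model: Shapley values split the credit between the two inputs, while either input on its own is a minimal sufficient subset. A small self-contained illustration (toy model, not taken from the paper):

```python
# Toy illustration: for the input (1, 1) of an OR model, Shapley values split
# the credit 0.5/0.5, while either single feature is already sufficient to fix
# the output, so the two explanation types answer different questions.
from itertools import permutations
from math import factorial

def or_model(x):
    return int(x[0] or x[1])

def shapley_values(model, x, baseline=(0, 0)):
    """Exact Shapley values by averaging marginal contributions over orderings."""
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        for feature in order:
            before = model(current)
            current[feature] = x[feature]
            phi[feature] += model(current) - before
    return [p / factorial(n) for p in phi]

print(shapley_values(or_model, (1, 1)))   # [0.5, 0.5]: credit is split
print(or_model((1, 0)))                   # 1: {x1} alone is a sufficient subset
```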
arXiv Detail & Related papers (2020-09-23T09:45:23Z) - Model extraction from counterfactual explanations [68.8204255655161]
We show how an adversary can leverage the information provided by counterfactual explanations to build high-fidelity and high-accuracy model extraction attacks.
Our attack enables the adversary to build a faithful copy of a target model by accessing its counterfactual explanations.
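A rough sketch of the attack idea under strong simplifying assumptions (a synthetic linear target and hand-built counterfactuals stand in for a real model API): counterfactuals lie just across the decision boundary, so adding them to the adversary's query set yields a high-fidelity surrogate.

```python
# Sketch of the attack idea with a synthetic linear target standing in for a
# real model API. The counterfactual construction below is hand-rolled for the
# linear case and is only meant to show why such points help the adversary.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))
target = LogisticRegression().fit(X_train, (X_train[:, 0] + X_train[:, 1] > 0).astype(int))

# Ordinary queries and the labels the adversary observes.
X_query = rng.normal(size=(100, 2))
y_query = target.predict(X_query)

# Counterfactuals: each query point moved just across the (linear) decision boundary.
w, b = target.coef_[0], target.intercept_[0]
f = X_query @ w + b
X_cf = X_query - ((f + 0.1 * np.sign(f)) / (w @ w))[:, None] * w
y_cf = target.predict(X_cf)                         # opposite class by construction

# A surrogate trained on queries plus counterfactuals closely tracks the target.
surrogate = LogisticRegression().fit(np.vstack([X_query, X_cf]),
                                     np.concatenate([y_query, y_cf]))
print((surrogate.predict(X_train) == target.predict(X_train)).mean())
```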
arXiv Detail & Related papers (2020-09-03T19:02:55Z) - Explainable Deep Modeling of Tabular Data using TableGraphNet [1.376408511310322]
We propose a new architecture that produces explainable predictions in the form of additive feature attributions.
We show that our explainable model attains the same level of performance as black box models.
arXiv Detail & Related papers (2020-02-12T20:02:10Z)
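For context, the defining property of an additive feature attribution fits in a couple of lines; the baseline and the attribution values below are made up for illustration, and the paper's architecture is not reproduced.

```python
# Defining property of additive feature attributions: the prediction decomposes
# into a baseline plus one readable contribution per feature.
# The baseline and attribution values are made up for illustration.
baseline = 0.20
attributions = {"age": 0.15, "income": -0.05, "tenure": 0.10}

prediction = baseline + sum(attributions.values())
print(round(prediction, 2))   # 0.4 -- each feature's share of the output is explicit
```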
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.