BELLA: Black box model Explanations by Local Linear Approximations
- URL: http://arxiv.org/abs/2305.11311v1
- Date: Thu, 18 May 2023 21:22:23 GMT
- Title: BELLA: Black box model Explanations by Local Linear Approximations
- Authors: Nedeljko Radulovic, Albert Bifet, Fabian Suchanek
- Abstract summary: We present BELLA, a deterministic model-agnostic post-hoc approach for explaining the individual predictions of regression black-box models.
BELLA provides explanations in the form of a linear model trained in the feature space.
BELLA can produce both factual and counterfactual explanations.
- Score: 10.05944106581306
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, understanding the decision-making process of black-box
models has become not only a legal requirement but also an additional way to
assess their performance. However, state-of-the-art post-hoc interpretation
approaches rely on synthetic data generation. This introduces uncertainty and
can hurt the reliability of the interpretations. Furthermore, they tend to
produce explanations that apply to only very few data points. This makes the
explanations brittle and limited in scope. Finally, they provide scores that
have no direct verifiable meaning. In this paper, we present BELLA, a
deterministic model-agnostic post-hoc approach for explaining the individual
predictions of regression black-box models. BELLA provides explanations in the
form of a linear model trained in the feature space. Thus, its coefficients can
be used directly to compute the predicted value from the feature values.
Furthermore, BELLA maximizes the size of the neighborhood to which the linear
model applies, so that the explanations are accurate, simple, general, and
robust. BELLA can produce both factual and counterfactual explanations. Our
user study confirms the importance of the desiderata we optimize, and our
experiments show that BELLA outperforms the state-of-the-art approaches on
these desiderata.
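The abstract gives no pseudocode; the snippet below is only a minimal sketch of the general idea, fitting a linear surrogate to the black box on a growing neighborhood of real data points around the instance to be explained and keeping the largest neighborhood on which the fit remains accurate. The `local_linear_explanation` helper, the expansion step, and the R² threshold are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of a BELLA-style local linear surrogate. The neighborhood
# expansion rule, the step size, and the accuracy threshold are illustrative
# assumptions, not the authors' algorithm.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def local_linear_explanation(black_box, X, x0, min_r2=0.9, step=10):
    """Fit a linear surrogate to the black box on a growing neighborhood of x0,
    using only real data points from X (no synthetic data generation)."""
    order = np.argsort(np.linalg.norm(X - x0, axis=1))   # nearest real points first
    best_model, best_size = None, 0
    for k in range(step, len(X) + 1, step):              # grow the neighborhood
        neighbors = X[order[:k]]
        targets = black_box.predict(neighbors)           # explain the model, not the labels
        surrogate = LinearRegression().fit(neighbors, targets)
        if r2_score(targets, surrogate.predict(neighbors)) >= min_r2:
            best_model, best_size = surrogate, k         # keep the largest accurate fit
        else:
            break
    return best_model, best_size

# The surrogate's coefficients are the explanation: the local prediction is
# approximately surrogate.intercept_ + surrogate.coef_ @ x0.
```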
Related papers
- DISCRET: Synthesizing Faithful Explanations For Treatment Effect Estimation [21.172795461188578]
We propose DISCRET, a self-interpretable ITE framework that synthesizes faithful, rule-based explanations for each sample.
A key insight behind DISCRET is that explanations can serve dually as database queries to identify similar subgroups of samples.
We provide a novel RL algorithm to efficiently synthesize these explanations from a large search space.
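As a hedged illustration of the "explanations double as database queries" insight, the snippet below applies a hypothetical rule-based explanation as a filter that retrieves a similar subgroup; the column names, the rule, and the outcome values are made up for illustration, not taken from the paper.

```python
# Sketch: a rule-based explanation reused as a query over a pandas DataFrame.
# Column names, the rule, and the numbers are hypothetical.
import pandas as pd

data = pd.DataFrame({
    "age":     [34, 61, 47, 29, 55],
    "dose":    [10, 20, 20, 10, 20],
    "outcome": [0.4, 0.9, 0.7, 0.3, 0.8],
})

# An explanation such as "age > 40 AND dose == 20" is itself a query:
rule = "(age > 40) & (dose == 20)"
subgroup = data.query(rule)

# The rule that explains one sample also retrieves its similar subgroup,
# whose outcomes can then back an estimate for that sample.
print(subgroup["outcome"].mean())
```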
arXiv Detail & Related papers (2024-06-02T04:01:08Z) - Discriminative Feature Attributions: Bridging Post Hoc Explainability and Inherent Interpretability [29.459228981179674]
Post hoc explanations incorrectly attribute high importance to features that are unimportant or non-discriminative for the underlying task.
Inherently interpretable models, on the other hand, circumvent these issues by explicitly encoding explanations into model architecture.
We propose Distractor Erasure Tuning (DiET), a method that adapts black-box models to be robust to distractor erasure.
arXiv Detail & Related papers (2023-07-27T17:06:02Z) - Black-Box Anomaly Attribution [13.455748795087493]
When a black-box machine learning model deviates from the true observation, what can be said about the reason behind that deviation?
This is a fundamental and ubiquitous question that the end user in a business or industrial AI application often asks.
We propose a novel likelihood-based attribution framework that we call "likelihood compensation".
arXiv Detail & Related papers (2023-05-29T01:42:32Z) - Learning with Explanation Constraints [91.23736536228485]
We provide a learning theoretic framework to analyze how explanations can improve the learning of our models.
We demonstrate the benefits of our approach over a large array of synthetic and real-world experiments.
arXiv Detail & Related papers (2023-03-25T15:06:47Z) - Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
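As a rough sketch of what a diversity-enforcing term over several candidate latent perturbations could look like (the paper's actual loss and latent model are not reproduced here; this is one common choice, penalizing small pairwise distances):

```python
# Sketch of a diversity-enforcing penalty over candidate latent perturbations
# (one common choice, not the paper's exact loss): close pairs are penalized,
# so minimizing the penalty pushes the perturbations apart.
import numpy as np

def diversity_penalty(perturbations):
    """perturbations: array of shape (n_candidates, latent_dim)."""
    n = len(perturbations)
    penalty, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            dist = np.linalg.norm(perturbations[i] - perturbations[j])
            penalty += 1.0 / (1.0 + dist)   # near-duplicates contribute most
            pairs += 1
    return penalty / max(pairs, 1)

# Example: three candidate perturbations in a 4-dimensional latent space.
z = np.random.default_rng(0).normal(size=(3, 4))
print(diversity_penalty(z))
```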
arXiv Detail & Related papers (2021-03-18T12:57:34Z) - Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting model representation to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
arXiv Detail & Related papers (2021-03-02T00:36:45Z) - Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language? [86.60613602337246]
We introduce a leakage-adjusted simulatability (LAS) metric for evaluating NL explanations.
LAS measures how well explanations help an observer predict a model's output, while controlling for how explanations can directly leak the output.
We frame explanation generation as a multi-agent game and optimize explanations for simulatability while penalizing label leakage.
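A simplified sketch of that idea, assuming simulator predictions are already collected: compare simulator accuracy with and without the explanation, averaged separately over explanations that do and do not leak the output. The equal weighting of the two groups is an assumption here; the paper's exact definition may differ.

```python
# Simplified sketch of leakage-adjusted simulatability. The grouping by leakage
# follows the summary above; the equal weighting of the groups is an assumption.
import numpy as np

def las(sim_with_expl, sim_without_expl, leaked, model_outputs):
    """sim_with_expl / sim_without_expl: a simulator's guesses of the model's
    output, with and without seeing the explanation.
    leaked: True where the explanation alone already reveals the output."""
    correct_with = sim_with_expl == model_outputs
    correct_without = sim_without_expl == model_outputs
    effects = []
    for group in (leaked, ~leaked):                 # average the two groups
        if group.any():
            effects.append(correct_with[group].mean()
                           - correct_without[group].mean())
    return float(np.mean(effects))

# Toy example: six instances, explanations leak the output for the first three.
outputs = np.array([1, 0, 1, 1, 0, 0])
with_e  = np.array([1, 0, 1, 1, 0, 1])
without = np.array([1, 1, 0, 1, 0, 1])
leak    = np.array([True, True, True, False, False, False])
print(las(with_e, without, leak, outputs))          # positive: explanations help
```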
arXiv Detail & Related papers (2020-10-08T16:59:07Z) - The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets [61.66584140190247]
We show that feature-based explanations pose problems even for explaining trivial models.
We show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations.
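The contrast already shows up on a trivial two-feature OR model: Shapley values split the credit between the two inputs, while either input on its own is a minimal sufficient subset. A small self-contained illustration (toy model, not taken from the paper):

```python
# Toy illustration: for the input (1, 1) of an OR model, Shapley values split
# the credit 0.5/0.5, while either single feature is already sufficient to fix
# the output, so the two explanation types answer different questions.
from itertools import permutations
from math import factorial

def or_model(x):
    return int(x[0] or x[1])

def shapley_values(model, x, baseline=(0, 0)):
    """Exact Shapley values by averaging marginal contributions over orderings."""
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        for feature in order:
            before = model(current)
            current[feature] = x[feature]
            phi[feature] += model(current) - before
    return [p / factorial(n) for p in phi]

print(shapley_values(or_model, (1, 1)))   # [0.5, 0.5]: credit is split
print(or_model((1, 0)))                   # 1: {x1} alone is a sufficient subset
```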
arXiv Detail & Related papers (2020-09-23T09:45:23Z) - Model extraction from counterfactual explanations [68.8204255655161]
We show how an adversary can leverage the information provided by counterfactual explanations to build high-fidelity and high-accuracy model extraction attacks.
Our attack enables the adversary to build a faithful copy of a target model by accessing its counterfactual explanations.
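A rough sketch of the attack idea under strong simplifying assumptions (a synthetic linear target and hand-built counterfactuals stand in for a real model API): counterfactuals lie just across the decision boundary, so adding them to the adversary's query set yields a high-fidelity surrogate.

```python
# Sketch of the attack idea with a synthetic linear target standing in for a
# real model API. The counterfactual construction below is hand-rolled for the
# linear case and is only meant to show why such points help the adversary.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))
target = LogisticRegression().fit(X_train, (X_train[:, 0] + X_train[:, 1] > 0).astype(int))

# Ordinary queries and the labels the adversary observes.
X_query = rng.normal(size=(100, 2))
y_query = target.predict(X_query)

# Counterfactuals: each query point moved just across the (linear) decision boundary.
w, b = target.coef_[0], target.intercept_[0]
f = X_query @ w + b
X_cf = X_query - ((f + 0.1 * np.sign(f)) / (w @ w))[:, None] * w
y_cf = target.predict(X_cf)                         # opposite class by construction

# A surrogate trained on queries plus counterfactuals closely tracks the target.
surrogate = LogisticRegression().fit(np.vstack([X_query, X_cf]),
                                     np.concatenate([y_query, y_cf]))
print((surrogate.predict(X_train) == target.predict(X_train)).mean())
```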
arXiv Detail & Related papers (2020-09-03T19:02:55Z) - Explainable Deep Modeling of Tabular Data using TableGraphNet [1.376408511310322]
We propose a new architecture that produces explainable predictions in the form of additive feature attributions.
We show that our explainable model attains the same level of performance as black box models.
arXiv Detail & Related papers (2020-02-12T20:02:10Z)
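For context, the defining property of an additive feature attribution fits in a couple of lines; the baseline and the attribution values below are made up for illustration, and the paper's architecture is not reproduced.

```python
# Defining property of additive feature attributions: the prediction decomposes
# into a baseline plus one readable contribution per feature.
# The baseline and attribution values are made up for illustration.
baseline = 0.20
attributions = {"age": 0.15, "income": -0.05, "tenure": 0.10}

prediction = baseline + sum(attributions.values())
print(round(prediction, 2))   # 0.4 -- each feature's share of the output is explicit
```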
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.