Fooling Partial Dependence via Data Poisoning
- URL: http://arxiv.org/abs/2105.12837v1
- Date: Wed, 26 May 2021 20:58:04 GMT
- Title: Fooling Partial Dependence via Data Poisoning
- Authors: Hubert Baniecki, Wojciech Kretowicz, Przemyslaw Biecek
- Abstract summary: We present techniques for attacking Partial Dependence (plots, profiles, PDP).
We showcase that PD can be manipulated in an adversarial manner, which is alarming, especially in financial or medical applications.
The fooling is performed by poisoning the data to bend and shift explanations in the desired direction using genetic and gradient algorithms.
- Score: 3.0036519884678894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many methods have been developed to understand complex predictive models and
high expectations are placed on post-hoc model explainability. It turns out
that such explanations are neither robust nor trustworthy, and they can be fooled.
This paper presents techniques for attacking Partial Dependence (plots,
profiles, PDP), which are among the most popular methods of explaining any
predictive model trained on tabular data. We showcase that PD can be
manipulated in an adversarial manner, which is alarming, especially in
financial or medical applications, where auditability has become a must-have trait
supporting black-box models. The fooling is performed via poisoning the data to
bend and shift explanations in the desired direction using genetic and gradient
algorithms. To the best of our knowledge, this is the first work performing
attacks on variable dependence explanations. The novel approach of using a
genetic algorithm for doing so is highly transferable as it generalizes both
ways: in a model-agnostic and an explanation-agnostic manner.
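To make the attacked quantity concrete: a partial dependence profile is the average model prediction as one feature is swept over a grid, and the genetic attack perturbs the data until that profile matches an attacker-chosen target. The sketch below is a hedged, minimal illustration under simplifying assumptions (a toy bilinear model, a simple hill-climbing evolutionary loop, and function names of my choosing), not the authors' implementation:

```python
import numpy as np

def partial_dependence(predict, X, j, grid):
    """PD profile of column j: average prediction with X[:, j]
    forced to each grid value in turn."""
    out = []
    for v in grid:
        Xv = X.copy()
        Xv[:, j] = v
        out.append(predict(Xv).mean())
    return np.array(out)

def poison_pd(predict, X, j, grid, target, n_iter=100, pop=10,
              sigma=0.1, seed=0):
    """Toy evolutionary attack: mutate the columns OTHER than j and keep
    any candidate whose PD profile is closer (in MSE) to `target`.
    Illustrative only -- the paper's genetic operators differ."""
    rng = np.random.default_rng(seed)
    mask = np.ones(X.shape[1], dtype=bool)
    mask[j] = False                     # the explained column stays intact
    best = X.copy()
    best_loss = np.mean((partial_dependence(predict, best, j, grid) - target) ** 2)
    for _ in range(n_iter):
        for _ in range(pop):
            cand = best.copy()
            cand[:, mask] += sigma * rng.standard_normal(cand[:, mask].shape)
            loss = np.mean((partial_dependence(predict, cand, j, grid) - target) ** 2)
            if loss < best_loss:        # greedy (1+pop)-style selection
                best, best_loss = cand, loss
    return best, best_loss

# Demo: a model with an interaction term, so the PD of feature 0 depends
# on the distribution of feature 1 -- which the attacker is free to poison.
model = lambda X: X[:, 0] * X[:, 1]
rng = np.random.default_rng(1)
X = rng.normal(1.0, 0.3, size=(30, 2))
grid = np.array([0.0, 1.0, 2.0])
flat = np.zeros_like(grid)              # attacker's target: a flat profile
X_poisoned, loss = poison_pd(model, X, 0, grid, flat)
```

A gradient-based variant would replace the random mutations with gradients of the same MSE loss through a differentiable model; the genetic variant only queries predictions, which is why it transfers across models and explanation methods.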
Related papers
- Are Data-driven Explanations Robust against Out-of-distribution Data? [18.760475318852375]
We propose an end-to-end model-agnostic learning framework Distributionally Robust Explanations (DRE)
Key idea is to fully utilize the inter-distribution information to provide supervisory signals for the learning of explanations without human annotation.
Our results demonstrate that the proposed method significantly improves the model's performance in terms of explanation and prediction robustness against distributional shifts.
arXiv Detail & Related papers (2023-03-29T02:02:08Z)
- CLEAR: Generative Counterfactual Explanations on Graphs [60.30009215290265]
We study the problem of counterfactual explanation generation on graphs.
A few studies have explored counterfactual explanations on graphs, but many challenges of this problem are still not well-addressed.
We propose a novel framework CLEAR which aims to generate counterfactual explanations on graphs for graph-level prediction models.
arXiv Detail & Related papers (2022-10-16T04:35:32Z)
- Principled Knowledge Extrapolation with GANs [92.62635018136476]
We study counterfactual synthesis from a new perspective of knowledge extrapolation.
We show that an adversarial game with a closed-form discriminator can be used to address the knowledge extrapolation problem.
Our method enjoys both elegant theoretical guarantees and superior performance in many scenarios.
arXiv Detail & Related papers (2022-05-21T08:39:42Z)
- Discovering Invariant Rationales for Graph Neural Networks [104.61908788639052]
Intrinsic interpretability of graph neural networks (GNNs) is to find a small subset of the input graph's features.
We propose a new strategy of discovering invariant rationale (DIR) to construct intrinsically interpretable GNNs.
arXiv Detail & Related papers (2022-01-30T16:43:40Z)
- Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the Importance-Guided Stochastic Gradient Descent (IGSGD) method to train models to infer from inputs containing missing values without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
- Feature Attributions and Counterfactual Explanations Can Be Manipulated [32.579094387004346]
We show how adversaries can design biased models that manipulate model agnostic feature attribution methods.
These vulnerabilities allow an adversary to deploy a biased model, yet explanations will not reveal this bias, thereby deceiving stakeholders into trusting the model.
We evaluate the manipulations on real world data sets, including COMPAS and Communities & Crime, and find explanations can be manipulated in practice.
arXiv Detail & Related papers (2021-06-23T17:43:31Z)
- Agree to Disagree: When Deep Learning Models With Identical Architectures Produce Distinct Explanations [0.0]
We introduce a measure of explanation consistency which we use to highlight the identified problems on the MIMIC-CXR dataset.
We find that explanations from models with identical architectures but different training setups have low consistency: approximately 33% on average.
We conclude that current trends in model explanation are not sufficient to mitigate the risks of deploying models in real life healthcare applications.
arXiv Detail & Related papers (2021-05-14T12:16:47Z)
- Incorporating Causal Graphical Prior Knowledge into Predictive Modeling via Simple Data Augmentation [92.96204497841032]
Causal graphs (CGs) are compact representations of the knowledge of the data generating processes behind the data distributions.
We propose a model-agnostic data augmentation method that allows us to exploit the prior knowledge of the conditional independence (CI) relations.
We experimentally show that the proposed method is effective in improving the prediction accuracy, especially in the small-data regime.
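One concrete instance of such CI-based augmentation, sketched under a strong simplifying assumption (that the causal graph implies a block of features is marginally independent of the rest; the helper name is mine, and the paper's method handles general conditional independence relations): permuting the independent block across rows yields new samples from approximately the same joint distribution.

```python
import numpy as np

def augment_by_independence(X, indep_cols, seed=0):
    """Assuming the causal graph implies X[:, indep_cols] is independent
    of the remaining columns, shuffle that block across rows to create
    new recombined samples, then stack them onto the originals.
    Every column's marginal distribution is preserved exactly."""
    rng = np.random.default_rng(seed)
    X_new = X.copy()
    perm = rng.permutation(len(X))
    X_new[:, indep_cols] = X[perm][:, indep_cols]
    return np.vstack([X, X_new])

# Demo: column 0 is (by assumption) independent of column 1, so
# recombining them doubles the sample without changing either marginal.
X = np.array([[0., 10.], [1., 11.], [2., 12.]])
X_aug = augment_by_independence(X, [0])
```

The augmented rows are consistent with the assumed factorization of the joint, which is what lets the extra data improve accuracy in the small-data regime without human annotation.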
arXiv Detail & Related papers (2021-02-27T06:13:59Z)
- On Generating Plausible Counterfactual and Semi-Factual Explanations for Deep Learning [15.965337956587373]
PlausIble Exceptionality-based Contrastive Explanations (PIECE) modifies all exceptional features in a test image to be normal from the perspective of the counterfactual class.
Two controlled experiments compare PIECE to others in the literature, showing that it not only generates the most plausible counterfactuals on several measures, but also the best semi-factuals.
arXiv Detail & Related papers (2020-09-10T14:48:12Z)
- Model extraction from counterfactual explanations [68.8204255655161]
We show how an adversary can leverage the information provided by counterfactual explanations to build high-fidelity and high-accuracy model extraction attacks.
Our attack enables the adversary to build a faithful copy of a target model by accessing its counterfactual explanations.
arXiv Detail & Related papers (2020-09-03T19:02:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
The quality of this list (including all information) is not guaranteed, and the site is not responsible for any consequences of its use.