Better sampling in explanation methods can prevent dieselgate-like
deception
- URL: http://arxiv.org/abs/2101.11702v1
- Date: Tue, 26 Jan 2021 13:41:37 GMT
- Title: Better sampling in explanation methods can prevent dieselgate-like
deception
- Authors: Domen Vreš and Marko Robnik-Šikonja
- Abstract summary: Interpretability of prediction models is necessary to determine their biases and causes of errors.
Popular techniques, such as IME, LIME, and SHAP, use perturbation of instance features to explain individual predictions.
We show that the improved sampling increases the robustness of LIME and SHAP, while the previously untested method IME is already the most robust of all.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning models are used in many sensitive areas where, besides
predictive accuracy, their comprehensibility is also important. Interpretability
of prediction models is necessary to determine their biases and causes of
errors, and is a necessary prerequisite for users' confidence. For complex
state-of-the-art black-box models, post-hoc model-independent explanation
techniques are an established solution. Popular and effective techniques, such
as IME, LIME, and SHAP, use perturbation of instance features to explain
individual predictions. Recently, Slack et al. (2020) called their robustness into
question by showing that their outcomes can be manipulated due to the poor
perturbation sampling these methods employ. This weakness would allow dieselgate-type
cheating by owners of sensitive models, who could deceive inspection and hide
potentially unethical or illegal biases present in their predictive models.
This could undermine public trust in machine learning models and give rise to
legal restrictions on their use.
We show that better sampling in these explanation methods prevents malicious
manipulations. The proposed sampling uses data generators that learn the
training set distribution and generate new perturbation instances much more
similar to the training set. We show that the improved sampling increases the
robustness of LIME and SHAP, while the previously untested method IME is
already the most robust of all.
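
To make the proposed defense concrete, the following is a minimal sketch, not the authors' implementation: a LIME-style local surrogate whose perturbation samples are drawn from a generator fitted to the training data rather than from independent per-feature noise. The Gaussian mixture generator, the function name, and all parameters are illustrative stand-ins for the learned data generators evaluated in the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import Ridge

def explain_with_data_generator(model, x, X_train, n_samples=500, seed=0):
    """Illustrative LIME-style explanation of model's prediction at x.

    Perturbation instances are sampled from a generator fitted on the
    training set (a Gaussian mixture here), so they stay close to the data
    manifold and are hard to separate from genuine inputs -- closing the
    out-of-distribution loophole exploited in dieselgate-style attacks on
    explanation methods (Slack et al., 2020).
    """
    # Learn the training-set distribution and draw perturbations from it.
    generator = GaussianMixture(n_components=5, random_state=seed).fit(X_train)
    Z, _ = generator.sample(n_samples)

    # Weight samples by proximity to the explained instance (RBF kernel).
    dists = np.linalg.norm(Z - x, axis=1)
    bandwidth = np.median(dists) + 1e-12
    weights = np.exp(-(dists ** 2) / (2 * bandwidth ** 2))

    # Fit a weighted linear surrogate to the black-box outputs; its
    # coefficients act as local feature attributions.
    y = model.predict(Z)  # for classifiers, use predicted probabilities instead
    surrogate = Ridge(alpha=1.0).fit(Z, y, sample_weight=weights)
    return surrogate.coef_
```

Because these perturbations resemble the training data, an adversarial wrapper that tries to recognize explanation queries as off-manifold no longer has a reliable signal to key on, which is the robustness gain reported above for LIME and SHAP.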
Related papers
- Selective Learning: Towards Robust Calibration with Dynamic Regularization [79.92633587914659] (2024-02-13)
Miscalibration in deep learning refers to a discrepancy between the predicted confidence and the actual performance.
We introduce Dynamic Regularization (DReg), which aims to learn what should be learned during training, thereby circumventing the confidence-adjustment trade-off.
- Source-Free Unsupervised Domain Adaptation with Hypothesis Consolidation of Prediction Rationale [53.152460508207184] (2024-02-02)
Source-Free Unsupervised Domain Adaptation (SFUDA) is a challenging task where a model needs to be adapted to a new domain without access to target domain labels or source domain data.
This paper proposes a novel approach that considers multiple prediction hypotheses for each sample and investigates the rationale behind each hypothesis.
To achieve optimal performance, we propose a three-step adaptation process: model pre-adaptation, hypothesis consolidation, and semi-supervised learning.
- Learning Sample Difficulty from Pre-trained Models for Reliable Prediction [55.77136037458667] (2023-04-20)
We propose to utilize large-scale pre-trained models to guide downstream model training with sample difficulty-aware entropy regularization.
We simultaneously improve accuracy and uncertainty calibration across challenging benchmarks.
- Increasing the Cost of Model Extraction with Calibrated Proof of Work [25.096196576476885] (2022-01-23)
In model extraction attacks, adversaries can steal a machine learning model exposed via a public API.
We propose requiring users to complete a proof-of-work before they can read the model's predictions.
- Robust uncertainty estimates with out-of-distribution pseudo-inputs training [0.0] (2022-01-15)
We propose to explicitly train the uncertainty predictor where we are not given data to make it reliable.
As one cannot train without data, we provide mechanisms for generating pseudo-inputs in informative low-density regions of the input space.
With a holistic evaluation, we demonstrate that this yields robust and interpretable predictions of uncertainty while retaining state-of-the-art performance on diverse tasks.
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815] (2021-12-17)
We conduct a crowdsourcing study in which participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase than the no-explanation control.
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821] (2021-03-18)
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
- Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721] (2020-11-24)
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses credibility as a risk-fit trade-off.
- Prediction Confidence from Neighbors [0.0] (2020-03-31)
The inability of Machine Learning (ML) models to successfully extrapolate correct predictions from out-of-distribution (OoD) samples is a major hindrance to the application of ML in critical applications.
We show that feature space distance is a meaningful measure that can provide confidence in predictions.
This enables earlier and safer deployment of models in critical applications and is vital for deploying models under ever-changing conditions.