Are Data-driven Explanations Robust against Out-of-distribution Data?
- URL: http://arxiv.org/abs/2303.16390v1
- Date: Wed, 29 Mar 2023 02:02:08 GMT
- Title: Are Data-driven Explanations Robust against Out-of-distribution Data?
- Authors: Tang Li, Fengchun Qiao, Mengmeng Ma, Xi Peng
- Abstract summary: We propose an end-to-end, model-agnostic learning framework, Distributionally Robust Explanations (DRE).
The key idea is to fully utilize the inter-distribution information to provide supervisory signals for the learning of explanations without human annotation.
Our results demonstrate that the proposed method significantly improves the model's performance in terms of explanation and prediction robustness against distributional shifts.
- Score: 18.760475318852375
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As black-box models increasingly power high-stakes applications, a variety of
data-driven explanation methods have been introduced. Meanwhile, machine
learning models are constantly challenged by distributional shifts. A question
naturally arises: Are data-driven explanations robust against
out-of-distribution data? Our empirical results show that even when the model
predicts correctly, it might still yield unreliable explanations under
distributional shifts. How can we develop robust explanations against
out-of-distribution data? To address this problem, we propose an end-to-end,
model-agnostic learning framework, Distributionally Robust Explanations (DRE).
The key idea, inspired by self-supervised learning, is to fully utilize the
inter-distribution information to provide supervisory signals for the learning
of explanations without human annotation. Can robust explanations benefit the
model's generalization capability? We conduct extensive experiments on a wide
range of tasks and data types, including classification and regression on image
and scientific tabular data. Our results demonstrate that the proposed method
significantly improves the model's performance in terms of explanation and
prediction robustness against distributional shifts.
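To make the key idea concrete, the following is a minimal sketch of how inter-distribution information could act as a supervisory signal for explanations; it is an illustration under assumptions, not the authors' released implementation. Plain gradient saliency stands in for the explanation method, and `lam` is a hypothetical trade-off weight.

```python
import torch
import torch.nn.functional as F

def input_saliency(model, x, y):
    """Gradient-based explanation: |d loss / d input| for each input feature."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    # create_graph=True keeps the saliency differentiable so the consistency
    # term below can be backpropagated through it.
    grad, = torch.autograd.grad(loss, x, create_graph=True)
    return grad.abs()

def dre_style_loss(model, batch_a, batch_b, lam=1.0):
    """Prediction loss on two training distributions plus an explanation-consistency term."""
    (xa, ya), (xb, yb) = batch_a, batch_b
    pred_loss = F.cross_entropy(model(xa), ya) + F.cross_entropy(model(xb), yb)
    # Supervisory signal without human annotation: explanations computed on one
    # distribution should agree with those computed on the other.
    sal_a = input_saliency(model, xa, ya).mean(dim=0)
    sal_b = input_saliency(model, xb, yb).mean(dim=0)
    expl_loss = F.mse_loss(sal_a, sal_b)
    return pred_loss + lam * expl_loss
```

The choice of explanation method, the exact form of the consistency term, and its weighting are design decisions of the paper; the sketch only shows where the inter-distribution signal enters the training objective.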
Related papers
- Ask Your Distribution Shift if Pre-Training is Right for You [74.18516460467019]
In practice, fine-tuning a pre-trained model improves robustness significantly in some cases but not at all in others.
We focus on two possible failure modes of models under distribution shift: poor extrapolation and biases in the training data.
Our study suggests that, as a rule of thumb, pre-training can help mitigate poor extrapolation but not dataset biases.
arXiv Detail & Related papers (2024-02-29T23:46:28Z)
- Debiasing Multimodal Models via Causal Information Minimization [65.23982806840182]
We study bias arising from confounders in a causal graph for multimodal data.
Robust predictive features contain diverse information that helps a model generalize to out-of-distribution data.
We use these features as confounder representations and apply methods motivated by causal theory to remove bias from models.
arXiv Detail & Related papers (2023-11-28T16:46:14Z)
- Explanation Shift: How Did the Distribution Shift Impact the Model? [23.403838118256907]
We study how explanation characteristics shift when affected by distribution shifts.
We analyze different types of distribution shifts using synthetic examples and real-world data sets.
We release our methods in an open-source Python package, as well as the code used to reproduce our experiments.
arXiv Detail & Related papers (2023-03-14T17:13:01Z)
- Discovering Invariant Rationales for Graph Neural Networks [104.61908788639052]
Intrinsic interpretability of graph neural networks (GNNs) aims to find a small subset of the input graph's features that guides the model's prediction.
We propose a new strategy of discovering invariant rationale (DIR) to construct intrinsically interpretable GNNs.
arXiv Detail & Related papers (2022-01-30T16:43:40Z)
- Interpretable Data-Based Explanations for Fairness Debugging [7.266116143672294]
Gopher is a system that produces compact, interpretable, and causal explanations for bias or unexpected model behavior.
We introduce the concept of causal responsibility that quantifies the extent to which intervening on training data by removing or updating subsets of it can resolve the bias.
Building on this concept, we develop an efficient approach for generating the top-k patterns that explain model bias.
arXiv Detail & Related papers (2021-12-17T20:10:00Z)
- Information-theoretic Evolution of Model Agnostic Global Explanations [10.921146104622972]
We present a novel model-agnostic approach that derives rules to globally explain the behavior of classification models trained on numerical and/or categorical data.
Our approach has been deployed in a leading digital marketing suite of products.
arXiv Detail & Related papers (2021-05-14T16:52:16Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Explainable Artificial Intelligence: How Subsets of the Training Data Affect a Prediction [2.3204178451683264]
We propose a novel methodology which we call Shapley values for training data subset importance.
We show how the proposed explanations can be used to reveal bias in models and erroneous training data.
We argue that the explanations enable us to perceive more of the inner workings of the algorithms, and illustrate how models producing similar predictions can be based on very different parts of the training data.
arXiv Detail & Related papers (2020-12-07T12:15:47Z)
- Deducing neighborhoods of classes from a fitted model [68.8204255655161]
In this article a new kind of interpretable machine learning method is presented.
It can help to understand the partitioning of the feature space into predicted classes in a classification model using quantile shifts.
Basically, real data points (or specific points of interest) are used, and the changes in the prediction after slightly raising or lowering specific features are observed.
arXiv Detail & Related papers (2020-09-11T16:35:53Z)
- Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance [70.31427277842239]
We introduce a novel debiasing method called confidence regularization.
It discourages models from exploiting biases while enabling them to receive enough incentive to learn from all the training examples.
We evaluate our method on three NLU tasks and show that, in contrast to its predecessors, it improves the performance on out-of-distribution datasets.
arXiv Detail & Related papers (2020-05-01T11:22:55Z)
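To make the confidence regularization described in the last entry more concrete, the following is a minimal sketch; it is an illustration under assumptions, not the paper's released code. A teacher model's soft labels are softened in proportion to a bias-only model's confidence on the gold label, and the main model is trained against the softened targets, so it is discouraged from over-confidence on bias-aligned examples while still learning from every example.

```python
import torch.nn.functional as F

def scale_teacher_probs(teacher_probs, bias_gold_prob):
    """Soften the teacher distribution; the exponent shrinks as the bias-only model grows confident."""
    # teacher_probs: (N, C) probabilities; bias_gold_prob: (N,) bias-model probability of the gold label.
    exponent = (1.0 - bias_gold_prob).unsqueeze(1)        # (N, 1)
    scaled = teacher_probs.clamp_min(1e-12) ** exponent   # temper each example individually
    return scaled / scaled.sum(dim=1, keepdim=True)       # renormalize to probabilities

def confidence_regularization_loss(student_logits, teacher_probs, bias_gold_prob):
    """Cross-entropy of the student against the softened teacher targets."""
    targets = scale_teacher_probs(teacher_probs, bias_gold_prob)
    log_probs = F.log_softmax(student_logits, dim=1)
    return -(targets * log_probs).sum(dim=1).mean()
```

The exact scaling function and the construction of the bias-only model are the paper's design choices; the sketch only illustrates how the bias signal softens the distillation targets.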