Adversarial Infidelity Learning for Model Interpretation
- URL: http://arxiv.org/abs/2006.05379v3
- Date: Mon, 3 Aug 2020 02:41:50 GMT
- Title: Adversarial Infidelity Learning for Model Interpretation
- Authors: Jian Liang, Bing Bai, Yuren Cao, Kun Bai, Fei Wang
- Abstract summary: We propose a Model-agnostic Effective Efficient Direct (MEED) IFS framework for model interpretation.
Our framework mitigates concerns about sanity, shortcuts, model identifiability, and information transmission.
Our AIL mechanism can help learn the desired conditional distribution between selected features and targets.
- Score: 43.37354056251584
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model interpretation is essential in data mining and knowledge discovery. It
can help understand the intrinsic model working mechanism and check if the
model has undesired characteristics. A popular way of performing model
interpretation is Instance-wise Feature Selection (IFS), which provides an
importance score of each feature representing the data samples to explain how
the model generates the specific output. In this paper, we propose a
Model-agnostic Effective Efficient Direct (MEED) IFS framework for model
interpretation, mitigating concerns about sanity, combinatorial shortcuts,
model identifiability, and information transmission. Also, we focus on the
following setting: using selected features to directly predict the output of
the given model, which serves as a primary evaluation metric for
model-interpretation methods. Apart from the features, we involve the output of
the given model as an additional input to learn an explainer based on more
accurate information. To learn the explainer, besides fidelity, we propose an
Adversarial Infidelity Learning (AIL) mechanism to boost the explanation
learning by screening relatively unimportant features. Through theoretical and
experimental analysis, we show that our AIL mechanism can help learn the
desired conditional distribution between selected features and targets.
Moreover, we extend our framework by integrating efficient interpretation
methods as proper priors to provide a warm start. Comprehensive empirical
evaluation results are provided by quantitative metrics and human evaluation to
demonstrate the effectiveness and superiority of our proposed method. Our code
is publicly available online at https://github.com/langlrsw/MEED.
Related papers
- Revisiting Demonstration Selection Strategies in In-Context Learning [66.11652803887284]
Large language models (LLMs) have shown an impressive ability to perform a wide range of tasks using in-context learning (ICL)
In this work, we first revisit the factors contributing to this variance from both data and model aspects, and find that the choice of demonstration is both data- and model-dependent.
We propose a data- and model-dependent demonstration selection method, textbfTopK + ConE, based on the assumption that textitthe performance of a demonstration positively correlates with its contribution to the model's understanding of the test samples.
arXiv Detail & Related papers (2024-01-22T16:25:27Z) - On the Foundations of Shortcut Learning [20.53986437152018]
We study how predictivity and availability interact to shape models' feature use.
We find that linear models are relatively unbiased, but introducing a single hidden layer with ReLU or Tanh units yields a bias.
arXiv Detail & Related papers (2023-10-24T22:54:05Z) - Evaluating Representations with Readout Model Switching [18.475866691786695]
In this paper, we propose to use the Minimum Description Length (MDL) principle to devise an evaluation metric.
We design a hybrid discrete and continuous-valued model space for the readout models and employ a switching strategy to combine their predictions.
The proposed metric can be efficiently computed with an online method and we present results for pre-trained vision encoders of various architectures.
arXiv Detail & Related papers (2023-02-19T14:08:01Z) - MACE: An Efficient Model-Agnostic Framework for Counterfactual
Explanation [132.77005365032468]
We propose a novel framework of Model-Agnostic Counterfactual Explanation (MACE)
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate the effectiveness with better validity, sparsity and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z) - Explain, Edit, and Understand: Rethinking User Study Design for
Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z) - Beyond Trivial Counterfactual Explanations with Diverse Valuable
Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z) - Distilling Interpretable Models into Human-Readable Code [71.11328360614479]
Human-readability is an important and desirable standard for machine-learned model interpretability.
We propose to train interpretable models using conventional methods, and then distill them into concise, human-readable code.
We describe a piecewise-linear curve-fitting algorithm that produces high-quality results efficiently and reliably across a broad range of use cases.
arXiv Detail & Related papers (2021-01-21T01:46:36Z) - A Framework to Learn with Interpretation [2.3741312212138896]
We present a novel framework to jointly learn a predictive model and its associated interpretation model.
We seek for a small-size dictionary of high level attribute functions that take as inputs the outputs of selected hidden layers.
A detailed pipeline to visualize the learnt features is also developed.
arXiv Detail & Related papers (2020-10-19T09:26:28Z) - Better Model Selection with a new Definition of Feature Importance [8.914907178577476]
Feature importance aims at measuring how crucial each input feature is for model prediction.
In this paper, we propose a new tree-model explanation approach for model selection.
arXiv Detail & Related papers (2020-09-16T14:32:22Z) - ALEX: Active Learning based Enhancement of a Model's Explainability [34.26945469627691]
An active learning (AL) algorithm seeks to construct an effective classifier with a minimal number of labeled examples in a bootstrapping manner.
In the era of data-driven learning, this is an important research direction to pursue.
This paper describes our work-in-progress towards developing an AL selection function that in addition to model effectiveness also seeks to improve on the interpretability of a model during the bootstrapping steps.
arXiv Detail & Related papers (2020-09-02T07:15:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.