Training Deep Models to be Explained with Fewer Examples
- URL: http://arxiv.org/abs/2112.03508v1
- Date: Tue, 7 Dec 2021 05:39:21 GMT
- Title: Training Deep Models to be Explained with Fewer Examples
- Authors: Tomoharu Iwata and Yuya Yoshikawa
- Abstract summary: We train the prediction and explanation models simultaneously with a sparse regularizer that reduces the number of examples.
Experiments using several datasets demonstrate that the proposed method improves faithfulness while maintaining predictive performance.
- Score: 40.58343220792933
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although deep models achieve high predictive performance, it is difficult for
humans to understand the predictions they make. Explainability is important in
real-world applications to justify their reliability. Many example-based
explanation methods have been proposed, such as representer point selection,
where an explanation model defined by a set of training examples is used to
explain a prediction model. To improve interpretability, it is important to
reduce the number of examples in the explanation model. However, explanations
with fewer examples can be unfaithful, since it is difficult to approximate a
prediction model well with such example-based explanation models. An unfaithful
explanation means that the predictions of the explanation model differ from
those of the prediction model. We propose a method for training deep models
such that their predictions are faithfully explained by explanation models with
a small number of examples. We train the prediction and explanation models
simultaneously with a sparse regularizer that reduces the number of examples.
The proposed method can be incorporated into any neural network-based
prediction model. Experiments using several datasets demonstrate that the
proposed method improves faithfulness while maintaining predictive
performance.
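To make the idea concrete, below is a minimal, hypothetical sketch (not the authors' code) of the kind of joint training the abstract describes: a prediction network f and a similarity-based, example-weighted explanation model g are trained together, with an L1 penalty on the per-example weights so that only a few training examples carry weight. The network architecture, the particular form of g, and the loss weights (LAMBDA_FAITH, LAMBDA_SPARSE) are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_FEATURES, FEATURE_DIM, NUM_CLASSES, NUM_TRAIN = 10, 32, 3, 500
LAMBDA_FAITH, LAMBDA_SPARSE = 1.0, 1e-3   # assumed trade-off weights

# Feature extractor phi and prediction head: f(x) = head(phi(x)).
phi = nn.Sequential(nn.Linear(NUM_FEATURES, 64), nn.ReLU(), nn.Linear(64, FEATURE_DIM))
head = nn.Linear(FEATURE_DIM, NUM_CLASSES)

# Per-example weights of an example-based explanation model
# g(x) = sum_i alpha_i * <phi(x), phi(x_i)>; sparsity in alpha means
# only a few training examples are needed to explain a prediction.
alpha = nn.Parameter(torch.zeros(NUM_TRAIN, NUM_CLASSES))

X = torch.randn(NUM_TRAIN, NUM_FEATURES)          # toy stand-in data
y = torch.randint(0, NUM_CLASSES, (NUM_TRAIN,))

opt = torch.optim.Adam(list(phi.parameters()) + list(head.parameters()) + [alpha], lr=1e-3)

for step in range(200):
    feats = phi(X)                                # shared representation
    f_logits = head(feats)                        # prediction model output
    g_logits = (feats @ feats.t()) @ alpha        # explanation model output
    loss = (
        F.cross_entropy(f_logits, y)                      # predictive loss
        + LAMBDA_FAITH * F.mse_loss(g_logits, f_logits)   # faithfulness: g should match f
        + LAMBDA_SPARSE * alpha.abs().sum()               # sparse regularizer on example weights
    )
    opt.zero_grad()
    loss.backward()
    opt.step()

# Examples whose weights are not near zero form the small explanation set.
used = (alpha.abs().sum(dim=1) > 1e-3).sum().item()
print(f"examples with non-negligible weight: {used} / {NUM_TRAIN}")
```

In this sketch the faithfulness term keeps the explanation model's outputs close to the prediction model's, while the L1 penalty shrinks most example weights toward zero; the surviving examples are the ones that would be shown as the explanation.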
Related papers
- Fast Explainability via Feasible Concept Sets Generator [7.011763596804071]
We bridge the gap between the universality of model-agnostic approaches and the efficiency of model-specific approaches.
First, we define explanations through a set of human-comprehensible concepts.
Second, we show that a minimal feasible set generator can be learned as a companion explainer to the prediction model.
arXiv Detail & Related papers (2024-05-29T00:01:40Z) - What Will My Model Forget? Forecasting Forgotten Examples in Language Model Refinement [38.93348195407474]
Language models deployed in the wild make errors.
Updating the model with the corrected error instances causes catastrophic forgetting.
We propose a partially interpretable forecasting model based on the observation that changes in the pre-softmax logit scores of pretraining examples resemble those of online-learned examples.
arXiv Detail & Related papers (2024-02-02T19:43:15Z) - Pathologies of Pre-trained Language Models in Few-shot Fine-tuning [50.3686606679048]
We show that pre-trained language models fine-tuned with few examples exhibit strong prediction bias across labels.
Although few-shot fine-tuning can mitigate this prediction bias, our analysis shows that models gain performance improvements by capturing non-task-related features.
These observations warn that pursuing model performance with fewer examples may incur pathological prediction behavior.
arXiv Detail & Related papers (2022-04-17T15:55:18Z) - Counterfactual Explanations for Predictive Business Process Monitoring [0.90238471756546]
We propose LORELEY, a counterfactual explanation technique for predictive process monitoring.
LORELEY can approximate prediction models with an average fidelity of 97.69% and generate realistic counterfactual explanations.
arXiv Detail & Related papers (2022-02-24T11:01:20Z) - Instance-Based Neural Dependency Parsing [56.63500180843504]
We develop neural models that possess an interpretable inference process for dependency parsing.
Our models adopt instance-based inference, where dependency edges are extracted and labeled by comparing them to edges in a training set.
arXiv Detail & Related papers (2021-09-28T05:30:52Z) - Beyond Trivial Counterfactual Explanations with Diverse Valuable
Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z) - Explainable Artificial Intelligence: How Subsets of the Training Data
Affect a Prediction [2.3204178451683264]
We propose a novel methodology which we call Shapley values for training data subset importance.
We show how the proposed explanations can be used to reveal bias in models and erroneous training data.
We argue that the explanations enable us to perceive more of the inner workings of the algorithms, and illustrate how models producing similar predictions can be based on very different parts of the training data.
arXiv Detail & Related papers (2020-12-07T12:15:47Z) - Deducing neighborhoods of classes from a fitted model [68.8204255655161]
In this article, a new kind of interpretable machine learning method is presented.
It can help to understand the partitioning of the feature space into predicted classes in a classification model using quantile shifts.
Basically, real data points (or specific points of interest) are used, and the changes in the prediction after slightly increasing or decreasing specific features are observed.
arXiv Detail & Related papers (2020-09-11T16:35:53Z) - Are Visual Explanations Useful? A Case Study in Model-in-the-Loop
Prediction [49.254162397086006]
We study explanations based on visual saliency in an image-based age prediction task.
We find that presenting model predictions improves human accuracy.
However, explanations of various kinds fail to significantly alter human accuracy or trust in the model.
arXiv Detail & Related papers (2020-07-23T20:39:40Z)