Understanding Instance-based Interpretability of Variational
Auto-Encoders
- URL: http://arxiv.org/abs/2105.14203v1
- Date: Sat, 29 May 2021 04:03:09 GMT
- Title: Understanding Instance-based Interpretability of Variational
Auto-Encoders
- Authors: Zhifeng Kong, Kamalika Chaudhuri
- Abstract summary: We investigate influence functions for a class of deep generative models called variational auto-encoders (VAE)
We then introduce VAE-TracIn, a computationally efficient and theoretically sound solution based on Pruthi et al.
We evaluate VAE-TracIn on several real world datasets with extensive quantitative and qualitative analysis.
- Score: 24.493721984271566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Instance-based interpretation methods have been widely studied for supervised
learning methods as they help explain how black box neural networks predict.
However, instance-based interpretations remain ill-understood in the context of
unsupervised learning. In this paper, we investigate influence functions [20],
a popular instance-based interpretation method, for a class of deep generative
models called variational auto-encoders (VAE). We formally frame the
counter-factual question answered by influence functions in this setting, and
through theoretical analysis, examine what they reveal about the impact of
training samples on classical unsupervised learning methods. We then introduce
VAE-TracIn, a computationally efficient and theoretically sound solution based
on Pruthi et al. [28], for VAEs. Finally, we evaluate VAE-TracIn on several
real world datasets with extensive quantitative and qualitative analysis.
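As a rough illustration of the TracIn idea underlying VAE-TracIn (Pruthi et al.), the influence of a training example on a test example can be approximated by summing, over saved checkpoints, the learning-rate-weighted dot product of the per-example loss gradients. The sketch below is a minimal, hypothetical PyTorch version of that score applied to a toy VAE loss; TinyVAE, the checkpoint list, and the learning rates are placeholder assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    """Toy VAE used only to make the influence computation concrete."""
    def __init__(self, d_in=784, d_z=8):
        super().__init__()
        self.enc = nn.Linear(d_in, 2 * d_z)   # predicts mean and log-variance
        self.dec = nn.Linear(d_z, d_in)

    def loss(self, x):
        """Per-example negative ELBO: reconstruction error plus KL term."""
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterisation
        recon = self.dec(z)
        rec = F.mse_loss(recon, x, reduction="sum")
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return rec + kl

def flat_grad(model, x):
    """Gradient of the per-example loss w.r.t. all parameters, flattened."""
    grads = torch.autograd.grad(model.loss(x), list(model.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

def tracin_score(checkpoints, lrs, x_train, x_test):
    """TracIn-style score: sum_k lr_k * <grad loss(x_train), grad loss(x_test)>."""
    return sum(
        lr * torch.dot(flat_grad(m, x_train), flat_grad(m, x_test)).item()
        for m, lr in zip(checkpoints, lrs)
    )

# Usage sketch: in practice `checkpoints` would be models restored from
# snapshots saved during training, and `lrs` the learning rates at those steps.
checkpoints = [TinyVAE(), TinyVAE()]          # placeholders
score = tracin_score(checkpoints, lrs=[1e-3, 1e-3],
                     x_train=torch.rand(784), x_test=torch.rand(784))
print(score)
```

Scores computed this way rank training samples by how much, over the course of training, their gradient updates lowered (positive score) or raised (negative score) the loss on the query sample.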
Related papers
- How to Probe: Simple Yet Effective Techniques for Improving Post-hoc Explanations [69.72654127617058]
Post-hoc importance attribution methods are a popular tool for "explaining" Deep Neural Networks (DNNs)
In this work we bring forward empirical evidence that challenges this very notion.
We discover a strong dependency on the training details of a pre-trained model's classification layer and demonstrate that they play a crucial role.
arXiv Detail & Related papers (2025-03-01T22:25:11Z)
- Towards Understanding the Influence of Training Samples on Explanations [5.695152528716705]
Explainable AI (XAI) is widely used to analyze AI systems' decision-making.
When unexpected explanations occur, users may want to understand the training data properties shaping them.
Under the umbrella of data valuation, first approaches have been proposed that estimate the influence of data samples on a given model.
arXiv Detail & Related papers (2024-06-05T07:20:06Z)
- Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z)
- Counterfactuals of Counterfactuals: a back-translation-inspired approach to analyse counterfactual editors [3.4253416336476246]
We focus on the analysis of counterfactual, contrastive explanations.
We propose a new back translation-inspired evaluation methodology.
We show that by iteratively feeding the counterfactual to the explainer we can obtain valuable insights into the behaviour of both the predictor and the explainer models.
arXiv Detail & Related papers (2023-05-26T16:04:28Z)
- An Additive Instance-Wise Approach to Multi-class Model Interpretation [53.87578024052922]
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system.
Existing methods mainly focus on selecting explanatory input features, which follow either locally additive or instance-wise approaches.
This work exploits the strengths of both methods and proposes a global framework for learning local explanations simultaneously for multiple target classes.
arXiv Detail & Related papers (2022-07-07T06:50:27Z)
- Influence Tuning: Demoting Spurious Correlations via Instance Attribution and Instance-Driven Updates [26.527311287924995]
We show that, in a controlled setup, influence tuning can help deconfound the model from spurious patterns in the data.
arXiv Detail & Related papers (2021-10-07T06:59:46Z)
- Exploring Adversarial Examples for Efficient Active Learning in Machine Learning Classifiers [17.90617023533039]
We first add particular perturbations to the original training examples using adversarial attack methods.
We then investigate the connections between active learning and these particular training examples.
Results show that the established theoretical foundation can guide better active learning strategies based on adversarial examples.
arXiv Detail & Related papers (2021-09-22T14:51:26Z)
- Interpretable Deep Learning: Interpretations, Interpretability, Trustworthiness, and Beyond [49.93153180169685]
We introduce and clarify two basic concepts, interpretations and interpretability, that people usually confuse.
We elaborate on the design of several recent interpretation algorithms from different perspectives by proposing a new taxonomy.
We summarize the existing work in evaluating models' interpretability using "trustworthy" interpretation algorithms.
arXiv Detail & Related papers (2021-03-19T08:40:30Z)
- Efficient Estimation of Influence of a Training Instance [56.29080605123304]
We propose an efficient method for estimating the influence of a training instance on a neural network model.
Our method is inspired by dropout, which zero-masks a sub-network and prevents the sub-network from learning each training instance.
We demonstrate that the proposed method can capture training influences, enhance the interpretability of error predictions, and cleanse the training dataset for improving generalization.
arXiv Detail & Related papers (2020-12-08T04:31:38Z)
- Understanding Interpretability by generalized distillation in Supervised Classification [3.5473853445215897]
Recent interpretation strategies focus on human understanding of the underlying decision mechanisms of complex machine learning models.
We propose an interpretation-by-distillation formulation that is defined relative to other ML models.
We evaluate our proposed framework on the MNIST, Fashion-MNIST and Stanford40 datasets.
arXiv Detail & Related papers (2020-12-05T17:42:50Z) - Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z) - Explaining Black Box Predictions and Unveiling Data Artifacts through
Influence Functions [55.660255727031725]
Influence functions explain the decisions of a model by identifying influential training examples.
We conduct a comparison between influence functions and common word-saliency methods on representative tasks.
We develop a new measure based on influence functions that can reveal artifacts in training data.
arXiv Detail & Related papers (2020-05-14T00:45:23Z)
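For reference, the classical influence-function estimate used throughout this line of work (a standard textbook formulation, not a line quoted from any of the papers above) measures how slightly up-weighting a training point z would change the loss at a test point z_test:

```latex
% Standard influence-function approximation for an empirical risk minimizer \hat{\theta}.
\mathcal{I}(z, z_{\text{test}})
  = -\,\nabla_\theta L(z_{\text{test}}, \hat{\theta})^{\top}
      H_{\hat{\theta}}^{-1}\,
      \nabla_\theta L(z, \hat{\theta}),
\qquad
H_{\hat{\theta}} = \frac{1}{n}\sum_{i=1}^{n} \nabla^2_\theta L(z_i, \hat{\theta}).
```

Because inverting the Hessian H is expensive for deep models, TracIn-style methods such as VAE-TracIn replace this second-order estimate with sums of first-order gradient dot products over training checkpoints, which is what makes them computationally efficient.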
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.