Explaining Black Box Predictions and Unveiling Data Artifacts through
Influence Functions
- URL: http://arxiv.org/abs/2005.06676v1
- Date: Thu, 14 May 2020 00:45:23 GMT
- Title: Explaining Black Box Predictions and Unveiling Data Artifacts through
Influence Functions
- Authors: Xiaochuang Han, Byron C. Wallace, Yulia Tsvetkov
- Abstract summary: Influence functions explain the decisions of a model by identifying influential training examples.
We conduct a comparison between influence functions and common word-saliency methods on representative tasks.
We develop a new measure based on influence functions that can reveal artifacts in training data.
- Score: 55.660255727031725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern deep learning models for NLP are notoriously opaque. This has
motivated the development of methods for interpreting such models, e.g., via
gradient-based saliency maps or the visualization of attention weights. Such
approaches aim to provide explanations for a particular model prediction by
highlighting important words in the corresponding input text. While this might
be useful for tasks where decisions are explicitly influenced by individual
tokens in the input, we suspect that such highlighting is not suitable for
tasks where model decisions should be driven by more complex reasoning. In this
work, we investigate the use of influence functions for NLP, providing an
alternative approach to interpreting neural text classifiers. Influence
functions explain the decisions of a model by identifying influential training
examples. Despite the promise of this approach, influence functions have not
yet been extensively evaluated in the context of NLP, a gap addressed by this
work. We conduct a comparison between influence functions and common
word-saliency methods on representative tasks. As suspected, we find that
influence functions are particularly useful for natural language inference, a
task in which 'saliency maps' may not have a clear interpretation. Furthermore,
we develop a new quantitative measure based on influence functions that can
reveal artifacts in training data.
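To make the approach concrete: following Koh and Liang (2017), the influence of a training example z on a test example z_test is typically approximated as -∇L(z_test, θ̂)ᵀ H⁻¹ ∇L(z, θ̂), where θ̂ are the fitted parameters and H is the (damped) Hessian of the training loss. The sketch below illustrates this on a toy logistic-regression classifier with an exact, damped Hessian; the data, model, and hyperparameters are illustrative stand-ins, not the paper's BERT-based setup, which requires approximate inverse-Hessian-vector products.

# Minimal sketch of influence functions for a tiny classifier.
# Everything here (data, model, damping value) is illustrative, not the
# paper's actual experimental setup.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy bag-of-words features for a binary classification task.
n_train, n_feat = 40, 10
X_train = torch.randn(n_train, n_feat)
y_train = (X_train[:, 0] > 0).float()        # label driven by feature 0
x_test, y_test = torch.randn(n_feat), torch.tensor(1.0)

w = torch.zeros(n_feat, requires_grad=True)  # logistic-regression weights

def loss_fn(weights, x, y):
    return F.binary_cross_entropy_with_logits(x @ weights, y)

# Fit the classifier; influence functions assume parameters near a minimum.
opt = torch.optim.SGD([w], lr=0.5)
for _ in range(500):
    opt.zero_grad()
    loss_fn(w, X_train, y_train).backward()
    opt.step()

# Damped Hessian of the mean training loss at the fitted parameters.
# Damping ("Hessian regularization") is commonly needed for stable estimates.
hess = torch.autograd.functional.hessian(
    lambda p: loss_fn(p, X_train, y_train), w.detach())
h_inv = torch.linalg.inv(hess + 1e-3 * torch.eye(n_feat))

# s_test = H^{-1} applied to the gradient of the test loss.
grad_test = torch.autograd.grad(loss_fn(w, x_test, y_test), w)[0]
s_test = h_inv @ grad_test

# Influence of up-weighting each training example on the test loss:
# negative = the example supports this prediction, positive = it hurts it.
influence = torch.stack([
    -torch.autograd.grad(loss_fn(w, X_train[i], y_train[i]), w)[0] @ s_test
    for i in range(n_train)
])

order = influence.argsort()
print("most helpful training example:", order[0].item())
print("most harmful training example:", order[-1].item())

Computing an exact Hessian is only feasible for models this small; at BERT or LLM scale the same quantity is estimated with stochastic inverse-Hessian-vector products (e.g., LiSSA) or factored curvature approximations such as EK-FAC.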
Related papers
- Influence Functions for Scalable Data Attribution in Diffusion Models [52.92223039302037]
Diffusion models have led to significant advancements in generative modelling.
Yet their widespread adoption poses challenges regarding data attribution and interpretability.
In this paper, we aim to help address such challenges by developing an influence functions framework.
arXiv Detail & Related papers (2024-10-17T17:59:02Z)
- Do Influence Functions Work on Large Language Models? [10.463762448166714]
Influence functions aim to quantify the impact of individual training data points on a model's predictions.
We evaluate influence functions across multiple tasks and find that they consistently perform poorly in most settings.
arXiv Detail & Related papers (2024-09-30T06:50:18Z)
- Studying Large Language Model Generalization with Influence Functions [29.577692176892135]
Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a sequence were added to the training set?
We use the Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation to scale influence functions up to large language models (LLMs) with up to 52 billion parameters.
We investigate generalization patterns of LLMs, including the sparsity of the influence patterns, increasing abstraction with scale, math and programming abilities, cross-lingual generalization, and role-playing behavior.
arXiv Detail & Related papers (2023-08-07T04:47:42Z)
- If Influence Functions are the Answer, Then What is the Question? [7.873458431535409]
Influence functions efficiently estimate the effect of removing a single training data point on a model's learned parameters.
While influence estimates align well with leave-one-out retraining for linear models, recent works have shown this alignment is often poor in neural networks (the standard approximation is written out after this list).
arXiv Detail & Related papers (2022-09-12T16:17:43Z)
- A Functional Information Perspective on Model Interpretation [30.101107406343665]
This work suggests a theoretical framework for model interpretability.
We rely on the log-Sobolev inequality that bounds the functional entropy by the functional Fisher information (a standard form of the inequality is given after this list).
We show that our method surpasses existing sampling-based interpretability methods on various data signals.
arXiv Detail & Related papers (2022-06-12T09:24:45Z)
- An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
- Influence Tuning: Demoting Spurious Correlations via Instance Attribution and Instance-Driven Updates [26.527311287924995]
We show that in a controlled setup, influence tuning can help deconfound the model from spurious patterns in the data.
arXiv Detail & Related papers (2021-10-07T06:59:46Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- Influence Functions in Deep Learning Are Fragile [52.31375893260445]
Influence functions approximate the effect of training samples on test-time predictions.
Influence estimates are fairly accurate for shallow networks.
Hessian regularization is important for obtaining high-quality influence estimates.
arXiv Detail & Related papers (2020-06-25T18:25:59Z)
- Value-driven Hindsight Modelling [68.658900923595]
Value estimation is a critical component of the reinforcement learning (RL) paradigm.
Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function.
We develop an approach for representation learning in RL that sits in between these two extremes.
This provides tractable prediction targets that are directly relevant for a task, and can thus accelerate learning the value function.
arXiv Detail & Related papers (2020-02-19T18:10:20Z)
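Several of the influence-function entries above ("Do Influence Functions Work on Large Language Models?", "If Influence Functions are the Answer...", "Influence Functions in Deep Learning Are Fragile") rest on the same classical approximation. For reference, its standard form from Koh and Liang (2017) is, in generic notation not tied to any single paper above:

\[
\hat{\theta}_{-z} - \hat{\theta} \;\approx\; \frac{1}{n}\, H_{\hat{\theta}}^{-1}\, \nabla_{\theta} L(z, \hat{\theta}),
\qquad
H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta}^{2} L(z_i, \hat{\theta}),
\]
\[
\mathcal{I}(z, z_{\mathrm{test}}) \;=\; -\,\nabla_{\theta} L(z_{\mathrm{test}}, \hat{\theta})^{\top}\, H_{\hat{\theta}}^{-1}\, \nabla_{\theta} L(z, \hat{\theta}).
\]

In deep networks H is typically singular or intractable, so practical estimators replace the exact inverse with a damped inverse (H + \lambda I)^{-1} (the "Hessian regularization" noted above) or a factored curvature approximation such as EK-FAC.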
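For the "A Functional Information Perspective on Model Interpretation" entry, the inequality referred to is, in its standard Gaussian form (the paper's exact setting may differ):

\[
\operatorname{Ent}_{\gamma}(g) \;=\; \mathbb{E}_{\gamma}[\,g \log g\,] - \mathbb{E}_{\gamma}[g]\,\log \mathbb{E}_{\gamma}[g]
\;\le\; \frac{1}{2}\, \mathbb{E}_{\gamma}\!\left[ \frac{\lVert \nabla g \rVert^{2}}{g} \right],
\]

where \gamma is the standard Gaussian measure, the left-hand side is the functional entropy of a positive function g, and the right-hand side is one half of its functional Fisher information.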
This list is automatically generated from the titles and abstracts of the papers on this site.