Fine-grained Interpretation and Causation Analysis in Deep NLP Models
- URL: http://arxiv.org/abs/2105.08039v1
- Date: Mon, 17 May 2021 17:43:36 GMT
- Title: Fine-grained Interpretation and Causation Analysis in Deep NLP Models
- Authors: Hassan Sajjad, Narine Kokhlikyan, Fahim Dalvi, Nadir Durrani
- Abstract summary: We present and discuss research work on interpreting fine-grained components of a model from two perspectives: fine-grained interpretation and causation analysis.
The former introduces methods to analyze individual neurons and groups of neurons with respect to a language property or task.
The latter studies the role of neurons and input features in explaining the decisions made by the model.
- Score: 20.425855491229147
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper is a write-up for the tutorial on "Fine-grained Interpretation and
Causation Analysis in Deep NLP Models" that we are presenting at NAACL 2021. We
present and discuss the research work on interpreting fine-grained components
of a model from two perspectives, i) fine-grained interpretation, ii) causation
analysis. The former introduces methods to analyze individual neurons and a
group of neurons with respect to a language property or a task. The latter
studies the role of neurons and input features in explaining decisions made by
the model. We also discuss applications of neuron analysis such as network
manipulation and domain adaptation. Moreover, we present two toolkits, namely
NeuroX and Captum, that support the functionalities discussed in this tutorial.
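To make the causation-analysis side concrete, here is a minimal sketch of input-feature attribution with Captum's Integrated Gradients. The toy classifier, random inputs, and target class are placeholders invented for the example; only the `captum.attr.IntegratedGradients` API comes from the toolkit named above.

```python
# Minimal sketch: attributing a prediction to input features with Captum's
# Integrated Gradients. The tiny classifier and random inputs are placeholders.
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Toy classifier standing in for a sentence encoder plus classification head.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
model.eval()

inputs = torch.randn(4, 16)            # e.g. pooled sentence representations
baselines = torch.zeros_like(inputs)   # reference point for the attribution path

ig = IntegratedGradients(model)
attributions, delta = ig.attribute(
    inputs,
    baselines=baselines,
    target=1,                          # class whose score we explain
    return_convergence_delta=True,
)
print(attributions.shape)              # (4, 16): one score per input feature
print(delta)                           # approximation error of the path integral
```

For token-level attributions over a real NLP model one would typically wrap the embedding layer with `LayerIntegratedGradients` instead, but the toy setup keeps the sketch self-contained.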
Related papers
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks [52.61917615039112]
We use CausalGym to benchmark the ability of interpretability methods to causally affect model behaviour.
We study the Pythia models (14M--6.9B) and assess the causal efficacy of a wide range of interpretability methods.
We find that DAS outperforms the other methods, and so we use it to study the learning trajectory of two difficult linguistic phenomena.
arXiv Detail & Related papers (2024-02-19T21:35:56Z)
- Towards Generating Informative Textual Description for Neurons in Language Models [6.884227665279812]
We propose a framework that ties textual descriptions to neurons.
In particular, our experiments show that the proposed approach achieves 75% precision@2 and 50% recall@2.
arXiv Detail & Related papers (2024-01-30T04:06:25Z)
- NeuroX Library for Neuron Analysis of Deep NLP Models [21.663464746974455]
We present NeuroX, a comprehensive open-source toolkit to conduct neuron analysis of natural language processing models.
NeuroX implements various interpretation methods under a unified API, and provides a framework for data processing and evaluation.
arXiv Detail & Related papers (2023-05-26T16:32:56Z)
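To give a concrete feel for the neuron-analysis workflow that a toolkit like NeuroX automates, below is a minimal probing sketch. It deliberately does not use NeuroX's actual API: it trains a plain scikit-learn logistic-regression probe on hypothetical pre-extracted activations and ranks neurons by the probe's weights, which is the general pattern behind neuron ranking.

```python
# Minimal sketch of a neuron-probing workflow (not NeuroX's actual API).
# Assumes activations have already been extracted from a model: one row of
# neuron activations per token, with a linguistic label (e.g. a POS tag) per row.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
num_tokens, num_neurons = 1000, 768               # hypothetical sizes
X = rng.normal(size=(num_tokens, num_neurons))    # stand-in for real activations
y = rng.integers(0, 2, size=num_tokens)           # stand-in for binary property labels

# Train a linear probe that predicts the property from the activations.
probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.1, max_iter=1000)
probe.fit(X, y)

# Rank neurons by the absolute weight the probe assigns them; under an L1
# penalty, high-weight neurons are the ones most associated with the property.
ranking = np.argsort(-np.abs(probe.coef_[0]))
print("top 10 neurons for this property:", ranking[:10])
print("probe accuracy on training data:", probe.score(X, y))
```

In NeuroX the extraction, probe training, and neuron-ranking steps sit behind its unified API, so a sketch like this reduces to a few library calls.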
- N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language Models [0.0]
N2G is a tool which takes a neuron and its dataset examples, and automatically distills the neuron's behaviour on those examples to an interpretable graph.
We use truncation and saliency methods to only present the important tokens, and augment the dataset examples with more diverse samples to better capture the extent of neuron behaviour.
These graphs can be visualised to aid manual interpretation by researchers, but can also output token activations on text to compare to the neuron's ground truth activations for automatic validation.
arXiv Detail & Related papers (2023-04-22T19:06:13Z)
- Towards Faithful Model Explanation in NLP: A Survey [48.690624266879155]
End-to-end neural Natural Language Processing (NLP) models are notoriously difficult to understand.
One desideratum of model explanation is faithfulness, i.e. an explanation should accurately represent the reasoning process behind the model's prediction.
We review over 110 model explanation methods in NLP through the lens of faithfulness.
arXiv Detail & Related papers (2022-09-22T21:40:51Z)
- Neural Language Models are not Born Equal to Fit Brain Data, but Training Helps [75.84770193489639]
We examine the impact of test loss, training corpus and model architecture on the prediction of functional Magnetic Resonance Imaging timecourses of participants listening to an audiobook.
We find that untrained versions of each model already explain a significant amount of signal in the brain by capturing similarity in brain responses across identical words.
We suggest good practices for future studies aiming at explaining the human language system using neural language models.
arXiv Detail & Related papers (2022-07-07T15:37:17Z)
- Interpreting Deep Learning Models in Natural Language Processing: A Review [33.80537635077772]
A long-standing criticism of neural network models is their lack of interpretability.
In this survey, we provide a comprehensive review of various interpretation methods for neural models in NLP.
arXiv Detail & Related papers (2021-10-20T10:17:04Z)
- Neuron-level Interpretation of Deep NLP Models: A Survey [22.035813865470956]
A plethora of research has been carried out to analyze and understand components of deep neural network models.
Recent work has concentrated on interpretability at a more granular level, analyzing neurons and groups of neurons in large models.
arXiv Detail & Related papers (2021-08-30T11:54:21Z)
- Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts.
We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
arXiv Detail & Related papers (2020-06-24T20:37:05Z)
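A schematic illustration of the compositional-explanation idea described above, using made-up data: binarize a neuron's activations into a mask, then score logical compositions of concept masks by their intersection-over-union (IoU) with it and keep the best-scoring formula. The concepts, thresholds, and tiny search space are illustrative assumptions, not the paper's actual setup.

```python
# Schematic sketch of compositional neuron explanation via IoU scoring
# (illustrative data and concepts, not the paper's actual procedure or search).
import numpy as np

rng = np.random.default_rng(0)
num_inputs = 5000

# Binary mask of inputs on which the neuron fires above some threshold.
neuron_mask = rng.random(num_inputs) > 0.8

# Binary masks for a few hypothetical concepts annotated on the same inputs.
concepts = {
    "noun": rng.random(num_inputs) > 0.5,
    "plural": rng.random(num_inputs) > 0.7,
    "sentence_initial": rng.random(num_inputs) > 0.9,
}

def iou(a, b):
    """Intersection-over-union between two boolean masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

# Score single concepts and simple two-concept compositions (AND, OR, AND-NOT);
# the best-scoring formula serves as the neuron's explanation.
candidates = {name: mask for name, mask in concepts.items()}
names = list(concepts)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        candidates[f"{a} AND {b}"] = concepts[a] & concepts[b]
        candidates[f"{a} OR {b}"] = concepts[a] | concepts[b]
        candidates[f"{a} AND NOT {b}"] = concepts[a] & ~concepts[b]

best = max(candidates, key=lambda k: iou(neuron_mask, candidates[k]))
print(best, iou(neuron_mask, candidates[best]))
```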
- Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions [55.660255727031725]
Influence functions explain the decisions of a model by identifying influential training examples.
We conduct a comparison between influence functions and common word-saliency methods on representative tasks.
We develop a new measure based on influence functions that can reveal artifacts in training data.
arXiv Detail & Related papers (2020-05-14T00:45:23Z)
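As a rough illustration of the influence-function idea above, the sketch below scores training examples for a single test prediction by the dot product between their loss gradients, a common first-order simplification that drops the inverse-Hessian term of the full influence formula. The toy model and data are placeholders.

```python
# Rough sketch of influence-style scoring: rank training examples by the
# similarity of their loss gradients to a test example's loss gradient.
# This first-order approximation omits the inverse-Hessian term of the full
# influence-function formula; the model and data below are placeholders.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(8, 2)                       # toy classifier
loss_fn = nn.CrossEntropyLoss()

def loss_grad(x, y):
    """Flattened gradient of the loss on one example w.r.t. all parameters."""
    loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return torch.cat([g.flatten() for g in grads])

train_x, train_y = torch.randn(100, 8), torch.randint(0, 2, (100,))
test_x, test_y = torch.randn(8), torch.tensor(1)

g_test = loss_grad(test_x, test_y)
scores = torch.stack([g_test @ loss_grad(train_x[i], train_y[i])
                      for i in range(len(train_x))])

# Training examples whose gradients align most with the test example's gradient
# are the ones most "responsible" for the prediction under this approximation.
print("most influential training examples:", scores.topk(5).indices.tolist())
```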
- Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study [81.11161697133095]
We take the NER task as a testbed to analyze the generalization behavior of existing models from different perspectives.
Experiments with in-depth analyses diagnose the bottleneck of existing neural NER models.
As a by-product of this paper, we have open-sourced a project that involves a comprehensive summary of recent NER papers.
arXiv Detail & Related papers (2020-01-12T04:33:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.