Evaluating Neuron Interpretation Methods of NLP Models
- URL: http://arxiv.org/abs/2301.12608v2
- Date: Sun, 5 Nov 2023 12:57:27 GMT
- Title: Evaluating Neuron Interpretation Methods of NLP Models
- Authors: Yimin Fan, Fahim Dalvi, Nadir Durrani, Hassan Sajjad
- Abstract summary: We propose an evaluation framework that measures the compatibility of a neuron analysis method with other methods.
We present a comparative analysis of a large set of neuron interpretation methods.
It enables the evaluation of any new method using 20 concepts across three pre-trained models.
- Score: 28.71369775524347
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neuron interpretation has gained traction in the field of interpretability,
and has provided fine-grained insights into what a model learns and how
language knowledge is distributed amongst its different components. However,
the lack of evaluation benchmarks and metrics has led to siloed progress within
these various methods, with very little work comparing them and highlighting
their strengths and weaknesses. The reason for this is the
difficulty of creating ground-truth datasets: for example, many neurons within
a given model may learn the same phenomenon, and hence there may not be one
correct answer. Moreover, a learned phenomenon may spread across several
neurons that work together, making it challenging to surface these neurons to
create a gold standard. In this work, we propose an evaluation framework that measures the
compatibility of a neuron analysis method with other methods. We hypothesize
that the more compatible a method is with the majority of the methods, the more
confident one can be about its performance. We systematically evaluate our
proposed framework and present a comparative analysis of a large set of neuron
interpretation methods. We make the evaluation framework available to the
community. It enables the evaluation of any new method using 20 concepts
across three pre-trained models. The code is released at
https://github.com/fdalvi/neuron-comparative-analysis (a rough sketch of the compatibility scoring idea follows below).
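
As a rough illustration of the compatibility idea, the Python sketch below scores each neuron interpretation method by the average overlap of its top-k neurons with the top-k neurons selected by every other method for the same concept. The method names, the overlap measure, and the value of k are assumptions made for illustration, not the paper's exact metrics; see the released code for the actual framework.

def topk_overlap(rank_a, rank_b, k=10):
    # Fraction of neurons shared between two methods' top-k lists.
    return len(set(rank_a[:k]) & set(rank_b[:k])) / k

def compatibility_scores(rankings, k=10):
    # rankings: dict of method name -> neuron ids, ranked best-first.
    # Returns each method's mean top-k overlap with all other methods;
    # higher means the method agrees more with the majority.
    scores = {}
    for name, rank in rankings.items():
        overlaps = [topk_overlap(rank, other, k)
                    for other_name, other in rankings.items()
                    if other_name != name]
        scores[name] = sum(overlaps) / len(overlaps)
    return scores

# Toy usage: made-up rankings from three hypothetical methods for one concept.
rankings = {
    "probe":       [3, 17, 42, 8, 99, 5, 23, 61, 7, 12],
    "prob_diff":   [17, 3, 8, 42, 5, 99, 61, 23, 11, 70],
    "activations": [42, 17, 3, 99, 8, 23, 5, 12, 61, 9],
}
print(compatibility_scores(rankings, k=10))

Under this toy setup, the method whose top neurons overlap most with the others would be deemed the most compatible, reflecting the paper's hypothesis that agreement with the majority of methods is a proxy for reliability.
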
Related papers
- On the Value of Labeled Data and Symbolic Methods for Hidden Neuron Activation Analysis [1.55858752644861]
State of the art indicates that hidden node activations can, in some cases, be interpretable in a way that makes sense to humans.
We introduce a novel model-agnostic post-hoc Explainable AI method and demonstrate that it provides meaningful interpretations.
arXiv Detail & Related papers (2024-04-21T07:57:45Z)
- Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z)
- Adversarial Attacks on the Interpretation of Neuron Activation Maximization [70.5472799454224]
Activation-maximization approaches are used to interpret and analyze trained deep-learning models.
In this work, we consider the concept of an adversary manipulating a model for the purpose of deceiving the interpretation.
arXiv Detail & Related papers (2023-06-12T19:54:33Z)
- On Modifying a Neural Network's Perception [3.42658286826597]
We propose a method which allows one to modify what an artificial neural network is perceiving regarding specific human-defined concepts.
We test the proposed method on different models, assessing whether the performed manipulations are well interpreted by the models, and analyzing how they react to them.
arXiv Detail & Related papers (2023-03-05T12:09:37Z)
- Neural Causal Models for Counterfactual Identification and Estimation [62.30444687707919]
We study the evaluation of counterfactual statements through neural models.
First, we show that neural causal models (NCMs) are expressive enough.
Second, we develop an algorithm for simultaneously identifying and estimating counterfactual distributions.
arXiv Detail & Related papers (2022-09-30T18:29:09Z)
- Interpreting Deep Learning Models in Natural Language Processing: A Review [33.80537635077772]
A long-standing criticism against neural network models is the lack of interpretability.
In this survey, we provide a comprehensive review of various interpretation methods for neural models in NLP.
arXiv Detail & Related papers (2021-10-20T10:17:04Z)
- Evaluating Saliency Methods for Neural Language Models [9.309351023703018]
Saliency methods are widely used to interpret neural network predictions.
Different variants of saliency methods disagree even on the interpretations of the same prediction made by the same model.
We conduct a comprehensive and quantitative evaluation of saliency methods on a fundamental category of NLP models: neural language models.
arXiv Detail & Related papers (2021-04-12T21:19:48Z)
- The Neural Coding Framework for Learning Generative Models [91.0357317238509]
We propose a novel neural generative model inspired by the theory of predictive processing in the brain.
In a similar way, artificial neurons in our generative model predict what neighboring neurons will do, and adjust their parameters based on how well the predictions matched reality.
arXiv Detail & Related papers (2020-12-07T01:20:38Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts.
We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
arXiv Detail & Related papers (2020-06-24T20:37:05Z)
- Can you tell? SSNet -- a Sagittal Stratum-inspired Neural Network Framework for Sentiment Analysis [1.0312968200748118]
We propose a neural network architecture that combines predictions of different models on the same text to construct robust, accurate and computationally efficient classifiers for sentiment analysis.
In particular, we propose a systematic new approach to combining multiple predictions based on a dedicated neural network, and we develop a mathematical analysis of it along with state-of-the-art experimental results.
arXiv Detail & Related papers (2020-06-23T12:55:02Z)