Evaluating Neuron Interpretation Methods of NLP Models
- URL: http://arxiv.org/abs/2301.12608v2
- Date: Sun, 5 Nov 2023 12:57:27 GMT
- Title: Evaluating Neuron Interpretation Methods of NLP Models
- Authors: Yimin Fan, Fahim Dalvi, Nadir Durrani, Hassan Sajjad
- Abstract summary: We propose an evaluation framework that measures the compatibility of a neuron analysis method with other methods.
We present a comparative analysis of a large set of neuron interpretation methods.
It enables the evaluation of any new method using 20 concepts across three pre-trained models.
- Score: 28.71369775524347
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neuron interpretation has gained traction in the field of interpretability,
and has provided fine-grained insights into what a model learns and how
language knowledge is distributed amongst its different components. However,
the lack of evaluation benchmarks and metrics has led to siloed progress within
these various methods, with very little work comparing them and highlighting
their strengths and weaknesses. The reason for this is the
difficulty of creating ground-truth datasets: for example, many neurons within
a given model may learn the same phenomenon, and hence there may not be one
correct answer. Moreover, a learned phenomenon may spread across several
neurons that work together, making it challenging to surface these neurons to
create a gold standard. In this work, we propose an evaluation framework that measures the
compatibility of a neuron analysis method with other methods. We hypothesize
that the more compatible a method is with the majority of the methods, the more
confident one can be about its performance. We systematically evaluate our
proposed framework and present a comparative analysis of a large set of neuron
interpretation methods. We make the evaluation framework available to the
community. It enables the evaluation of any new method using 20 concepts
across three pre-trained models. The code is released at
https://github.com/fdalvi/neuron-comparative-analysis (a rough sketch of the compatibility scoring idea follows below).
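
As a rough illustration of the compatibility idea, the Python sketch below scores each neuron interpretation method by the average overlap of its top-k neurons with the top-k neurons selected by every other method for the same concept. The method names, the overlap measure, and the value of k are assumptions made for illustration, not the paper's exact metrics; see the released code for the actual framework.

def topk_overlap(rank_a, rank_b, k=10):
    # Fraction of neurons shared between two methods' top-k lists.
    return len(set(rank_a[:k]) & set(rank_b[:k])) / k

def compatibility_scores(rankings, k=10):
    # rankings: dict of method name -> neuron ids, ranked best-first.
    # Returns each method's mean top-k overlap with all other methods;
    # higher means the method agrees more with the majority.
    scores = {}
    for name, rank in rankings.items():
        overlaps = [topk_overlap(rank, other, k)
                    for other_name, other in rankings.items()
                    if other_name != name]
        scores[name] = sum(overlaps) / len(overlaps)
    return scores

# Toy usage: made-up rankings from three hypothetical methods for one concept.
rankings = {
    "probe":       [3, 17, 42, 8, 99, 5, 23, 61, 7, 12],
    "prob_diff":   [17, 3, 8, 42, 5, 99, 61, 23, 11, 70],
    "activations": [42, 17, 3, 99, 8, 23, 5, 12, 61, 9],
}
print(compatibility_scores(rankings, k=10))

Under this toy setup, the method whose top neurons overlap most with the others would be deemed the most compatible, reflecting the paper's hypothesis that agreement with the majority of methods is a proxy for reliability.
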
Related papers
- On the Value of Labeled Data and Symbolic Methods for Hidden Neuron Activation Analysis [1.55858752644861]
State of the art indicates that hidden node activations can, in some cases, be interpretable in a way that makes sense to humans.
We introduce a novel model-agnostic post-hoc Explainable AI method and demonstrate that it provides meaningful interpretations.
arXiv Detail & Related papers (2024-04-21T07:57:45Z)
- Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z)
- Adversarial Attacks on the Interpretation of Neuron Activation Maximization [70.5472799454224]
Activation-maximization approaches are used to interpret and analyze trained deep-learning models.
In this work, we consider the concept of an adversary manipulating a model for the purpose of deceiving the interpretation.
arXiv Detail & Related papers (2023-06-12T19:54:33Z)
- On Modifying a Neural Network's Perception [3.42658286826597]
We propose a method which allows one to modify what an artificial neural network is perceiving regarding specific human-defined concepts.
We test the proposed method on different models, assessing whether the performed manipulations are well interpreted by the models, and analyzing how they react to them.
arXiv Detail & Related papers (2023-03-05T12:09:37Z)
- Neural Causal Models for Counterfactual Identification and Estimation [62.30444687707919]
We study the evaluation of counterfactual statements through neural models.
First, we show that neural causal models (NCMs) are expressive enough.
Second, we develop an algorithm for simultaneously identifying and estimating counterfactual distributions.
arXiv Detail & Related papers (2022-09-30T18:29:09Z)
- Interpreting Deep Learning Models in Natural Language Processing: A Review [33.80537635077772]
A long-standing criticism against neural network models is the lack of interpretability.
In this survey, we provide a comprehensive review of various interpretation methods for neural models in NLP.
arXiv Detail & Related papers (2021-10-20T10:17:04Z)
- Evaluating Saliency Methods for Neural Language Models [9.309351023703018]
Saliency methods are widely used to interpret neural network predictions.
Different variants of saliency methods disagree even on the interpretations of the same prediction made by the same model.
We conduct a comprehensive and quantitative evaluation of saliency methods on a fundamental category of NLP models: neural language models.
arXiv Detail & Related papers (2021-04-12T21:19:48Z)
- The Neural Coding Framework for Learning Generative Models [91.0357317238509]
We propose a novel neural generative model inspired by the theory of predictive processing in the brain.
In a similar way, artificial neurons in our generative model predict what neighboring neurons will do, and adjust their parameters based on how well the predictions matched reality.
arXiv Detail & Related papers (2020-12-07T01:20:38Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts.
We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
arXiv Detail & Related papers (2020-06-24T20:37:05Z)
- Can you tell? SSNet -- a Sagittal Stratum-inspired Neural Network Framework for Sentiment Analysis [1.0312968200748118]
We propose a neural network architecture that combines predictions of different models on the same text to construct robust, accurate and computationally efficient classifiers for sentiment analysis.
In particular, we propose a systematic new approach to combining multiple predictions based on a dedicated neural network, and we develop a mathematical analysis of it along with state-of-the-art experimental results.
arXiv Detail & Related papers (2020-06-23T12:55:02Z)