Locally Aggregated Feature Attribution on Natural Language Model Understanding
- URL: http://arxiv.org/abs/2204.10893v2
- Date: Tue, 26 Apr 2022 01:09:35 GMT
- Title: Locally Aggregated Feature Attribution on Natural Language Model Understanding
- Authors: Sheng Zhang, Jin Wang, Haitao Jiang, Rui Song
- Abstract summary: Locally Aggregated Feature Attribution (LAFA) is a novel gradient-based feature attribution method for NLP models.
Instead of relying on obscure reference tokens, it smooths gradients by aggregating similar reference texts derived from language model embeddings.
For evaluation purposes, we also design experiments on different NLP tasks including Entity Recognition and Sentiment Analysis on public datasets.
- Score: 12.233103741197334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the growing popularity of deep-learning models, model understanding
becomes more important. Much effort has been devoted to demystifying deep neural
networks for better interpretability. Some feature attribution methods have
shown promising results in computer vision, especially the gradient-based
methods where effectively smoothing the gradients with reference data is key to
a robust and faithful result. However, direct application of these
gradient-based methods to NLP tasks is not trivial due to the fact that the
input consists of discrete tokens and the "reference" tokens are not explicitly
defined. In this work, we propose Locally Aggregated Feature Attribution
(LAFA), a novel gradient-based feature attribution method for NLP models.
Instead of relying on obscure reference tokens, it smooths gradients by
aggregating similar reference texts derived from language model embeddings. For
evaluation purposes, we also design experiments on different NLP tasks including
Entity Recognition and Sentiment Analysis on public datasets as well as key
feature detection on a constructed Amazon catalogue dataset. The superior
performance of the proposed method is demonstrated through experiments.
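The abstract describes the core idea (smoothing gradients by aggregating over similar reference texts found in embedding space) but not the exact algorithm. The following is a minimal, hypothetical sketch of that idea in an integrated-gradients style, using a toy differentiable scoring function in place of a real NLP model; the function names, step count, and aggregation scheme are assumptions, not the paper's actual implementation:

```python
import numpy as np

def model_grad(emb, w):
    """Gradient of a toy sigmoid score w.r.t. the input embedding.
    Stands in for backpropagation through a real NLP model (assumption)."""
    z = emb @ w
    s = 1.0 / (1.0 + np.exp(-z))          # sigmoid(emb . w)
    return s * (1.0 - s) * w               # d(sigmoid)/d(emb)

def locally_aggregated_attribution(emb, references, w, steps=20):
    """Hypothetical LAFA-style attribution: for each nearby reference
    embedding, average gradients along the straight path from the
    reference to the input (integrated-gradients style), scale by the
    input-reference difference, then aggregate over all references."""
    per_reference = []
    for ref in references:
        alphas = np.linspace(0.0, 1.0, steps)
        grads = np.stack([model_grad(ref + a * (emb - ref), w) for a in alphas])
        per_reference.append((emb - ref) * grads.mean(axis=0))
    # Smoothing step: aggregate attributions across local references.
    return np.mean(per_reference, axis=0)
```

In this sketch, `references` would be the embeddings of texts retrieved as neighbors of the input in language-model embedding space; averaging over them plays the role that noisy or baseline references play in vision-oriented gradient-smoothing methods.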
Related papers
- Co-training for Low Resource Scientific Natural Language Inference [65.37685198688538]
We propose a novel co-training method that assigns weights based on the training dynamics of the classifiers to the distantly supervised labels.
By assigning importance weights instead of filtering out examples based on an arbitrary threshold on the predicted confidence, we maximize the usage of automatically labeled data.
The proposed method obtains an improvement of 1.5% in Macro F1 over the distant supervision baseline, and substantial improvements over several other strong SSL baselines.
arXiv Detail & Related papers (2024-06-20T18:35:47Z)
- Influence Scores at Scale for Efficient Language Data Sampling [3.072340427031969]
"influence scores" are used to identify important subsets of data.
In this paper, we explore the applicability of influence scores in language classification tasks.
arXiv Detail & Related papers (2023-11-27T20:19:22Z)
- Generalizing Backpropagation for Gradient-Based Interpretability [103.2998254573497]
We show that the gradient of a model is a special case of a more general formulation using semirings.
This observation allows us to generalize the backpropagation algorithm to efficiently compute other interpretable statistics.
arXiv Detail & Related papers (2023-07-06T15:19:53Z)
- Tell Model Where to Attend: Improving Interpretability of Aspect-Based Sentiment Classification via Small Explanation Annotations [23.05672636220897]
We propose an Interpretation-Enhanced Gradient-based framework for ABSC via a small number of explanation annotations, namely IEGA.
Our model is model agnostic and task agnostic so that it can be integrated into the existing ABSC methods or other tasks.
arXiv Detail & Related papers (2023-02-21T06:55:08Z)
- Modeling Multi-Granularity Hierarchical Features for Relation Extraction [26.852869800344813]
We propose a novel method to extract multi-granularity features based solely on the original input sentences.
We show that effective structured features can be attained even without external knowledge.
arXiv Detail & Related papers (2022-04-09T09:44:05Z)
- Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
- Visualization of Supervised and Self-Supervised Neural Networks via Attribution Guided Factorization [87.96102461221415]
We develop an algorithm that provides per-class explainability.
In an extensive battery of experiments, we demonstrate the ability of our methods to produce class-specific visualizations.
arXiv Detail & Related papers (2020-12-03T18:48:39Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- Gradient-based Analysis of NLP Models is Manipulable [44.215057692679494]
We demonstrate that the gradients of a model are easily manipulable, and thus bring into question the reliability of gradient-based analyses.
In particular, we merge the layers of a target model with a Facade that overwhelms the gradients without affecting the predictions.
arXiv Detail & Related papers (2020-10-12T02:54:22Z)
- SEKD: Self-Evolving Keypoint Detection and Description [42.114065439674036]
We propose a self-supervised framework to learn an advanced local feature model from unlabeled natural images.
We benchmark the proposed method on homography estimation, relative pose estimation, and structure-from-motion tasks.
We will release our code along with the trained model publicly.
arXiv Detail & Related papers (2020-06-09T06:56:50Z)
- Coreferential Reasoning Learning for Language Representation [88.14248323659267]
We present CorefBERT, a novel language representation model that can capture the coreferential relations in context.
The experimental results show that, compared with existing baseline models, CorefBERT can achieve significant improvements consistently on various downstream NLP tasks.
arXiv Detail & Related papers (2020-04-15T03:57:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.