Locally Aggregated Feature Attribution on Natural Language Model Understanding
- URL: http://arxiv.org/abs/2204.10893v2
- Date: Tue, 26 Apr 2022 01:09:35 GMT
- Title: Locally Aggregated Feature Attribution on Natural Language Model Understanding
- Authors: Sheng Zhang, Jin Wang, Haitao Jiang, Rui Song
- Abstract summary: Locally Aggregated Feature Attribution (LAFA) is a novel gradient-based feature attribution method for NLP models.
Instead of relying on obscure reference tokens, it smooths gradients by aggregating similar reference texts derived from language model embeddings.
For evaluation purposes, we also design experiments on different NLP tasks including Entity Recognition and Sentiment Analysis on public datasets.
- Score: 12.233103741197334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the growing popularity of deep-learning models, model understanding
becomes more important. Much effort has been devoted to demystifying deep neural
networks for better interpretability. Some feature attribution methods have
shown promising results in computer vision, especially the gradient-based
methods where effectively smoothing the gradients with reference data is key to
a robust and faithful result. However, direct application of these
gradient-based methods to NLP tasks is not trivial due to the fact that the
input consists of discrete tokens and the "reference" tokens are not explicitly
defined. In this work, we propose Locally Aggregated Feature Attribution
(LAFA), a novel gradient-based feature attribution method for NLP models.
Instead of relying on obscure reference tokens, it smooths gradients by
aggregating similar reference texts derived from language model embeddings. For
evaluation purposes, we also design experiments on different NLP tasks including
Entity Recognition and Sentiment Analysis on public datasets as well as key
feature detection on a constructed Amazon catalogue dataset. The superior
performance of the proposed method is demonstrated through experiments.
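The abstract describes the core idea (smoothing gradients by aggregating over similar reference texts found in embedding space) but not the exact algorithm. The following is a minimal, hypothetical sketch of that idea in an integrated-gradients style, using a toy differentiable scoring function in place of a real NLP model; the function names, step count, and aggregation scheme are assumptions, not the paper's actual implementation:

```python
import numpy as np

def model_grad(emb, w):
    """Gradient of a toy sigmoid score w.r.t. the input embedding.
    Stands in for backpropagation through a real NLP model (assumption)."""
    z = emb @ w
    s = 1.0 / (1.0 + np.exp(-z))          # sigmoid(emb . w)
    return s * (1.0 - s) * w               # d(sigmoid)/d(emb)

def locally_aggregated_attribution(emb, references, w, steps=20):
    """Hypothetical LAFA-style attribution: for each nearby reference
    embedding, average gradients along the straight path from the
    reference to the input (integrated-gradients style), scale by the
    input-reference difference, then aggregate over all references."""
    per_reference = []
    for ref in references:
        alphas = np.linspace(0.0, 1.0, steps)
        grads = np.stack([model_grad(ref + a * (emb - ref), w) for a in alphas])
        per_reference.append((emb - ref) * grads.mean(axis=0))
    # Smoothing step: aggregate attributions across local references.
    return np.mean(per_reference, axis=0)
```

In this sketch, `references` would be the embeddings of texts retrieved as neighbors of the input in language-model embedding space; averaging over them plays the role that noisy or baseline references play in vision-oriented gradient-smoothing methods.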
Related papers
- Co-training for Low Resource Scientific Natural Language Inference [65.37685198688538]
We propose a novel co-training method that assigns weights based on the training dynamics of the classifiers to the distantly supervised labels.
By assigning importance weights instead of filtering out examples based on an arbitrary threshold on the predicted confidence, we maximize the usage of automatically labeled data.
The proposed method obtains an improvement of 1.5% in Macro F1 over the distant supervision baseline, and substantial improvements over several other strong SSL baselines.
arXiv Detail & Related papers (2024-06-20T18:35:47Z)
- Influence Scores at Scale for Efficient Language Data Sampling [3.072340427031969]
"influence scores" are used to identify important subsets of data.
In this paper, we explore the applicability of influence scores in language classification tasks.
arXiv Detail & Related papers (2023-11-27T20:19:22Z)
- Generalizing Backpropagation for Gradient-Based Interpretability [103.2998254573497]
We show that the gradient of a model is a special case of a more general formulation using semirings.
This observation allows us to generalize the backpropagation algorithm to efficiently compute other interpretable statistics.
arXiv Detail & Related papers (2023-07-06T15:19:53Z)
- Tell Model Where to Attend: Improving Interpretability of Aspect-Based Sentiment Classification via Small Explanation Annotations [23.05672636220897]
We propose an Interpretation-Enhanced Gradient-based framework for ABSC via a small number of explanation annotations, namely IEGA.
Our model is model agnostic and task agnostic so that it can be integrated into the existing ABSC methods or other tasks.
arXiv Detail & Related papers (2023-02-21T06:55:08Z)
- Modeling Multi-Granularity Hierarchical Features for Relation Extraction [26.852869800344813]
We propose a novel method to extract multi-granularity features based solely on the original input sentences.
We show that effective structured features can be attained even without external knowledge.
arXiv Detail & Related papers (2022-04-09T09:44:05Z)
- Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
- Visualization of Supervised and Self-Supervised Neural Networks via Attribution Guided Factorization [87.96102461221415]
We develop an algorithm that provides per-class explainability.
In an extensive battery of experiments, we demonstrate the ability of our methods to produce class-specific visualizations.
arXiv Detail & Related papers (2020-12-03T18:48:39Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- Gradient-based Analysis of NLP Models is Manipulable [44.215057692679494]
We demonstrate that the gradients of a model are easily manipulable, and thus bring into question the reliability of gradient-based analyses.
In particular, we merge the layers of a target model with a Facade that overwhelms the gradients without affecting the predictions.
arXiv Detail & Related papers (2020-10-12T02:54:22Z)
- SEKD: Self-Evolving Keypoint Detection and Description [42.114065439674036]
We propose a self-supervised framework to learn an advanced local feature model from unlabeled natural images.
We benchmark the proposed method on homography estimation, relative pose estimation, and structure-from-motion tasks.
We will release our code along with the trained model publicly.
arXiv Detail & Related papers (2020-06-09T06:56:50Z)
- Coreferential Reasoning Learning for Language Representation [88.14248323659267]
We present CorefBERT, a novel language representation model that can capture the coreferential relations in context.
The experimental results show that, compared with existing baseline models, CorefBERT can achieve significant improvements consistently on various downstream NLP tasks.
arXiv Detail & Related papers (2020-04-15T03:57:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.