An Investigation of Language Model Interpretability via Sentence Editing
- URL: http://arxiv.org/abs/2011.14039v2
- Date: Sun, 26 Sep 2021 18:36:37 GMT
- Title: An Investigation of Language Model Interpretability via Sentence Editing
- Authors: Samuel Stevens and Yu Su
- Abstract summary: We re-purpose a sentence editing dataset as a testbed for interpretability of pre-trained language models (PLMs).
This enables us to conduct a systematic investigation on an array of questions regarding PLMs' interpretability.
The investigation generates new insights; for example, contrary to the common understanding, we find that attention weights correlate well with human rationales.
- Score: 5.492504126672887
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-trained language models (PLMs) like BERT are being used for almost all
language-related tasks, but interpreting their behavior still remains a
significant challenge and many important questions remain largely unanswered.
In this work, we re-purpose a sentence editing dataset, where faithful
high-quality human rationales can be automatically extracted and compared with
extracted model rationales, as a new testbed for interpretability. This enables
us to conduct a systematic investigation on an array of questions regarding
PLMs' interpretability, including the role of pre-training procedure,
comparison of rationale extraction methods, and different layers in the PLM.
The investigation generates new insights, for example, contrary to the common
understanding, we find that attention weights correlate well with human
rationales and work better than gradient-based saliency in extracting model
rationales. Both the dataset and code are available at
https://github.com/samuelstevens/sentence-editing-interpretability to
facilitate future interpretability research.
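The comparison the abstract describes, between human rationales taken from sentence edits and rationales extracted from a model, can be illustrated with a small sketch. This is not the authors' code (their repository is linked above). It assumes, purely for illustration, that the human rationale is the set of tokens an edit replaced or deleted, that the model rationale is the top-k tokens by some per-token score such as attention weight, and that agreement is measured by precision and recall. The function names, the example sentence, and the scores are all hypothetical.

```python
from difflib import SequenceMatcher


def human_rationale(before, after):
    """Indices of tokens in `before` that the edit replaced or deleted."""
    matcher = SequenceMatcher(a=before, b=after, autojunk=False)
    changed = set()
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            changed.update(range(i1, i2))
    return changed


def top_k_rationale(scores, k):
    """Indices of the k highest-scoring tokens (ties broken by position)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return set(order[:k])


def precision_recall(predicted, gold):
    """Precision and recall of predicted token indices against gold ones."""
    overlap = len(predicted & gold)
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical sentence pair: the edit changes "was" to "were".
before = "The results was significant across all runs".split()
after = "The results were significant across all runs".split()

# Made-up per-token scores standing in for attention weights or saliency.
scores = [0.05, 0.20, 0.45, 0.10, 0.08, 0.07, 0.05]

gold = human_rationale(before, after)
pred = top_k_rationale(scores, k=2)
precision, recall = precision_recall(pred, gold)
print(gold, pred, precision, recall)
```

Swapping in a different score vector (for example, gradient-based saliency instead of attention) while keeping the same gold set is one simple way to compare extraction methods under these assumptions.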
Related papers
- Crafting Interpretable Embeddings by Asking LLMs Questions [89.49960984640363]
Large language models (LLMs) have rapidly improved text embeddings for a growing array of natural-language processing tasks.
We introduce question-answering embeddings (QA-Emb), embeddings where each feature represents an answer to a yes/no question asked to an LLM.
We use QA-Emb to flexibly generate interpretable models for predicting fMRI voxel responses to language stimuli.
arXiv Detail & Related papers (2024-05-26T22:30:29Z)
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- Generative Judge for Evaluating Alignment [84.09815387884753]
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses under massive real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z)
- Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing such interactions.
This suggests LMs may potentially serve as more useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z)
- Topics in the Haystack: Extracting and Evaluating Topics beyond Coherence [0.0]
We propose a method that incorporates a deeper understanding of both sentence and document themes.
This allows our model to detect latent topics that may include uncommon words or neologisms.
We present correlation coefficients with human identification of intruder words and achieve near-human level results at the word-intrusion task.
arXiv Detail & Related papers (2023-03-30T12:24:25Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering [70.6359636116848]
We propose a new large-scale dataset, ConvFinQA, to study the chain of numerical reasoning in conversational question answering.
Our dataset poses a great challenge in modeling long-range, complex numerical reasoning paths in real-world conversations.
arXiv Detail & Related papers (2022-10-07T23:48:50Z)
- Interpretable Research Replication Prediction via Variational Contextual Consistency Sentence Masking [14.50690911709558]
Research Replication Prediction (RRP) is the task of predicting whether a published research result can be replicated or not.
In this work, we propose the Variational Contextual Consistency Sentence Masking (VCCSM) method to automatically extract key sentences.
Results of our experiments on RRP along with European Convention of Human Rights (ECHR) datasets demonstrate that VCCSM is able to improve the model interpretability for the long document classification tasks.
arXiv Detail & Related papers (2022-03-28T03:27:13Z)
- A Framework to Learn with Interpretation [2.3741312212138896]
We present a novel framework to jointly learn a predictive model and its associated interpretation model.
We seek a small dictionary of high-level attribute functions that take as inputs the outputs of selected hidden layers.
A detailed pipeline to visualize the learnt features is also developed.
arXiv Detail & Related papers (2020-10-19T09:26:28Z)
- Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT [29.04485839262945]
We propose a parameter-free probing technique for analyzing pre-trained language models (e.g., BERT).
Our method does not require direct supervision from the probing tasks, nor do we introduce additional parameters to the probing process.
Our experiments on BERT show that syntactic trees recovered from BERT using our method are significantly better than linguistically-uninformed baselines.
arXiv Detail & Related papers (2020-04-30T14:02:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.