Related papers: An Investigation of Language Model Interpretability via Sentence Editing

URL: http://arxiv.org/abs/2011.14039v2
Date: Sun, 26 Sep 2021 18:36:37 GMT
Title: An Investigation of Language Model Interpretability via Sentence Editing
Authors: Samuel Stevens and Yu Su
Abstract summary: We re-purpose a sentence editing dataset as a testbed for interpretability of pre-trained language models (PLMs). This enables us to conduct a systematic investigation on an array of questions regarding PLMs' interpretability. The investigation generates new insights, for example, contrary to the common understanding, we find that attention weights correlate well with human rationales.
Score: 5.492504126672887
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Pre-trained language models (PLMs) like BERT are being used for almost all language-related tasks, but interpreting their behavior still remains a significant challenge and many important questions remain largely unanswered. In this work, we re-purpose a sentence editing dataset, where faithful high-quality human rationales can be automatically extracted and compared with extracted model rationales, as a new testbed for interpretability. This enables us to conduct a systematic investigation on an array of questions regarding PLMs' interpretability, including the role of pre-training procedure, comparison of rationale extraction methods, and different layers in the PLM. The investigation generates new insights, for example, contrary to the common understanding, we find that attention weights correlate well with human rationales and work better than gradient-based saliency in extracting model rationales. Both the dataset and code are available at https://github.com/samuelstevens/sentence-editing-interpretability to facilitate future interpretability research.
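
Illustrative sketch (not the authors' code): the abstract describes extracting human rationales automatically from sentence edits and comparing them with model rationales. The snippet below is one minimal way such a comparison could look, under stated assumptions: the human rationale is taken to be the tokens of the original sentence that the edit changed or deleted, and the model rationale is any per-token score (for example an attention weight or a saliency value). The function names `human_rationale` and `top_k_overlap`, the example sentences, and the scores are all hypothetical; the metric used in the paper may differ.

```python
from difflib import SequenceMatcher


def human_rationale(original, edited):
    """Return a 0/1 mask over `original` tokens; 1 marks a token that the
    edit replaced or deleted (assumed proxy for a human rationale)."""
    mask = [0] * len(original)
    matcher = SequenceMatcher(a=original, b=edited, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            for i in range(i1, i2):
                mask[i] = 1
    return mask


def top_k_overlap(scores, mask):
    """Fraction of rationale tokens found among the k highest-scoring
    tokens, where k is the number of rationale tokens."""
    k = sum(mask)
    if k == 0:
        return 0.0
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(mask[i] for i in ranked[:k]) / k


# Hypothetical example: a one-token grammatical edit.
original = "the results shows a clear improvement".split()
edited = "the results show a clear improvement".split()
mask = human_rationale(original, edited)

# Hypothetical per-token model scores (e.g. attention weights).
scores = [0.05, 0.10, 0.60, 0.05, 0.10, 0.10]
print(mask, top_k_overlap(scores, mask))
```

Here the edited token is also the highest-scoring one, so the overlap is 1.0; a score vector that ranks a different token first would give 0.0 for this sentence.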

Related papers

Can LLM-Generated Textual Explanations Enhance Model Classification Performance? An Empirical Study [11.117380681219295]
We present an automated framework to generate high-quality textual explanations. We rigorously assess the quality of these explanations using a comprehensive suite of Natural Language Generation (NLG) metrics. Our experiments demonstrate that automated explanations exhibit highly competitive effectiveness compared to human-annotated explanations.
arXiv Detail & Related papers (2025-08-13T12:59:08Z)
Crafting Interpretable Embeddings by Asking LLMs Questions [89.49960984640363]
Large language models (LLMs) have rapidly improved text embeddings for a growing array of natural-language processing tasks. We introduce question-answering embeddings (QA-Emb), embeddings where each feature represents an answer to a yes/no question asked to an LLM. We use QA-Emb to flexibly generate interpretable models for predicting fMRI voxel responses to language stimuli.
arXiv Detail & Related papers (2024-05-26T22:30:29Z)
Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers. We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models. Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
Generative Judge for Evaluating Alignment [84.09815387884753]
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges. Our model is trained on user queries and LLM-generated responses under massive real-world scenarios. Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z)
Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing such interactions. This suggests LMs may potentially serve as more useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z)
Topics in the Haystack: Extracting and Evaluating Topics beyond Coherence [0.0]
We propose a method that incorporates a deeper understanding of both sentence and document themes. This allows our model to detect latent topics that may include uncommon words or neologisms. We present correlation coefficients with human identification of intruder words and achieve near-human level results at the word-intrusion task.
arXiv Detail & Related papers (2023-03-30T12:24:25Z)
Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types. Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering [70.6359636116848]
We propose a new large-scale dataset, ConvFinQA, to study the chain of numerical reasoning in conversational question answering. Our dataset poses a great challenge in modeling long-range, complex numerical reasoning paths in real-world conversations.
arXiv Detail & Related papers (2022-10-07T23:48:50Z)
Interpretable Research Replication Prediction via Variational Contextual Consistency Sentence Masking [14.50690911709558]
Research Replication Prediction (RRP) is the task of predicting whether a published research result can be replicated or not. In this work, we propose the Variational Contextual Consistency Sentence Masking (VCCSM) method to automatically extract key sentences. Results of our experiments on RRP along with European Convention of Human Rights (ECHR) datasets demonstrate that VCCSM is able to improve the model interpretability for the long document classification tasks.
arXiv Detail & Related papers (2022-03-28T03:27:13Z)
A Framework to Learn with Interpretation [2.3741312212138896]
We present a novel framework to jointly learn a predictive model and its associated interpretation model. We seek a small-size dictionary of high-level attribute functions that take as inputs the outputs of selected hidden layers. A detailed pipeline to visualize the learnt features is also developed.
arXiv Detail & Related papers (2020-10-19T09:26:28Z)
Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT [29.04485839262945]
We propose a parameter-free probing technique for analyzing pre-trained language models (e.g., BERT). Our method does not require direct supervision from the probing tasks, nor do we introduce additional parameters to the probing process. Our experiments on BERT show that syntactic trees recovered from BERT using our method are significantly better than linguistically-uninformed baselines.
arXiv Detail & Related papers (2020-04-30T14:02:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.