An Interpretability Illusion for BERT
- URL: http://arxiv.org/abs/2104.07143v1
- Date: Wed, 14 Apr 2021 22:04:48 GMT
- Title: An Interpretability Illusion for BERT
- Authors: Tolga Bolukbasi, Adam Pearce, Ann Yuan, Andy Coenen, Emily Reif,
Fernanda Viégas, Martin Wattenberg
- Abstract summary: We describe an "interpretability illusion" that arises when analyzing the BERT model.
We trace the source of this illusion to geometric properties of BERT's embedding space.
We provide a taxonomy of model-learned concepts and discuss methodological implications for interpretability research.
- Score: 61.2687465308121
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We describe an "interpretability illusion" that arises when analyzing the
BERT model. Activations of individual neurons in the network may spuriously
appear to encode a single, simple concept, when in fact they are encoding
something far more complex. The same effect holds for linear combinations of
activations. We trace the source of this illusion to geometric properties of
BERT's embedding space as well as the fact that common text corpora represent
only narrow slices of possible English sentences. We provide a taxonomy of
model-learned concepts and discuss methodological implications for
interpretability research, especially the importance of testing hypotheses on
multiple data sets.
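The methodological point lends itself to a short illustration. The sketch below is not the authors' code; the model, layer, neuron index, and the two toy corpora are placeholder assumptions. It ranks sentences by a single BERT neuron's activation; the illusion arises when the top-activating examples from one corpus suggest a clean concept that does not reappear when the same neuron is inspected on a different corpus.

```python
# Hedged sketch: rank sentences by one BERT neuron's activation and compare
# the top examples across corpora. Layer, neuron index, and corpora are
# arbitrary illustrative choices, not values from the paper.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

LAYER, NEURON = 8, 321  # hidden layer and hidden-dimension index (0..767)

def neuron_activation(sentence: str) -> float:
    """Mean activation of one hidden dimension over the sentence's tokens."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).hidden_states[LAYER]  # [1, seq_len, 768]
    return hidden[0, :, NEURON].mean().item()

def top_activating(sentences, k=5):
    return sorted(sentences, key=neuron_activation, reverse=True)[:k]

# Inspecting only corpus_a might suggest the neuron encodes, say, "legal events";
# checking corpus_b (or any second data set) tests whether that label holds up.
corpus_a = ["He signed the contract yesterday.", "The treaty was ratified in 1990."]
corpus_b = ["The cat chased a red ball.", "Rain is expected over the weekend."]
print(top_activating(corpus_a))
print(top_activating(corpus_b))
```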
Related papers
- Disentangling Dense Embeddings with Sparse Autoencoders [0.0]
Sparse autoencoders (SAEs) have shown promise in extracting interpretable features from complex neural networks.
We present one of the first applications of SAEs to dense text embeddings from large language models.
We show that the resulting sparse representations maintain semantic fidelity while offering interpretability.
arXiv Detail & Related papers (2024-08-01T15:46:22Z)
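As a rough illustration of the sparse-autoencoder idea in the entry above, a generic sketch (not the paper's implementation; the dictionary size and sparsity coefficient are arbitrary assumptions) might look like this:

```python
# Generic sparse autoencoder for dense text embeddings: an overcomplete ReLU
# dictionary trained with reconstruction loss plus an L1 sparsity penalty.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_embed=768, d_hidden=8192):
        super().__init__()
        self.encoder = nn.Linear(d_embed, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_embed)

    def forward(self, x):
        codes = torch.relu(self.encoder(x))  # sparse, non-negative feature activations
        recon = self.decoder(codes)          # reconstruction of the input embedding
        return recon, codes

def sae_loss(x, recon, codes, l1_coeff=1e-3):
    # Reconstruction error plus a penalty that keeps few features active at once.
    return ((recon - x) ** 2).mean() + l1_coeff * codes.abs().mean()

# Usage sketch: `embeddings` stands in for a batch of dense sentence embeddings.
sae = SparseAutoencoder()
embeddings = torch.randn(32, 768)
recon, codes = sae(embeddings)
loss = sae_loss(embeddings, recon, codes)
loss.backward()
```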
- On the Joint Interaction of Models, Data, and Features [82.60073661644435]
We introduce a new tool, the interaction tensor, for empirically analyzing the interaction between data and model through features.
Based on these observations, we propose a conceptual framework for feature learning.
Under this framework, the expected accuracy for a single hypothesis and agreement for a pair of hypotheses can both be derived in closed-form.
arXiv Detail & Related papers (2023-06-07T21:35:26Z)
- All Roads Lead to Rome? Exploring the Invariance of Transformers' Representations [69.3461199976959]
We propose a model based on invertible neural networks, BERT-INN, to learn the Bijection Hypothesis.
We show the advantage of BERT-INN both theoretically and through extensive experiments.
arXiv Detail & Related papers (2023-05-23T22:30:43Z)
- Semantic interpretation for convolutional neural networks: What makes a cat a cat? [3.132595571344153]
We introduce the framework of semantic explainable AI (S-XAI).
S-XAI uses row-centered principal component analysis to obtain the common traits from the best combination of superpixels discovered by a genetic algorithm.
It extracts understandable semantic spaces on the basis of discovered semantically sensitive neurons and visualization techniques.
arXiv Detail & Related papers (2022-04-16T05:25:17Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- The Low-Dimensional Linear Geometry of Contextualized Word Representations [27.50785941238007]
We study the linear geometry of contextualized word representations in ELMo and BERT.
We show that a variety of linguistic features are encoded in low-dimensional subspaces.
arXiv Detail & Related papers (2021-05-15T00:58:08Z)
- Exploring the Role of BERT Token Representations to Explain Sentence Probing Results [15.652077779677091]
We show that BERT tends to encode meaningful knowledge in specific token representations.
This allows the model to detect syntactic and semantic abnormalities and to distinctively separate grammatical number and tense subspaces.
arXiv Detail & Related papers (2021-04-03T20:40:42Z)
- On the Sentence Embeddings from Pre-trained Language Models [78.45172445684126]
In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited.
We find that BERT always induces a non-smooth, anisotropic semantic space of sentences, which harms its performance on semantic similarity tasks.
We propose to transform the anisotropic sentence embedding distribution to a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective.
arXiv Detail & Related papers (2020-11-02T13:14:57Z)
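One quick way to see the anisotropy the entry above describes is to measure the average cosine similarity between embeddings of unrelated sentences; values near 1 indicate the vectors occupy a narrow cone. The sketch below uses mean pooling over bert-base-uncased as an illustrative assumption and does not implement the paper's normalizing-flow remedy.

```python
# Hedged sketch: estimate anisotropy as the mean pairwise cosine similarity
# between sentence embeddings of unrelated sentences.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(sentence: str) -> torch.Tensor:
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # [1, seq_len, 768]
    return hidden.mean(dim=1).squeeze(0)            # mean-pooled sentence vector

sentences = ["The stock market fell sharply.",
             "My cat sleeps all afternoon.",
             "Quantum computers use qubits."]
vecs = torch.stack([embed(s) for s in sentences])
vecs = torch.nn.functional.normalize(vecs, dim=-1)
sims = vecs @ vecs.T
off_diag = sims[~torch.eye(len(sentences), dtype=torch.bool)]
print(f"mean pairwise cosine similarity: {off_diag.mean().item():.3f}")
```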
- Intrinsic Probing through Dimension Selection [69.52439198455438]
Most modern NLP systems make use of pre-trained contextual representations that attain astonishingly high performance on a variety of tasks.
Such high performance should not be possible unless some form of linguistic structure inheres in these representations, and a wealth of research has sprung up on probing for it.
In this paper, we draw a distinction between intrinsic probing, which examines how linguistic information is structured within a representation, and the extrinsic probing popular in prior work, which only argues for the presence of such information by showing that it can be successfully extracted.
arXiv Detail & Related papers (2020-10-06T15:21:08Z)
- INFOTABS: Inference on Tables as Semi-structured Data [39.84930221015755]
We introduce a new dataset called INFOTABS, comprising human-written textual hypotheses based on premises that are tables extracted from Wikipedia info-boxes.
Our analysis shows that the semi-structured, multi-domain and heterogeneous nature of the premises admits complex, multi-faceted reasoning.
Experiments reveal that, while human annotators agree on the relationships between a table-hypothesis pair, several standard modeling strategies are unsuccessful at the task.
arXiv Detail & Related papers (2020-05-13T02:07:54Z)