The Low-Dimensional Linear Geometry of Contextualized Word Representations
- URL: http://arxiv.org/abs/2105.07109v1
- Date: Sat, 15 May 2021 00:58:08 GMT
- Title: The Low-Dimensional Linear Geometry of Contextualized Word Representations
- Authors: Evan Hernandez and Jacob Andreas
- Abstract summary: We study the linear geometry of contextualized word representations in ELMo and BERT.
We show that a variety of linguistic features are encoded in low-dimensional subspaces.
- Score: 27.50785941238007
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Black-box probing models can reliably extract linguistic features like tense,
number, and syntactic role from pretrained word representations. However, the
manner in which these features are encoded in representations remains poorly
understood. We present a systematic study of the linear geometry of
contextualized word representations in ELMo and BERT. We show that a variety of
linguistic features (including structured dependency relationships) are encoded
in low-dimensional subspaces. We then refine this geometric picture, showing
that there are hierarchical relations between the subspaces encoding general
linguistic categories and more specific ones, and that low-dimensional feature
encodings are distributed rather than aligned to individual neurons. Finally,
we demonstrate that these linear subspaces are causally related to model
behavior, and can be used to perform fine-grained manipulation of BERT's output
distribution.
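The abstract above describes two steps that a short sketch can make concrete: fitting a rank-constrained linear probe to locate a low-dimensional subspace that encodes a linguistic feature, and then projecting representations off that subspace to check that the subspace matters causally. The sketch below is a minimal illustration under stated assumptions, not the authors' code: synthetic vectors stand in for ELMo/BERT hidden states, the probe rank and training hyperparameters are arbitrary, and the probe's own accuracy stands in for the effect the paper measures on BERT's output distribution.

```python
# Minimal sketch (not the paper's implementation): a rank-constrained linear
# probe finds a low-dimensional subspace encoding a binary feature, and a
# projection-based edit removes that subspace from the representations.
# Synthetic vectors stand in for ELMo/BERT hidden states; all sizes and
# hyperparameters below are illustrative assumptions.
import torch

torch.manual_seed(0)
d, k, n = 768, 4, 2000            # hidden size, probe rank, number of tokens

# Fake "contextual representations": a binary feature (think noun number)
# lives in a random 2-D subspace, plus isotropic noise.
true_basis, _ = torch.linalg.qr(torch.randn(d, 2))        # d x 2, orthonormal
labels = torch.randint(0, 2, (n,))
coords = (labels.float() * 2 - 1).unsqueeze(1) + 0.3 * torch.randn(n, 2)
H = coords @ true_basis.T + 0.5 * torch.randn(n, d)

# Rank-constrained linear probe: project into k dimensions, then classify.
P = torch.nn.Parameter(0.01 * torch.randn(d, k))           # low-rank projection
w = torch.nn.Parameter(torch.zeros(k))
b = torch.nn.Parameter(torch.zeros(1))
opt = torch.optim.Adam([P, w, b], lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    logits = (H @ P) @ w + b
    loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, labels.float())
    loss.backward()
    opt.step()

def accuracy(reps):
    with torch.no_grad():
        return (((reps @ P) @ w + b > 0).long() == labels).float().mean().item()

# Intervention in the spirit of the paper's causal analysis: remove the
# learned subspace and watch the feature become unreadable. In the paper the
# edit is applied to BERT's hidden states and its effect is read off the
# output distribution; here the probe's accuracy falls back to chance.
Q, _ = torch.linalg.qr(P.detach())        # orthonormal basis of the probe subspace
H_edit = H - (H @ Q) @ Q.T

print(f"probe accuracy, original representations: {accuracy(H):.2f}")       # high
print(f"probe accuracy, subspace removed:         {accuracy(H_edit):.2f}")  # ~0.5
```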
Related papers
- Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations [24.211603400355756]
Next-token prediction (NTP) over large text corpora has become the go-to paradigm to train large language models.
We look at how NTP influences the mapping of linguistic patterns to geometric properties of the resulting model representations.
We validate our findings on synthetic and small-scale real language datasets.
arXiv Detail & Related papers (2024-08-27T21:46:47Z)
- On the Origins of Linear Representations in Large Language Models [51.88404605700344]
We introduce a simple latent variable model to formalize the concept dynamics of next-token prediction.
Experiments show that linear representations emerge when learning from data matching the latent variable model.
We additionally confirm some predictions of the theory using the LLaMA-2 large language model.
arXiv Detail & Related papers (2024-03-06T17:17:36Z)
- Linearity of Relation Decoding in Transformer Language Models [82.47019600662874]
Much of the knowledge encoded in transformer language models (LMs) may be expressed in terms of relations.
We show that, for a subset of relations, this computation is well-approximated by a single linear transformation on the subject representation (a toy sketch of this idea appears after this list).
arXiv Detail & Related papers (2023-08-17T17:59:19Z)
- Representation Of Lexical Stylistic Features In Language Models' Embedding Space [28.60690854046176]
We show that it is possible to derive a vector representation for each of several lexical stylistic features from only a small number of seed pairs.
We conduct experiments on five datasets and find that static embeddings encode these features more accurately at the level of words and phrases.
The lower performance of contextualized representations at the word level is partially attributable to the anisotropy of their vector space.
arXiv Detail & Related papers (2023-05-29T23:44:26Z)
- Linear Spaces of Meanings: Compositional Structures in Vision-Language Models [110.00434385712786]
We investigate compositional structures in data embeddings from pre-trained vision-language models (VLMs).
We first present a framework for understanding compositional structures from a geometric perspective.
We then explain what these structures entail probabilistically in the case of VLM embeddings, providing intuitions for why they arise in practice.
arXiv Detail & Related papers (2023-02-28T08:11:56Z)
- Cross-Lingual BERT Contextual Embedding Space Mapping with Isotropic and Isometric Conditions [7.615096161060399]
We investigate a context-aware and dictionary-free mapping approach by leveraging parallel corpora.
Our findings unfold the tight relationship between isotropy, isometry, and isomorphism in normalized contextual embedding spaces.
arXiv Detail & Related papers (2021-07-19T22:57:36Z)
- Low-Dimensional Structure in the Space of Language Representations is Reflected in Brain Responses [62.197912623223964]
We show a low-dimensional structure where language models and translation models smoothly interpolate between word embeddings, syntactic and semantic tasks, and future word embeddings.
We find that this representation embedding can predict how well each individual feature space maps to human brain responses to natural language stimuli recorded using fMRI.
This suggests that the embedding captures some part of the brain's natural language representation structure.
arXiv Detail & Related papers (2021-06-09T22:59:12Z)
- Representing Syntax and Composition with Geometric Transformations [1.439493901412045]
Using syntactic graphs (SyGs) as a word's context has been shown to be beneficial for distributional semantic models (DSMs).
We investigate which geometric transformation (GT) better encodes syntactic relations, so that these representations can be used to enhance phrase-level composition via syntactic contextualisation.
arXiv Detail & Related papers (2021-06-03T14:53:34Z)
- An Interpretability Illusion for BERT [61.2687465308121]
We describe an "interpretability illusion" that arises when analyzing the BERT model.
We trace the source of this illusion to geometric properties of BERT's embedding space.
We provide a taxonomy of model-learned concepts and discuss methodological implications for interpretability research.
arXiv Detail & Related papers (2021-04-14T22:04:48Z)
- Emergence of Separable Manifolds in Deep Language Representations [26.002842878797765]
Deep neural networks (DNNs) have shown much empirical success in solving perceptual tasks across various cognitive modalities.
Recent studies report considerable similarities between representations extracted from task-optimized DNNs and neural populations in the brain.
DNNs have subsequently become a popular model class to infer computational principles underlying complex cognitive functions.
arXiv Detail & Related papers (2020-06-01T17:23:44Z)
- APo-VAE: Text Generation in Hyperbolic Space [116.11974607497986]
In this paper, we investigate text generation in a hyperbolic latent space to learn continuous hierarchical representations.
An Adversarial Poincaré Variational Autoencoder (APo-VAE) is presented, where both the prior and variational posterior of latent variables are defined over a Poincaré ball via wrapped normal distributions.
Experiments in language modeling and dialog-response generation tasks demonstrate the effectiveness of the proposed APo-VAE model.
arXiv Detail & Related papers (2020-04-30T19:05:41Z)
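The "Linearity of Relation Decoding in Transformer Language Models" entry above claims that, for some relations, the model's mapping from a subject representation to the corresponding object is well approximated by a single affine transformation o ≈ W s + b. The toy sketch below illustrates only that idea; fitting W and b by ordinary least squares on synthetic vector pairs is an assumption for illustration, not the estimator used in that paper.

```python
# Toy illustration (not that paper's method): approximate the map from a
# subject representation s to an object representation o with one affine
# transformation o ≈ W s + b, fit here by least squares on synthetic pairs.
import torch

torch.manual_seed(0)
d, n = 256, 500                          # illustrative sizes, not the paper's

# Synthetic "ground truth": a fixed affine relation plus a little noise.
W_true = torch.randn(d, d) / d ** 0.5
b_true = torch.randn(d)
S = torch.randn(n, d)                                    # subject representations
O = S @ W_true.T + b_true + 0.05 * torch.randn(n, d)     # object representations

# Ordinary least squares with an appended bias column.
S1 = torch.cat([S, torch.ones(n, 1)], dim=1)             # n x (d + 1)
sol = torch.linalg.lstsq(S1, O).solution                 # (d + 1) x d
W_hat, b_hat = sol[:-1].T, sol[-1]

# Held-out check: one affine map explains the pairs well.
S_test = torch.randn(100, d)
O_test = S_test @ W_true.T + b_true + 0.05 * torch.randn(100, d)
pred = S_test @ W_hat.T + b_hat
rel_err = (pred - O_test).norm() / O_test.norm()
print(f"relative error of the affine approximation: {rel_err:.3f}")   # small
```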
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.