Emergence of a High-Dimensional Abstraction Phase in Language Transformers
- URL: http://arxiv.org/abs/2405.15471v1
- Date: Fri, 24 May 2024 11:49:07 GMT
- Title: Emergence of a High-Dimensional Abstraction Phase in Language Transformers
- Authors: Emily Cheng, Diego Doimo, Corentin Kervadec, Iuri Macocco, Jade Yu, Alessandro Laio, Marco Baroni
- Abstract summary: A language model (LM) is a mapping from a linguistic context to an output token.
We take a high-level geometric approach to its analysis, observing, across five pre-trained transformer-based LMs and three input datasets, a distinct phase characterized by high intrinsic dimensionality.
Our results suggest that a central high-dimensionality phase underlies core linguistic processing in many common LM architectures.
- Score: 47.60397331657208
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A language model (LM) is a mapping from a linguistic context to an output token. However, much remains to be known about this mapping, including how its geometric properties relate to its function. We take a high-level geometric approach to its analysis, observing, across five pre-trained transformer-based LMs and three input datasets, a distinct phase characterized by high intrinsic dimensionality. During this phase, representations (1) correspond to the first full linguistic abstraction of the input; (2) are the first to viably transfer to downstream tasks; (3) predict each other across different LMs. Moreover, we find that an earlier onset of the phase strongly predicts better language modelling performance. In short, our results suggest that a central high-dimensionality phase underlies core linguistic processing in many common LM architectures.
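The measurement at the heart of this result is a per-layer estimate of intrinsic dimensionality (ID). As an illustrative sketch only (the authors' exact estimator and extraction pipeline are assumptions here), one can compute a TwoNN-style ID estimate (Facco et al., 2017) over each layer's hidden states and look for the peak:

```python
# Illustrative sketch, not the authors' pipeline: TwoNN maximum-likelihood
# intrinsic-dimension estimate (Facco et al., 2017) for one layer's states.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def twonn_id(X: np.ndarray) -> float:
    """TwoNN ID estimate for an (n_points, n_features) matrix X."""
    # Distances to each point's two nearest neighbors (column 0 is the point itself).
    dists, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
    r1, r2 = dists[:, 1], dists[:, 2]
    mu = r2 / np.maximum(r1, 1e-12)    # ratio of 2nd- to 1st-neighbor distance
    mu = mu[mu > 1.0]                  # drop ties, which contribute log(1) = 0
    return len(mu) / np.log(mu).sum()  # MLE: ID = n / sum_i log(mu_i)

# hidden_states[l]: an (n_tokens, d_model) array for layer l, e.g. gathered
# with output_hidden_states=True in HuggingFace transformers (assumed setup).
# id_profile = [twonn_id(h) for h in hidden_states]
# The reported phase would show up as a pronounced ID peak at intermediate layers.
```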
Related papers
- Evidence from fMRI Supports a Two-Phase Abstraction Process in Language Models [1.534667887016089]
We show that intermediate hidden states extracted from large language models can predict measured brain responses to natural language stimuli.
We also demonstrate a strong correspondence between layerwise encoding performance and the intrinsic dimensionality of representations from LLMs.
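A minimal sketch of a layerwise encoding analysis in this style, assuming (as is standard in this literature, though not confirmed from the paper) cross-validated ridge regression from hidden states to voxel responses:

```python
# Sketch under assumed setup: ridge-regression encoding model predicting
# voxel responses Y (n_samples, n_voxels) from one layer's features H
# (n_samples, d_model); the layerwise curve of these scores is what gets
# compared with intrinsic dimensionality.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

def encoding_score(H: np.ndarray, Y: np.ndarray) -> float:
    """Mean 5-fold cross-validated R^2 of predicting Y from H."""
    model = RidgeCV(alphas=np.logspace(-2, 4, 13))
    return float(cross_val_score(model, H, Y, cv=5, scoring="r2").mean())

# scores = [encoding_score(H_layer, Y) for H_layer in hidden_states]
```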
arXiv Detail & Related papers (2024-09-09T16:33:16Z)
- Talking Heads: Understanding Inter-layer Communication in Transformer Language Models [32.2976613483151]
We analyze a mechanism that two LMs use, in one task, to selectively inhibit items in the context.
We find that models write into low-rank subspaces of the residual stream to represent features which are then read out by later layers.
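A toy illustration of the general write/read pattern (an assumption about the mechanism's shape, not the paper's specific heads): an early layer writes a scalar feature along a fixed residual-stream direction, and a later layer projects onto it:

```python
# Toy sketch: rank-1 write/read through the residual stream.
import numpy as np

rng = np.random.default_rng(0)
d_model = 64
u = rng.normal(size=d_model)
u /= np.linalg.norm(u)            # unit "communication" direction

resid = rng.normal(size=d_model)  # residual stream before the write
feature = 3.7                     # value an early layer wants to signal
resid = resid + feature * u       # early layer writes into span(u)

readout = u @ resid               # later layer projects onto u
# readout = feature + whatever the stream already carried along u;
# a learned read-out can separate the signal from that baseline.
```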
arXiv Detail & Related papers (2024-06-13T18:12:01Z)
- The Locality and Symmetry of Positional Encodings [9.246374019271938]
We conduct a systematic study of positional encodings in Bidirectional Masked Language Models (BERT-style).
We uncover the core function of PEs by identifying two common properties, Locality and Symmetry.
We quantify the weakness of current PEs by introducing two new probing tasks, on which current PEs perform poorly.
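One hypothetical way to operationalize the two properties (these probes are illustrative assumptions, not the paper's tasks): compute position-only attention logits from the PE matrix, then check their offset profile (Locality) and their asymmetry (Symmetry):

```python
# Hypothetical probes (illustrative assumptions, not the paper's tasks):
# position-only attention logits score(i, j) = (P Wq)(P Wk)^T.
import numpy as np

def pe_scores(P: np.ndarray, Wq: np.ndarray, Wk: np.ndarray) -> np.ndarray:
    """(L, L) matrix of position-only attention logits for PE matrix P (L, d)."""
    return (P @ Wq) @ (P @ Wk).T

def symmetry_gap(S: np.ndarray) -> float:
    """Relative asymmetry of the logit matrix: 0 means perfectly symmetric."""
    return float(np.linalg.norm(S - S.T) / np.linalg.norm(S + S.T))

def locality_profile(S: np.ndarray) -> dict:
    """Mean logit as a function of key-minus-query offset j - i."""
    L = S.shape[0]
    return {k: float(np.diagonal(S, offset=k).mean()) for k in range(-L + 1, L)}

# Locality: the profile should peak near offset 0 and decay with |offset|.
# Symmetry: score(i, j) should stay close to score(j, i), i.e. a small gap.
```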
arXiv Detail & Related papers (2023-10-19T16:15:15Z)
- Linearity of Relation Decoding in Transformer Language Models [82.47019600662874]
Much of the knowledge encoded in transformer language models (LMs) may be expressed in terms of relations.
We show that, for a subset of relations, this computation is well-approximated by a single linear transformation on the subject representation.
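A stand-in sketch of the claim: fit an affine map o ≈ W s + b over paired subject/object representations (the paper's own construction of W and b may differ; ordinary least squares is used here purely as an illustration):

```python
# Stand-in sketch: least-squares affine fit O ≈ S @ W.T + b over paired
# subject representations S (n, d_s) and object representations O (n, d_o).
import numpy as np

def fit_affine(S: np.ndarray, O: np.ndarray):
    """Returns (W, b) minimizing ||S @ W.T + b - O||_F."""
    S1 = np.hstack([S, np.ones((len(S), 1))])    # append a bias column
    M, *_ = np.linalg.lstsq(S1, O, rcond=None)   # solve the least-squares system
    return M[:-1].T, M[-1]

# Faithfulness check: decode W @ s + b (e.g., by nearest neighbor among
# candidate object representations) and see how often the right object wins.
```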
arXiv Detail & Related papers (2023-08-17T17:59:19Z)
- The geometry of hidden representations of large transformer models [43.16765170255552]
Large transformers are powerful architectures used for self-supervised data analysis across various data types.
We show that the semantic structure of the dataset emerges from a sequence of transformations between one representation and the next.
We show that the semantic information of the dataset is better expressed at the end of the first peak, and this phenomenon can be observed across many models trained on diverse datasets.
arXiv Detail & Related papers (2023-02-01T07:50:26Z)
- Improve Transformer Pre-Training with Decoupled Directional Relative Position Encoding and Representation Differentiations [23.2969212998404]
We revisit the Transformer-based pre-trained language models and identify two problems that may limit the expressiveness of the model.
Existing relative position encoding models conflate two heterogeneous kinds of information: relative distance and direction.
We propose two novel techniques to improve pre-trained language models.
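One plausible reading of the decoupling (the parameterization below is an assumption, not the paper's exact technique) is to replace a single embedding per signed offset with separate distance and direction terms added to the attention logits:

```python
# Sketch with an assumed parameterization: attention bias that keeps
# relative distance and direction as separate learned terms.
import numpy as np

rng = np.random.default_rng(0)
L, max_dist = 16, 8
dist_bias = rng.normal(size=max_dist + 1)  # one term per clipped distance |i - j|
dir_bias = rng.normal(size=3)              # left / same position / right

def rel_bias(i: int, j: int) -> float:
    dist = min(abs(i - j), max_dist)       # clipped relative distance
    direction = int(np.sign(j - i)) + 1    # 0: left of i, 1: same, 2: right of i
    return float(dist_bias[dist] + dir_bias[direction])

B = np.array([[rel_bias(i, j) for j in range(L)] for i in range(L)])
# B is added to the attention logits before the softmax, so distance and
# direction contribute independently instead of through one fused embedding.
```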
arXiv Detail & Related papers (2022-10-09T12:35:04Z)
- Modeling Target-Side Morphology in Neural Machine Translation: A Comparison of Strategies [72.56158036639707]
Morphologically rich languages pose difficulties to machine translation.
A large number of differently inflected word surface forms entails a larger vocabulary.
Some inflected forms of infrequent terms typically do not appear in the training corpus.
Linguistic agreement requires the system to correctly match the grammatical categories between inflected word forms in the output sentence.
arXiv Detail & Related papers (2022-03-25T10:13:20Z)
- Pre-Trained Language Models for Interactive Decision-Making [72.77825666035203]
We describe a framework for imitation learning in which goals and observations are represented as a sequence of embeddings.
We demonstrate that this framework enables effective generalization across different environments.
For test tasks involving novel goals or novel scenes, initializing policies with language models improves task completion rates by 43.6%.
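A schematic sketch of the described framework, with hypothetical module names and a HuggingFace-style LM trunk assumed: goals and observations become a sequence of embeddings, and an action head reads out the final state:

```python
# Hypothetical sketch: a policy whose trunk is a pretrained LM over embedded
# goals and observations. Module names and dimensions are assumptions.
import torch
import torch.nn as nn

class LMPolicy(nn.Module):
    def __init__(self, backbone: nn.Module, d_model: int,
                 obs_dim: int, goal_dim: int, n_actions: int):
        super().__init__()
        self.backbone = backbone                        # e.g., a HuggingFace GPT2Model
        self.embed_goal = nn.Linear(goal_dim, d_model)  # goal -> one "token"
        self.embed_obs = nn.Linear(obs_dim, d_model)    # each observation -> one "token"
        self.action_head = nn.Linear(d_model, n_actions)

    def forward(self, goal: torch.Tensor, obs_seq: torch.Tensor) -> torch.Tensor:
        # Sequence: [goal token, obs_1, ..., obs_T], all in embedding space.
        tokens = torch.cat([self.embed_goal(goal).unsqueeze(1),
                            self.embed_obs(obs_seq)], dim=1)
        h = self.backbone(inputs_embeds=tokens).last_hidden_state
        return self.action_head(h[:, -1])               # action logits at the last step
```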
arXiv Detail & Related papers (2022-02-03T18:55:52Z)
- SML: a new Semantic Embedding Alignment Transformer for efficient cross-lingual Natural Language Inference [71.57324258813674]
The ability of Transformers to perform a variety of tasks with precision, such as question answering, Natural Language Inference (NLI), or summarisation, has enabled them to rank among the best paradigms for addressing such tasks at present.
NLI is one of the best scenarios for testing these architectures, due to the knowledge required to understand complex sentences and establish a relation between a hypothesis and a premise.
In this paper, we propose a new architecture, siamese multilingual transformer, to efficiently align multilingual embeddings for Natural Language Inference.
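A minimal sketch of a siamese arrangement for NLI (mean pooling and the [u; v; |u − v|] feature set are common choices assumed here, not taken from the paper): one shared multilingual encoder embeds premise and hypothesis, and a light classifier consumes the pair:

```python
# Sketch with assumed details: siamese NLI over one shared multilingual encoder.
import torch
import torch.nn as nn

class SiameseNLI(nn.Module):
    def __init__(self, encoder: nn.Module, d_model: int, n_labels: int = 3):
        super().__init__()
        self.encoder = encoder                        # shared across both inputs
        # Classify on [u; v; |u - v|], a common sentence-pair feature set.
        self.classifier = nn.Linear(3 * d_model, n_labels)

    def embed(self, inputs: dict) -> torch.Tensor:
        # Mean-pool token states into one sentence vector (assumed pooling).
        return self.encoder(**inputs).last_hidden_state.mean(dim=1)

    def forward(self, premise: dict, hypothesis: dict) -> torch.Tensor:
        u, v = self.embed(premise), self.embed(hypothesis)
        feats = torch.cat([u, v, (u - v).abs()], dim=-1)
        return self.classifier(feats)   # entailment / neutral / contradiction logits
```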
arXiv Detail & Related papers (2021-03-17T13:23:53Z)
- Is Supervised Syntactic Parsing Beneficial for Language Understanding? An Empirical Investigation [71.70562795158625]
Traditional NLP has long held (supervised) syntactic parsing to be necessary for successful higher-level semantic language understanding (LU).
The recent advent of end-to-end neural models, self-supervised via language modeling (LM), and their success on a wide range of LU tasks call this belief into question.
We empirically investigate the usefulness of supervised parsing for semantic LU in the context of LM-pretrained transformer networks.
arXiv Detail & Related papers (2020-08-15T21:03:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.