A polar coordinate system represents syntax in large language models
- URL: http://arxiv.org/abs/2412.05571v1
- Date: Sat, 07 Dec 2024 07:37:20 GMT
- Title: A polar coordinate system represents syntax in large language models
- Authors: Pablo Diego-Simón, Stéphane D'Ascoli, Emmanuel Chemla, Yair Lakretz, Jean-Rémi King
- Abstract summary: Syntactic trees may also be effectively represented in the activations of large language models. We introduce a 'Polar Probe' trained to read syntactic relations from both the distance and the direction between word embeddings. Our approach reveals three main findings. First, our Polar Probe successfully recovers the type and direction of syntactic relations, and substantially outperforms the Structural Probe by nearly two-fold.
- Score: 12.244752597245645
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Originally formalized with symbolic representations, syntactic trees may also be effectively represented in the activations of large language models (LLMs). Indeed, a 'Structural Probe' can find a subspace of neural activations where syntactically related words are relatively close to one another. However, this syntactic code remains incomplete: the distance between the Structural Probe word embeddings can represent the existence but not the type and direction of syntactic relations. Here, we hypothesize that syntactic relations are, in fact, coded by the relative direction between nearby embeddings. To test this hypothesis, we introduce a 'Polar Probe' trained to read syntactic relations from both the distance and the direction between word embeddings. Our approach reveals three main findings. First, our Polar Probe successfully recovers the type and direction of syntactic relations, and substantially outperforms the Structural Probe by nearly two-fold. Second, we confirm that this polar coordinate system exists in a low-dimensional subspace of the intermediate layers of many LLMs and becomes increasingly precise in the latest frontier models. Third, we demonstrate with a new benchmark that similar syntactic relations are coded similarly across the nested levels of syntactic trees. Overall, this work shows that LLMs spontaneously learn a geometry of neural activations that explicitly represents the main symbolic structures of linguistic theory.
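The abstract's distance-plus-direction reading can be illustrated with a toy sketch. Everything below (the projection matrix `B`, the relation `prototypes`, the random embeddings) is an illustrative assumption, not the paper's actual probe:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: contextual embeddings for a head word and a dependent word.
dim, probe_dim, n_relations = 16, 4, 3
head, dep = rng.normal(size=dim), rng.normal(size=dim)

# A Structural-Probe-style projection B maps activations into a low-dim
# syntactic subspace; B is random here for illustration (it would be learned).
B = rng.normal(size=(probe_dim, dim))

# Polar reading: the *distance* signals whether a dependency exists, while
# the *direction* of the difference vector signals its type and direction.
diff = B @ (dep - head)
distance = np.linalg.norm(diff)
direction = diff / distance

# Hypothetical learned prototype direction per relation type; classify the
# relation by cosine similarity (dot product of unit vectors).
prototypes = rng.normal(size=(n_relations, probe_dim))
prototypes /= np.linalg.norm(prototypes, axis=1, keepdims=True)
relation = int(np.argmax(prototypes @ direction))

print(bool(distance > 0), 0 <= relation < n_relations)  # True True
```

The key design point the abstract makes is that the Structural Probe uses only `distance`, whereas the Polar Probe also reads `direction`, which is what distinguishes relation types.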
Related papers
- Differential syntactic and semantic encoding in LLMs [49.300174325011426]
We study how syntactic and semantic information is encoded in inner-layer representations of Large Language Models (LLMs). We find that the cross-layer encoding profiles of syntax and semantics are different, and that the two signals can to some extent be decoupled.
arXiv Detail & Related papers (2026-01-08T09:33:29Z) - Native Logical and Hierarchical Representations with Subspace Embeddings [25.274936769664098]
We introduce a novel paradigm: embedding concepts as linear subspaces. It naturally supports set-theoretic operations like intersection (conjunction) and linear sum (disjunction). Our method achieves state-of-the-art results in reconstruction and link prediction on WordNet.
arXiv Detail & Related papers (2025-08-21T18:29:17Z) - Semantic Structure in Large Language Model Embeddings [0.0]
Psychological research consistently finds that human ratings of words can be reduced to a low-dimensional form with relatively little information loss. We show that the projections of words on semantic directions defined by antonym pairs correlate highly with human ratings. We find that shifting tokens along one semantic direction causes off-target effects on geometrically aligned features proportional to their cosine similarity.
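The antonym-pair direction described in this summary can be sketched with toy vectors; the embeddings and words below are random stand-ins, not real model activations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy embeddings for an antonym pair and a target word (random stand-ins).
emb = {w: rng.normal(size=8) for w in ["hot", "cold", "warm"]}

# A semantic direction defined by an antonym pair: point from one pole
# toward the other, then normalize to unit length.
direction = emb["hot"] - emb["cold"]
direction /= np.linalg.norm(direction)

# The projection of a word onto this direction is its scalar "rating".
score = float(emb["warm"] @ direction)

# Shifting a token along the direction moves its projection by exactly the
# shift magnitude, since the direction is a unit vector.
shifted = emb["warm"] + 2.0 * direction
shifted_score = float(shifted @ direction)
print(round(shifted_score - score, 6))  # 2.0
```

The same dot-product geometry explains the summary's off-target effects: a shift along one direction changes any other feature's projection in proportion to the cosine similarity between the two directions.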
arXiv Detail & Related papers (2025-08-04T20:21:50Z) - Exploring the Small World of Word Embeddings: A Comparative Study on Conceptual Spaces from LLMs of Different Scales [47.52062992606549]
A conceptual space represents concepts as nodes and semantic relatedness as edges.
We construct a conceptual space using word embeddings from large language models of varying scales.
We analyze conceptual pairs, WordNet relations, and a cross-lingual semantic network for qualitative words.
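A conceptual space of this kind can be sketched as a similarity graph. The embeddings below are random stand-ins for LLM word embeddings, and the relatedness threshold is an arbitrary choice, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(2)

# Random stand-ins for LLM word embeddings; a real study would use
# actual model activations at a chosen layer.
words = ["cat", "dog", "car", "truck"]
emb = {w: rng.normal(size=8) for w in words}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Nodes are concepts; an edge connects two concepts whose semantic
# relatedness (cosine similarity) exceeds a threshold (arbitrary here).
THRESHOLD = 0.2
edges = {(u, v)
         for i, u in enumerate(words)
         for v in words[i + 1:]
         if cosine(emb[u], emb[v]) > THRESHOLD}

# At most C(n, 2) undirected edges for n concept nodes.
print(len(edges) <= len(words) * (len(words) - 1) // 2)  # True
```

Graph statistics over such a network (degree distribution, clustering, path lengths) are what make "small world" comparisons across model scales possible.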
arXiv Detail & Related papers (2025-02-17T02:52:07Z) - Learning Complete Topology-Aware Correlations Between Relations for Inductive Link Prediction [121.65152276851619]
We show that semantic correlations between relations are inherently edge-level and entity-independent.
We propose a novel subgraph-based method, namely TACO, to model Topology-Aware COrrelations between relations.
To further exploit the potential of RCN, we propose Complete Common Neighbor induced subgraph.
arXiv Detail & Related papers (2023-09-20T08:11:58Z) - Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing such interactions.
This suggests LMs may potentially serve as more useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z) - Syntactic Substitutability as Unsupervised Dependency Syntax [31.488677474152794]
We model a more general property implicit in the definition of dependency relations, syntactic substitutability.
This property captures the fact that words at either end of a dependency can be substituted with words from the same category.
We show that increasing the number of substitutions used improves parsing accuracy on natural data.
arXiv Detail & Related papers (2022-11-29T09:01:37Z) - Syntactic Persistence in Language Models: Priming as a Window into Abstract Language Representations [0.38498574327875945]
We investigate the extent to which modern, neural language models are susceptible to syntactic priming.
We introduce a novel metric and release Prime-LM, a large corpus where we control for various linguistic factors which interact with priming strength.
We report surprisingly strong priming effects when priming with multiple sentences, each with different words and meaning but with identical syntactic structure.
arXiv Detail & Related papers (2021-09-30T10:38:38Z) - Modeling Human Sentence Processing with Left-Corner Recurrent Neural Network Grammars [10.232142358698605]
In computational linguistics, it has been shown that hierarchical structures make language models (LMs) more human-like.
In this paper, we investigate whether hierarchical structures make LMs more human-like, and if so, which parsing strategy is most cognitively plausible.
arXiv Detail & Related papers (2021-09-10T15:35:00Z) - The Low-Dimensional Linear Geometry of Contextualized Word Representations [27.50785941238007]
We study the linear geometry of contextualized word representations in ELMO and BERT.
We show that a variety of linguistic features are encoded in low-dimensional subspaces.
arXiv Detail & Related papers (2021-05-15T00:58:08Z) - Prototypical Representation Learning for Relation Extraction [56.501332067073065]
This paper aims to learn predictive, interpretable, and robust relation representations from distantly-labeled data.
We learn prototypes for each relation from contextual information to best explore the intrinsic semantics of relations.
Results on several relation learning tasks show that our model significantly outperforms the previous state-of-the-art relational models.
arXiv Detail & Related papers (2021-03-22T08:11:43Z) - Decomposing lexical and compositional syntax and semantics with deep language models [82.81964713263483]
The activations of language transformers like GPT2 have been shown to linearly map onto brain activity during speech comprehension.
Here, we propose a taxonomy to factorize the high-dimensional activations of language models into four classes: lexical, compositional, syntactic, and semantic representations.
The results highlight two findings. First, compositional representations recruit a more widespread cortical network than lexical ones, and encompass the bilateral temporal, parietal and prefrontal cortices.
arXiv Detail & Related papers (2021-03-02T10:24:05Z) - High-order Semantic Role Labeling [86.29371274587146]
This paper introduces a high-order graph structure for the neural semantic role labeling model.
It enables the model to explicitly consider not only the isolated predicate-argument pairs but also the interaction between the predicate-argument pairs.
Experimental results on 7 languages of the CoNLL-2009 benchmark show that the high-order structural learning techniques are beneficial to the strong performing SRL models.
arXiv Detail & Related papers (2020-10-09T15:33:54Z) - LSTMs Compose (and Learn) Bottom-Up [18.34617849764921]
Recent work in NLP shows that LSTM language models capture hierarchical structure in language data.
In contrast to existing work, we consider the learning process that leads to their compositional behavior.
We present a related measure of Decompositional Interdependence between word meanings in an LSTM, based on their gate interactions.
arXiv Detail & Related papers (2020-10-06T13:00:32Z) - Exploiting Syntactic Structure for Better Language Modeling: A Syntactic Distance Approach [78.77265671634454]
We make use of a multi-task objective, i.e., the models simultaneously predict words as well as ground-truth parse trees in a form called "syntactic distances".
Experimental results on the Penn Treebank and Chinese Treebank datasets show that when ground truth parse trees are provided as additional training signals, the model is able to achieve lower perplexity and induce trees with better quality.
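A "syntactic distance" of this kind can be sketched for a toy binary parse tree. The height-based recursion below is a generic formulation for illustration, not necessarily the paper's exact definition:

```python
def syntactic_distances(tree):
    """Return (distances, height) for a binary tree of nested tuples.

    Each gap between adjacent words gets the height of the smallest
    constituent spanning both words: higher split points = larger distance.
    """
    if isinstance(tree, str):  # leaf = a single word
        return [], 0
    left, right = tree
    dl, hl = syntactic_distances(left)
    dr, hr = syntactic_distances(right)
    height = max(hl, hr) + 1
    return dl + [height] + dr, height

# "((the cat) (sat (on mat)))" -- the top-level split gets the largest gap.
dists, _ = syntactic_distances((("the", "cat"), ("sat", ("on", "mat"))))
print(dists)  # [1, 3, 2, 1]
```

A sequence of n words yields n-1 such gap values, which is what lets a language model predict them token-by-token alongside the words themselves.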
arXiv Detail & Related papers (2020-05-12T15:35:00Z) - Word Interdependence Exposes How LSTMs Compose Representations [18.34617849764921]
Recent work in NLP shows that LSTM language models capture compositional structure in language data.
We present a novel measure of interdependence between word meanings in an LSTM, based on their interactions at the internal gates.
arXiv Detail & Related papers (2020-04-27T21:48:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.