Differential syntactic and semantic encoding in LLMs
- URL: http://arxiv.org/abs/2601.04765v2
- Date: Fri, 09 Jan 2026 09:02:00 GMT
- Title: Differential syntactic and semantic encoding in LLMs
- Authors: Santiago Acevedo, Alessandro Laio, Marco Baroni
- Abstract summary: We study how syntactic and semantic information is encoded in inner layer representations of Large Language Models (LLMs). We find that the cross-layer encoding profiles of syntax and semantics are different, and that the two signals can to some extent be decoupled.
- Score: 49.300174325011426
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study how syntactic and semantic information is encoded in inner layer representations of Large Language Models (LLMs), focusing on the very large DeepSeek-V3. We find that, by averaging hidden-representation vectors of sentences sharing syntactic structure or meaning, we obtain vectors that capture a significant proportion of the syntactic and semantic information contained in the representations. In particular, subtracting these syntactic and semantic "centroids" from sentence vectors strongly affects their similarity with syntactically and semantically matched sentences, respectively, suggesting that syntax and semantics are, at least partially, linearly encoded. We also find that the cross-layer encoding profiles of syntax and semantics are different, and that the two signals can to some extent be decoupled, suggesting differential encoding of these two types of linguistic information in LLM representations.
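The centroid-subtraction procedure described in the abstract can be illustrated with a minimal sketch. The array shapes, pooling choice, toy data, and variable names (reps, same_syntax_reps, hidden size 768) are assumptions for illustration, not the authors' implementation; hidden states from DeepSeek-V3 or another model would be obtained separately.

```python
import numpy as np

# Minimal sketch (not the paper's code): estimate a syntactic "centroid" by
# averaging hidden-state vectors of sentences that share a syntactic template,
# then subtract it from sentence vectors and compare their similarity.

def centroid(reps: np.ndarray) -> np.ndarray:
    """Average representation of a group of sentences sharing a property."""
    return reps.mean(axis=0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Hypothetical data: 100 sentence vectors assumed to share the same syntactic
# structure (in practice these would be pooled layer activations).
rng = np.random.default_rng(0)
same_syntax_reps = rng.normal(size=(100, 768))

syntactic_centroid = centroid(same_syntax_reps)
sent_a, sent_b = same_syntax_reps[0], same_syntax_reps[1]

# Similarity between two syntactically matched sentences, before and after
# removing the shared syntactic component.
before = cosine(sent_a, sent_b)
after = cosine(sent_a - syntactic_centroid, sent_b - syntactic_centroid)
print(f"similarity before: {before:.3f}, after centroid removal: {after:.3f}")
```

In the paper's setting, a large drop in similarity after subtracting the syntactic (or semantic) centroid is taken as evidence that the corresponding information is, at least partially, linearly encoded.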
Related papers
- Understanding Subword Compositionality of Large Language Models [42.51978887170929]
Large language models (LLMs) take sequences of subwords as input, requiring them to compose subword representations. We present a comprehensive set of experiments to probe how LLMs compose subword information.
arXiv Detail & Related papers (2025-08-25T12:16:56Z) - Semantic Structure in Large Language Model Embeddings [0.0]
Psychological research consistently finds that human ratings of words can be reduced to a low-dimensional form with relatively little information loss. We show that the projections of words on semantic directions defined by antonym pairs correlate highly with human ratings. We find that shifting tokens along one semantic direction causes off-target effects on geometrically aligned features proportional to their cosine similarity.
arXiv Detail & Related papers (2025-08-04T20:21:50Z) - A quantitative analysis of semantic information in deep representations of text and images [42.597592429757746]
We present a method for measuring the relative information content of representations of semantically related data. We probe how this information is encoded across multiple tokens of large language models (LLMs) and vision transformers. We observe significant and model-dependent information asymmetries between image and text representations.
arXiv Detail & Related papers (2025-05-21T07:38:48Z) - Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models [88.07940818022468]
We take an initial step toward measuring the role of shared semantics among subwords in encoder-only multilingual language models (mLMs).
We form "semantic tokens" by merging the semantically similar subwords and their embeddings.
Inspections of the grouped subwords show that they exhibit a wide range of semantic similarities.
arXiv Detail & Related papers (2024-11-07T08:38:32Z) - Spoken Word2Vec: Learning Skipgram Embeddings from Speech [0.8901073744693314]
We show how shallow skipgram-like algorithms fail to encode distributional semantics when the input units are acoustically correlated.
We illustrate the potential of an alternative deep end-to-end variant of the model and examine the effects on the resulting embeddings.
arXiv Detail & Related papers (2023-11-15T19:25:29Z) - Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations [102.05351905494277]
The sub-sentence encoder is a contrastively learned contextual embedding model for fine-grained semantic representation of text.
We show that sub-sentence encoders keep the same level of inference cost and space complexity as sentence encoders.
arXiv Detail & Related papers (2023-11-07T20:38:30Z) - Agentività e telicità in GilBERTo: implicazioni cognitive (Agentivity and telicity in GilBERTo: cognitive implications) [77.71680953280436]
The goal of this study is to investigate whether a Transformer-based neural language model infers lexical semantics.
The semantic properties considered are telicity (also combined with definiteness) and agentivity.
arXiv Detail & Related papers (2023-07-06T10:52:22Z) - Representation Of Lexical Stylistic Features In Language Models' Embedding Space [28.60690854046176]
We show that it is possible to derive a vector representation for each of several lexical stylistic notions from only a small number of seed pairs.
We conduct experiments on five datasets and find that static embeddings encode these features more accurately at the level of words and phrases.
The lower performance of contextualized representations at the word level is partially attributable to the anisotropy of their vector space.
arXiv Detail & Related papers (2023-05-29T23:44:26Z) - Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlap frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions at their positions.
Experiments on Semantic Textual Similarity show the resulting distance metric (NDD) to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
arXiv Detail & Related papers (2021-10-04T03:59:15Z) - Unsupervised Distillation of Syntactic Information from Contextualized Word Representations [62.230491683411536]
We tackle the task of unsupervised disentanglement between semantics and structure in neural language representations.
To this end, we automatically generate groups of sentences which are structurally similar but semantically different.
We demonstrate that our transformation clusters vectors in space by structural properties, rather than by lexical semantics (a minimal illustrative sketch follows the list below).
arXiv Detail & Related papers (2020-10-11T15:13:18Z)
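To make the clustering evaluation mentioned in the last entry concrete, here is a minimal sketch. The toy data, label names, dimensions, and the choice of k-means with adjusted Rand index are assumptions for illustration, not the cited paper's actual setup.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Minimal sketch: check whether sentence vectors group by structural labels
# or by semantic labels after some transformation.

rng = np.random.default_rng(0)
n_sentences, dim = 200, 64

# Hypothetical labels: each sentence has a structural template id and a topic id.
structure_labels = rng.integers(0, 4, size=n_sentences)
semantic_labels = rng.integers(0, 4, size=n_sentences)

# Toy vectors whose dominant component follows the structural label, standing
# in for representations after a structure-emphasizing transformation.
sentence_vecs = rng.normal(scale=0.3, size=(n_sentences, dim))
sentence_vecs[:, 0] += structure_labels        # strong structural signal
sentence_vecs[:, 1] += 0.2 * semantic_labels   # weaker semantic signal

clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(sentence_vecs)

# Higher agreement with structural labels than with semantic labels would
# indicate that the vectors cluster by structure rather than by semantics.
print("ARI vs. structure:", adjusted_rand_score(structure_labels, clusters))
print("ARI vs. semantics:", adjusted_rand_score(semantic_labels, clusters))
```

This kind of contrast between agreement with structural versus semantic labels is one simple way to quantify the decoupling of syntax and semantics discussed in the main abstract above.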