Model Internal Sleuthing: Finding Lexical Identity and Inflectional Morphology in Modern Language Models
- URL: http://arxiv.org/abs/2506.02132v3
- Date: Wed, 06 Aug 2025 04:14:16 GMT
- Title: Model Internal Sleuthing: Finding Lexical Identity and Inflectional Morphology in Modern Language Models
- Authors: Michael Li, Nishant Subramani,
- Abstract summary: We investigate how 25 models represent lexical identity and inflectional morphology across six typologically diverse languages.<n>We find that models concentrate lexical information linearly in early layers and increasingly nonlinearly in later layers.<n>Remarkably, these encoding patterns emerge across all models we test, despite differences in architecture, size, and training regime.
- Score: 5.317065202153858
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large transformer-based language models dominate modern NLP, yet our understanding of how they encode linguistic information is rooted in studies of early models like BERT and GPT-2. To better understand today's language models, we investigate how 25 models - from classical architectures (BERT, DeBERTa, GPT-2) to modern large language models (Pythia, OLMo-2, Gemma-2, Qwen2.5, Llama-3.1) - represent lexical identity and inflectional morphology across six typologically diverse languages. Using linear and nonlinear classifiers trained on hidden activations, we predict word lemmas and inflectional features layer by layer. We find that models concentrate lexical information linearly in early layers and increasingly nonlinearly in later layers, while keeping inflectional information uniformly accessible and linearly separable throughout. Additional experiments probe the nature of these encodings: attention and residual analyses examine where within layers information can be recovered, steering vector experiments test what information can be functionally manipulated, and intrinsic dimensionality analyses explore how the representational structure evolves across layers. Remarkably, these encoding patterns emerge across all models we test, despite differences in architecture, size, and training regime (pretrained and instruction-tuned variants). This suggests that, even with substantial advances in LLM technologies, transformer models organize linguistic information in similar ways, indicating that these properties are important for next token prediction and are learned early during pretraining. Our code is available at https://github.com/ml5885/model_internal_sleuthing
Related papers
- The more polypersonal the better -- a short look on space geometry of fine-tuned layers [0.0]
We analyze the changes in the internal representation of the BERT model when it is trained with additional grammatical modules.<n>We find that adding a single grammatical layer causes the model to separate the new and old grammatical systems within itself.
arXiv Detail & Related papers (2025-01-09T18:50:47Z) - Scaling Laws for Linear Complexity Language Models [18.787664489713332]
We present the scaling laws for linear complexity language models to establish a foundation for their scalability.
The study reveals that existing linear complexity language models exhibit similar scaling capabilities as conventional transformer-based models.
arXiv Detail & Related papers (2024-06-24T14:51:31Z) - Talking Heads: Understanding Inter-layer Communication in Transformer Language Models [32.2976613483151]
We analyze a mechanism used in two LMs to selectively inhibit items in a context in one task.<n>We find that models write into low-rank subspaces of the residual stream to represent features which are then read out by later layers.
arXiv Detail & Related papers (2024-06-13T18:12:01Z) - Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z) - Decoding Probing: Revealing Internal Linguistic Structures in Neural Language Models using Minimal Pairs [0.873811641236639]
We introduce a novel decoding probing' method to probe internal linguistic characteristics in neural language models layer by layer.
By treating the language model as the brain' and its representations as neural activations', we decode grammaticality labels of minimal pairs from the intermediate layers' representations.
arXiv Detail & Related papers (2024-03-26T00:56:06Z) - On the Origins of Linear Representations in Large Language Models [51.88404605700344]
We introduce a simple latent variable model to formalize the concept dynamics of the next token prediction.
Experiments show that linear representations emerge when learning from data matching the latent variable model.
We additionally confirm some predictions of the theory using the LLaMA-2 large language model.
arXiv Detail & Related papers (2024-03-06T17:17:36Z) - In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study ICL through the lens of a new family of model problems we term in context language learning (ICLL)
We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv Detail & Related papers (2024-01-23T18:59:21Z) - Unveiling Multilinguality in Transformer Models: Exploring Language
Specificity in Feed-Forward Networks [12.7259425362286]
We investigate how multilingual models might leverage key-value memories.
For autoregressive models trained on two or more languages, do all neurons (across layers) respond equally to all languages?
Our findings reveal that the layers closest to the network's input or output tend to exhibit more language-specific behaviour compared to the layers in the middle.
arXiv Detail & Related papers (2023-10-24T06:45:00Z) - Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z) - Is neural language acquisition similar to natural? A chronological
probing study [0.0515648410037406]
We present the chronological probing study of transformer English models such as MultiBERT and T5.
We compare the information about the language learned by the models in the process of training on corpora.
The results show that 1) linguistic information is acquired in the early stages of training 2) both language models demonstrate capabilities to capture various features from various levels of language.
arXiv Detail & Related papers (2022-07-01T17:24:11Z) - A Unified Understanding of Deep NLP Models for Text Classification [88.35418976241057]
We have developed a visual analysis tool, DeepNLPVis, to enable a unified understanding of NLP models for text classification.
The key idea is a mutual information-based measure, which provides quantitative explanations on how each layer of a model maintains the information of input words in a sample.
A multi-level visualization, which consists of a corpus-level, a sample-level, and a word-level visualization, supports the analysis from the overall training set to individual samples.
arXiv Detail & Related papers (2022-06-19T08:55:07Z) - Better Language Model with Hypernym Class Prediction [101.8517004687825]
Class-based language models (LMs) have been long devised to address context sparsity in $n$-gram LMs.
In this study, we revisit this approach in the context of neural LMs.
arXiv Detail & Related papers (2022-03-21T01:16:44Z) - Lattice-BERT: Leveraging Multi-Granularity Representations in Chinese
Pre-trained Language Models [62.41139712595334]
We propose a novel pre-training paradigm for Chinese -- Lattice-BERT.
We construct a lattice graph from the characters and words in a sentence and feed all these text units into transformers.
We show that our model can bring an average increase of 1.5% under the 12-layer setting.
arXiv Detail & Related papers (2021-04-15T02:36:49Z) - Multi-timescale Representation Learning in LSTM Language Models [69.98840820213937]
Language models must capture statistical dependencies between words at timescales ranging from very short to very long.
We derived a theory for how the memory gating mechanism in long short-term memory language models can capture power law decay.
Experiments showed that LSTM language models trained on natural English text learn to approximate this theoretical distribution.
arXiv Detail & Related papers (2020-09-27T02:13:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.