Linear Relational Decoding of Morphology in Language Models
- URL: http://arxiv.org/abs/2507.14640v1
- Date: Sat, 19 Jul 2025 14:35:15 GMT
- Title: Linear Relational Decoding of Morphology in Language Models
- Authors: Eric Xia, Jugal Kalita
- Abstract summary: A two-part affine approximation has been found to approximate transformer computations well for certain subject-object relations. We show that the linear transformation Ws, where s is a middle-layer representation of a subject token and W is derived from model derivatives, is also able to accurately reproduce final object states for many relations.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A two-part affine approximation has been found to approximate transformer computations well for certain subject-object relations. Adapting the Bigger Analogy Test Set, we show that the linear transformation Ws, where s is a middle-layer representation of a subject token and W is derived from model derivatives, is also able to accurately reproduce final object states for many relations. This linear technique achieves 90% faithfulness on morphological relations, and we show similar findings across languages and across models. Our findings indicate that some conceptual relationships in language models, such as morphology, are readily interpretable from latent space and are sparsely encoded by cross-layer linear transformations.
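A minimal sketch of the idea in the abstract, under loudly stated assumptions: a toy `tanh` map `f` stands in for the transformer layers between the middle-layer subject state s and the final object state, and a random matrix `decoder` stands in for the unembedding. W is estimated by averaging the Jacobian of `f` over a few example subjects (the mean-Jacobian recipe from "Linearity of Relation Decoding in Transformer Language Models"); as in the abstract, only Ws is used, with no bias term. "Faithfulness" is taken here to mean the fraction of held-out subjects for which decoding Ws gives the same top token as decoding f(s). All names are hypothetical, and the paper's actual estimator (layer choice, any scaling of W) may differ.

```python
import numpy as np


def jacobian(f, s, eps=1e-5):
    """Central-difference Jacobian of f at s (rows: outputs, columns: inputs)."""
    s = np.asarray(s, dtype=float)
    out_dim = f(s).shape[0]
    J = np.zeros((out_dim, s.shape[0]))
    for i in range(s.shape[0]):
        step = np.zeros_like(s)
        step[i] = eps
        J[:, i] = (f(s + step) - f(s - step)) / (2 * eps)
    return J


def estimate_W(f, subjects):
    """Assumed estimator: average the Jacobian of f over a few example subject states."""
    return np.mean([jacobian(f, s) for s in subjects], axis=0)


def faithfulness(f, W, decoder, subjects):
    """Fraction of subjects whose top decoded token from W @ s matches the one from f(s)."""
    hits = 0
    for s in subjects:
        model_token = np.argmax(decoder @ f(s))      # what the (toy) model predicts
        linear_token = np.argmax(decoder @ (W @ s))  # what the linear map Ws predicts
        hits += int(model_token == linear_token)
    return hits / len(subjects)


# Hypothetical toy stand-ins: f plays the role of the layers between the
# middle-layer subject state s and the final object state; decoder plays the
# role of the unembedding matrix.
rng = np.random.default_rng(0)
d, vocab = 16, 50
A = rng.normal(size=(d, d)) / np.sqrt(d)
decoder = rng.normal(size=(vocab, d))


def f(s):
    return np.tanh(A @ s)


train = rng.normal(size=(5, d))       # a few examples used to estimate W
held_out = rng.normal(size=(100, d))  # subjects used to measure faithfulness

W = estimate_W(f, train)
print(f"faithfulness of Ws: {faithfulness(f, W, decoder, held_out):.2f}")
```

The finite-difference Jacobian keeps the sketch dependency-free; with a real model, `f` would be a partial forward pass from the chosen middle layer at the subject position, and the Jacobian would more naturally come from automatic differentiation.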
Related papers
- Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
We present a Cross-Architecture Layerwise Distillation (CALD) approach that jointly converts a transformer model into a linear-time substitute and fine-tunes it to a target task.
We show that CALD can effectively recover the results of the original model, and that the guiding strategy contributes to these results.
Date: 2024-10-09T13:06:43Z
- Explaining Text Similarity in Transformer Models
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
Date: 2024-05-10T17:11:31Z
- Shape Arithmetic Expressions: Advancing Scientific Discovery Beyond Closed-Form Equations
Generalized Additive Models (GAMs) can capture non-linear relationships between variables and targets, but they cannot capture intricate feature interactions.
We propose Shape Arithmetic Expressions (SHAREs), which fuse the flexible shape functions of GAMs with the complex feature interactions found in mathematical expressions.
We also design a set of rules for constructing SHAREs that guarantee transparency of the found expressions beyond the standard constraints.
Date: 2024-04-15T13:44:01Z
- On the Origins of Linear Representations in Large Language Models
We introduce a simple latent variable model to formalize the concept dynamics of next-token prediction.
Experiments show that linear representations emerge when learning from data matching the latent variable model.
We additionally confirm some predictions of the theory using the LLaMA-2 large language model.
Date: 2024-03-06T17:17:36Z
- Location Sensitive Embedding for Knowledge Graph Reasoning
A key challenge for translational distance models is their inability to effectively differentiate between 'head' and 'tail' entities in graphs.
To address this problem, a novel location-sensitive embedding (LSE) method has been developed.
LSE modifies the head entity using relation-specific mappings, treating relations as linear transformations rather than mere translations.
A more streamlined variant, LSEd, which uses a diagonal matrix for the transformations to improve practical efficiency, is also proposed.
Date: 2023-12-01T22:35:19Z
- Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation
We propose a new model of contextual word representation, not from a neural perspective, but from a purely syntactic and probabilistic perspective.
We find that the graph of our model resembles transformers, with correspondences between dependencies and self-attention.
Experiments show that our model performs competitively with transformers on small to medium-sized datasets.
Date: 2023-11-26T06:56:02Z
- Flow Factorized Representation Learning
We introduce a generative model which specifies a distinct set of latent probability paths that define different input transformations.
We show that our model achieves higher likelihoods on standard representation learning benchmarks while simultaneously being closer to approximately equivariant models.
Date: 2023-09-22T20:15:37Z
- Linearity of Relation Decoding in Transformer Language Models
Much of the knowledge encoded in transformer language models (LMs) may be expressed in terms of relations.
We show that, for a subset of relations, this computation is well-approximated by a single linear transformation on the subject representation.
Date: 2023-08-17T17:59:19Z
- Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale
We introduce Transformer Grammars, a class of Transformer language models that combine the expressive power, scalability, and strong performance of Transformers with syntactic inductive biases.
We find that Transformer Grammars outperform various strong baselines on multiple syntax-sensitive language modeling evaluation metrics.
Date: 2022-03-01T17:22:31Z
- STaR: Knowledge Graph Embedding by Scaling, Translation and Rotation
Bilinear methods are mainstream in Knowledge Graph Embedding (KGE); they aim to learn low-dimensional representations for entities and relations.
Previous work has mainly identified six important relation patterns, such as non-commutativity.
We propose a corresponding bilinear model, Scaling Translation and Rotation (STaR), consisting of the above two parts.
Date: 2022-02-15T02:06:22Z