Related papers: The Structure of Relation Decoding Linear Operators in Large Language Models

The Structure of Relation Decoding Linear Operators in Large Language Models

URL: http://arxiv.org/abs/2510.26543v1
Date: Thu, 30 Oct 2025 14:36:09 GMT
Title: The Structure of Relation Decoding Linear Operators in Large Language Models
Authors: Miranda Anna Christ, Adrián Csiszárik, Gergely Becsó, Dániel Varga,
Abstract summary: We study the structure of linear operators that decode specific relational facts in transformer language models.<n>We show that such collections of relation decoders can be highly compressed by simple order-3 tensor networks.<n>Our findings thus interpret linear relational decoding in transformer language models as primarily property-based, rather than relation-specific.
Score: 0.5219568203653522
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper investigates the structure of linear operators introduced in Hernandez et al. [2023] that decode specific relational facts in transformer language models. We extend their single-relation findings to a collection of relations and systematically chart their organization. We show that such collections of relation decoders can be highly compressed by simple order-3 tensor networks without significant loss in decoding accuracy. To explain this surprising redundancy, we develop a cross-evaluation protocol, in which we apply each linear decoder operator to the subjects of every other relation. Our results reveal that these linear maps do not encode distinct relations, but extract recurring, coarse-grained semantic properties (e.g., country of capital city and country of food are both in the country-of-X property). This property-centric structure clarifies both the operators' compressibility and highlights why they generalize only to new relations that are semantically close. Our findings thus interpret linear relational decoding in transformer language models as primarily property-based, rather than relation-specific.

Related papers

Geometric Relational Embeddings [19.383110247906256]
We propose relational embeddings, a paradigm of embeddings that respect the underlying symbolic structures. Results obtained from benchmark real-world datasets demonstrate the efficacy of geometric relational embeddings.
arXiv Detail & Related papers (2024-09-18T22:02:24Z)
Learning Syntax Without Planting Trees: Understanding Hierarchical Generalization in Transformers [74.96551626420188]
Transformers trained on natural language data have been shown to learn its hierarchical structure and generalize to sentences with unseen syntactic structures.<n>We investigate sources of inductive bias in transformer models and their training that could cause such generalization behavior to emerge.
arXiv Detail & Related papers (2024-04-25T07:10:29Z)
On Linearizing Structured Data in Encoder-Decoder Language Models: Insights from Text-to-SQL [8.57550491437633]
This work investigates the linear handling of structured data in encoder-decoder language models, specifically T5. Our findings reveal the model's ability to mimic human-designed processes such as schema linking and syntax prediction. We also uncover insights into the model's internal mechanisms, including the ego-centric nature of structure node encodings.
arXiv Detail & Related papers (2024-04-03T01:16:20Z)
Linearity of Relation Decoding in Transformer Language Models [82.47019600662874]
Much of the knowledge encoded in transformer language models (LMs) may be expressed in terms of relations. We show that, for a subset of relations, this computation is well-approximated by a single linear transformation on the subject representation.
arXiv Detail & Related papers (2023-08-17T17:59:19Z)
Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space. We demonstrate the broad applicability of this approach by adding it to both basic data-re (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z)
Query Structure Modeling for Inductive Logical Reasoning Over Knowledge Graphs [67.043747188954]
We propose a structure-modeled textual encoding framework for inductive logical reasoning over KGs. It encodes linearized query structures and entities using pre-trained language models to find answers. We conduct experiments on two inductive logical reasoning datasets and three transductive datasets.
arXiv Detail & Related papers (2023-05-23T01:25:29Z)
On Neural Architecture Inductive Biases for Relational Tasks [76.18938462270503]
We introduce a simple architecture based on similarity-distribution scores which we name Compositional Network generalization (CoRelNet) We find that simple architectural choices can outperform existing models in out-of-distribution generalizations.
arXiv Detail & Related papers (2022-06-09T16:24:01Z)
Transformers Generalize Linearly [1.7709450506466664]
We examine patterns of structural generalization for Transformer sequence-to-sequence models. We find that not only do Transformers fail to generalize hierarchically across a wide variety of grammatical mapping tasks, but they exhibit an even stronger preference for linear generalization than comparable networks.
arXiv Detail & Related papers (2021-09-24T15:48:46Z)
Probing Linguistic Features of Sentence-Level Representations in Neural Relation Extraction [80.38130122127882]
We introduce 14 probing tasks targeting linguistic properties relevant to neural relation extraction (RE) We use them to study representations learned by more than 40 different encoder architecture and linguistic feature combinations trained on two datasets. We find that the bias induced by the architecture and the inclusion of linguistic features are clearly expressed in the probing task performance.
arXiv Detail & Related papers (2020-04-17T09:17:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.