Bilinear relational structure fixes reversal curse and enables consistent model editing
- URL: http://arxiv.org/abs/2509.21993v2
- Date: Fri, 07 Nov 2025 13:49:40 GMT
- Title: Bilinear relational structure fixes reversal curse and enables consistent model editing
- Authors: Dong-Kyum Kim, Minsung Kim, Jea Kwon, Nakyeong Yang, Meeyoung Cha,
- Abstract summary: We show that the reversal curse is not an inherent failure but an artifact of how models encode knowledge. By training LMs from scratch on a synthetic dataset of relational knowledge graphs, we demonstrate that bilinear relational structure emerges in their hidden representations. This structure substantially alleviates the reversal curse, enabling LMs to infer unseen reverse facts.
- Score: 18.483285872202107
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The reversal curse -- a language model's (LM) inability to infer an unseen fact ``B is A'' from a learned fact ``A is B'' -- is widely considered a fundamental limitation. We show that this is not an inherent failure but an artifact of how models encode knowledge. By training LMs from scratch on a synthetic dataset of relational knowledge graphs, we demonstrate that bilinear relational structure emerges in their hidden representations. This structure substantially alleviates the reversal curse, enabling LMs to infer unseen reverse facts. Crucially, we also find that this bilinear structure plays a key role in consistent model editing. When a fact is updated in an LM with this structure, the edit correctly propagates to its reverse and other logically dependent facts. In contrast, models lacking this representation not only suffer from the reversal curse but also fail to generalize edits, further introducing logical inconsistencies. Our results establish that training on a relational knowledge dataset induces the emergence of bilinear internal representations, which in turn enable LMs to behave in a logically consistent manner after editing. This implies that the success of model editing depends critically not just on editing algorithms but on the underlying representational geometry of the knowledge being modified.
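A minimal sketch of what such a bilinear encoding implies (illustrative only; this is not the authors' training setup or editing algorithm, and the entity names, embedding dimension, and rank-one edit rule below are assumptions): if a fact "A r B" is scored bilinearly as e_A^T W_r e_B over entity embeddings, the reverse relation corresponds to the transpose W_r^T, so the unseen reverse fact is recoverable without ever being trained on, and an edit applied to W_r is visible from both query directions.

```python
import numpy as np

# Toy bilinear relational scoring: s(subj, r, obj) = e_subj^T W_r e_obj.
# All entities, the dimension, and the edit rule are hypothetical examples.
rng = np.random.default_rng(0)
d = 16                                            # embedding dimension (toy)
E = {name: rng.normal(size=d) for name in ["Alice", "Bob", "Carol"]}
W = rng.normal(size=(d, d))                       # relation "is parent of"

def score(subj, obj, relation=None):
    """Bilinear score e_subj^T R e_obj; pass relation=W.T for the reverse relation."""
    R = W if relation is None else relation
    return float(E[subj] @ R @ E[obj])

# Forward fact and its never-seen reverse agree identically, since
# e_A^T W e_B == e_B^T W^T e_A for any vectors and matrix.
print(score("Alice", "Bob"))                      # "Alice is parent of Bob"
print(score("Bob", "Alice", relation=W.T))        # "Bob is child of Alice"

# Consistent editing: a rank-one update that raises the score of the new fact
# "Alice is parent of Carol" is applied to W itself, so the reverse query
# "Carol is child of Alice" changes with it; no second edit is needed.
delta = np.outer(E["Alice"], E["Carol"])          # toy edit direction
W += 0.5 * delta / np.linalg.norm(delta)
print(score("Alice", "Carol"))                    # edited forward fact
print(score("Carol", "Alice", relation=W.T))      # reverse reflects the edit
```

The transpose identity e_A^T W_r e_B = e_B^T W_r^T e_A is what makes the edit propagate in this sketch: both query directions read the same parameters, which mirrors the paper's claim that edits generalize only when the representation carries this bilinear structure.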
Related papers
- Structural Disentanglement in Bilinear MLPs via Architectural Inductive Bias [0.0]
We argue that failures arise from how models structure their internal representations during training. We show analytically that bilinear parameterizations possess a 'non-mixing' property under gradient flow conditions. Unlike pointwise nonlinear networks, multiplicative architectures are able to recover true operators aligned with the underlying algebraic structure.
arXiv Detail & Related papers (2026-02-05T13:14:01Z) - Behemoth: Benchmarking Unlearning in LLMs Using Fully Synthetic Data [43.026389128544594]
We propose Behemoth, a framework for understanding the effects of model editing on large language models trained on real-world data. We show results that, in some cases, echo real-world findings; for instance, restricting the update rank can result in a more effective update.
arXiv Detail & Related papers (2026-01-30T16:39:42Z) - Training Language Models to Explain Their Own Computations [73.8562596518326]
We study the extent to which LMs' privileged access to their own internals can be leveraged to produce new techniques for explaining their behavior. Using existing interpretability techniques as a source of ground truth, we fine-tune LMs to generate natural language descriptions of (1) the information encoded by LM features, (2) the causal structure of LMs' internal activations, and (3) the influence of specific input tokens on LM outputs.
arXiv Detail & Related papers (2025-11-11T18:57:14Z) - Is Model Editing Built on Sand? Revealing Its Illusory Success and Fragile Foundation [50.40861036534546]
Large language models (LLMs) inevitably encode outdated or incorrect knowledge. Updating, deleting, and forgetting such knowledge is important for alignment, safety, and other issues. To address this issue, model editing has emerged as a promising paradigm: precisely editing a small subset of parameters so that a specific fact is updated while other knowledge is preserved. Despite its great success reported in previous papers, we find the apparent reliability of editing rests on a fragile foundation. Our empirical evidence shows that editing is likely to be based on shortcuts rather than full semantics, calling for an urgent reconsideration of the very basis of model editing before further advancements can be made.
arXiv Detail & Related papers (2025-10-01T07:59:23Z) - Factual Self-Awareness in Language Models: Representation, Robustness, and Scaling [56.26834106704781]
Factual incorrectness in generated content is one of the primary concerns in the ubiquitous deployment of large language models (LLMs). We provide evidence supporting the presence of an internal compass in LLMs that dictates the correctness of factual recall at the time of generation. Scaling experiments across model sizes and training dynamics highlight that self-awareness emerges rapidly during training and peaks in intermediate layers.
arXiv Detail & Related papers (2025-05-27T16:24:02Z) - Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing [20.276952762837098]
Knowledge Editing (KE) algorithms alter models' weights to perform targeted updates to incorrect, outdated, or otherwise unwanted factual associations. We show that applying KE can adversely affect models' broader factual recall accuracy and diminish their reasoning abilities. Our work yields a precise mechanistic hypothesis to explain why KE has adverse effects on model abilities.
arXiv Detail & Related papers (2024-10-22T17:13:34Z) - A Theoretical Understanding of Self-Correction through In-context Alignment [51.622068973630796]
Large language models (LLMs) are capable of improving their abilities purely by self-correction.
We show that when LLMs give relatively accurate self-examinations as rewards, they are capable of refining responses in an in-context way.
Inspired by these findings, we also illustrate applications of self-correction, such as defending against LLM jailbreaks.
arXiv Detail & Related papers (2024-05-28T22:33:02Z) - Robust and Scalable Model Editing for Large Language Models [75.95623066605259]
We propose EREN (Edit models by REading Notes) to improve the scalability and robustness of LLM editing.
Unlike existing techniques, it can integrate knowledge from multiple edits, and correctly respond to syntactically similar but semantically unrelated inputs.
arXiv Detail & Related papers (2024-03-26T06:57:23Z) - The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse [58.0132400208411]
Even a single edit can trigger model collapse, manifesting as significant performance degradation in various benchmark tasks.
Benchmarking Large Language Models after each edit is impractically time-consuming and resource-intensive.
We have utilized GPT-3.5 to develop a new dataset, HardEdit, based on hard cases.
arXiv Detail & Related papers (2024-02-15T01:50:38Z) - Untying the Reversal Curse via Bidirectional Language Model Editing [41.040662400025184]
Large language models (LLMs) store massive factual knowledge within their parameters.
LLMs are prone to hallucinate unintended text due to false or outdated knowledge.
We study bidirectional language model editing to assess if edited LLMs can recall the editing knowledge bidirectionally.
arXiv Detail & Related papers (2023-10-16T12:04:13Z)