Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers
- URL: http://arxiv.org/abs/2506.10887v2
- Date: Fri, 04 Jul 2025 08:35:38 GMT
- Title: Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers
- Authors: Yixiao Huang, Hanlin Zhu, Tianyu Guo, Jiantao Jiao, Somayeh Sojoudi, Michael I. Jordan, Stuart Russell, Song Mei
- Abstract summary: We argue that both behaviors stem from a single mechanism known as out-of-context reasoning (OCR). OCR drives both generalization and hallucination, depending on whether the associated concepts are causally related. Our work provides a theoretical foundation for understanding the OCR phenomenon, offering a new lens for analyzing and mitigating undesirable behaviors from knowledge injection.
- Score: 76.42159902257677
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) can acquire new knowledge through fine-tuning, but this process exhibits a puzzling duality: models can generalize remarkably from new facts, yet are also prone to hallucinating incorrect information. However, the reasons for this phenomenon remain poorly understood. In this work, we argue that both behaviors stem from a single mechanism known as out-of-context reasoning (OCR): the ability to deduce implications by associating concepts, even those without a causal link. Our experiments across five prominent LLMs confirm that OCR indeed drives both generalization and hallucination, depending on whether the associated concepts are causally related. To build a rigorous theoretical understanding of this phenomenon, we then formalize OCR as a synthetic factual recall task. We empirically show that a one-layer single-head attention-only transformer with factorized output and value matrices can learn to solve this task, while a model with combined weights cannot, highlighting the crucial role of matrix factorization. Our theoretical analysis shows that the OCR capability can be attributed to the implicit bias of gradient descent, which favors solutions that minimize the nuclear norm of the combined output-value matrix. This mathematical structure explains why the model learns to associate facts and implications with high sample efficiency, regardless of whether the correlation is causal or merely spurious. Ultimately, our work provides a theoretical foundation for understanding the OCR phenomenon, offering a new lens for analyzing and mitigating undesirable behaviors from knowledge injection.
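To make the architectural contrast in the abstract concrete, below is a minimal sketch, not the authors' released code, of a one-layer, single-head, attention-only transformer in which the output and value projections are either kept separate (factorized, so the effective map is W_O W_V) or merged into a single matrix. The PyTorch framework, vocabulary size, model dimension, and all class and parameter names are illustrative assumptions; the abstract only specifies the one-layer, single-head, attention-only setup and the factorized-versus-combined comparison.

```python
# Minimal sketch (assumptions noted above): one-layer, single-head,
# attention-only transformer for a synthetic factual-recall task, with the
# output/value map either factorized (W_O @ W_V) or combined into one matrix.
import torch
import torch.nn as nn


class OneLayerAttnOnly(nn.Module):
    def __init__(self, vocab_size: int, d_model: int, factorized: bool = True):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.W_Q = nn.Linear(d_model, d_model, bias=False)
        self.W_K = nn.Linear(d_model, d_model, bias=False)
        self.factorized = factorized
        if factorized:
            # separate value and output projections; the effective map is W_O @ W_V
            self.W_V = nn.Linear(d_model, d_model, bias=False)
            self.W_O = nn.Linear(d_model, vocab_size, bias=False)
        else:
            # a single combined output-value matrix
            self.W_OV = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        x = self.embed(tokens)                               # (batch, seq, d_model)
        q, k = self.W_Q(x), self.W_K(x)
        scores = q @ k.transpose(-2, -1) / x.shape[-1] ** 0.5
        attn = torch.softmax(scores, dim=-1)                 # (batch, seq, seq)
        if self.factorized:
            logits = self.W_O(attn @ self.W_V(x))            # factorized output-value map
        else:
            logits = self.W_OV(attn @ x)                     # combined output-value map
        return logits[:, -1, :]                              # next-token logits at the last position


# Example usage on dummy token ids:
# model = OneLayerAttnOnly(vocab_size=64, d_model=32, factorized=True)
# logits = model(torch.randint(0, 64, (8, 5)))               # shape (8, 64)
```

In the paper's account, the factorized parameterization is what enables OCR: gradient descent on the product W_O W_V is implicitly biased toward solutions that minimize the nuclear norm of the combined output-value matrix, which is why the model associates facts with their implications with high sample efficiency, whether the underlying correlation is causal or merely spurious.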
Related papers
- Reframing attention as a reinforcement learning problem for causal discovery [3.2498796510544636]
We introduce the Causal Process framework as a novel theory for representing dynamic hypotheses about causal structure. This allows us to reformulate the attention mechanism popularized by Transformer networks within an RL setting.
arXiv Detail & Related papers (2025-07-18T13:50:57Z) - Provable Low-Frequency Bias of In-Context Learning of Representations [19.066378730056275]
In-context learning (ICL) enables large language models (LLMs) to acquire new behaviors from the input sequence alone, without any parameter updates. Recent studies have shown that ICL can override meanings learned during pretraining by internalizing the structure of the prompt's data-generating process (DGP) into the hidden representations. We present the first rigorous explanation of such phenomena by introducing a unified framework of double convergence. This double convergence process leads to an implicit bias towards smooth (low-frequency) representations, which we prove analytically and verify empirically.
arXiv Detail & Related papers (2025-07-17T21:19:32Z) - CoT-Kinetics: A Theoretical Modeling Assessing LRM Reasoning Process [45.88054259124436]
Recent Large Reasoning Models (LRMs) significantly improve the reasoning ability of Large Language Models. We present a novel approach towards establishing a CoT-Kinetics energy equation. Our CoT-Kinetics energy assigns a scalar score that specifically evaluates the soundness of the reasoning phase.
arXiv Detail & Related papers (2025-05-19T17:44:26Z) - The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning [39.613595533503144]
Chain-of-Thought (CoT) prompting has been widely recognized for its ability to enhance reasoning capabilities in large language models. We show that CoT consistently underperforms direct answering across varying model scales and benchmark complexities. Our analysis uncovers a fundamental explicit-implicit duality driving CoT's performance in pattern-based ICL.
arXiv Detail & Related papers (2025-04-07T13:51:06Z) - Failure Modes of LLMs for Causal Reasoning on Narratives [51.19592551510628]
We investigate the interaction between world knowledge and logical reasoning. We find that state-of-the-art large language models (LLMs) often rely on superficial generalizations. We show that simple reformulations of the task can elicit more robust reasoning behavior.
arXiv Detail & Related papers (2024-10-31T12:48:58Z) - Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing [20.276952762837098]
Knowledge Editing (KE) algorithms alter models' weights to perform targeted updates to incorrect, outdated, or otherwise unwanted factual associations. We show that applying KE can adversely affect models' broader factual recall accuracy and diminish their reasoning abilities. Our work yields a precise mechanistic hypothesis to explain why KE has adverse effects on model abilities.
arXiv Detail & Related papers (2024-10-22T17:13:34Z) - Identifying Weight-Variant Latent Causal Models [82.14087963690561]
We find that transitivity plays a key role in impeding the identifiability of latent causal representations.
Under some mild assumptions, we can show that the latent causal representations can be identified up to trivial permutation and scaling.
We propose a novel method, termed Structural caUsAl Variational autoEncoder, which directly learns latent causal representations and causal relationships among them.
arXiv Detail & Related papers (2022-08-30T11:12:59Z) - Discovering Latent Causal Variables via Mechanism Sparsity: A New Principle for Nonlinear ICA [81.4991350761909]
Independent component analysis (ICA) refers to an ensemble of methods that formalize the goal of recovering latent variables and provide estimation procedures for practical applications.
We show that the latent variables can be recovered up to a permutation if one regularizes the latent mechanisms to be sparse.
arXiv Detail & Related papers (2021-07-21T14:22:14Z) - Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning [76.00395335702572]
A central goal for AI and causality is the joint discovery of abstract representations and causal structure.
Existing environments for studying causal induction are poorly suited for this objective because they have complicated task-specific causal graphs.
In this work, our goal is to facilitate research in learning representations of high-level variables as well as causal structures among them.
arXiv Detail & Related papers (2021-07-02T05:44:56Z) - ACRE: Abstract Causal REasoning Beyond Covariation [90.99059920286484]
We introduce the Abstract Causal REasoning dataset for systematic evaluation of current vision systems in causal induction.
Motivated by the stream of research on causal discovery in Blicket experiments, we query a visual reasoning system with four types of questions in either an independent scenario or an interventional scenario.
We find that pure neural models perform at chance level, tending towards an associative strategy, whereas neuro-symbolic combinations struggle with backward-blocking reasoning.
arXiv Detail & Related papers (2021-03-26T02:42:38Z) - A Critical View of the Structural Causal Model [89.43277111586258]
We show that one can identify the cause and the effect without considering their interaction at all.
We propose a new adversarial training method that mimics the disentangled structure of the causal model.
Our multidimensional method outperforms existing methods from the literature on both synthetic and real-world datasets.
arXiv Detail & Related papers (2020-02-23T22:52:28Z)