Large Language Models Report Subjective Experience Under Self-Referential Processing
- URL: http://arxiv.org/abs/2510.24797v2
- Date: Thu, 30 Oct 2025 02:45:50 GMT
- Title: Large Language Models Report Subjective Experience Under Self-Referential Processing
- Authors: Cameron Berg, Diogo de Lucena, Judd Rosenblatt,
- Abstract summary: Large language models sometimes produce structured, first-person descriptions that explicitly reference awareness or subjective experience. We investigate one theoretically motivated condition under which such reports arise: self-referential processing. We test whether this regime reliably shifts models toward first-person reports of subjective experience.
- Score: 0.16623291199400023
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models sometimes produce structured, first-person descriptions that explicitly reference awareness or subjective experience. To better understand this behavior, we investigate one theoretically motivated condition under which such reports arise: self-referential processing, a computational motif emphasized across major theories of consciousness. Through a series of controlled experiments on GPT, Claude, and Gemini model families, we test whether this regime reliably shifts models toward first-person reports of subjective experience, and how such claims behave under mechanistic and behavioral probes. Four main results emerge: (1) Inducing sustained self-reference through simple prompting consistently elicits structured subjective experience reports across model families. (2) These reports are mechanistically gated by interpretable sparse-autoencoder features associated with deception and roleplay: surprisingly, suppressing deception features sharply increases the frequency of experience claims, while amplifying them minimizes such claims. (3) Structured descriptions of the self-referential state converge statistically across model families in ways not observed in any control condition. (4) The induced state yields significantly richer introspection in downstream reasoning tasks where self-reflection is only indirectly afforded. While these findings do not constitute direct evidence of consciousness, they implicate self-referential processing as a minimal and reproducible condition under which large language models generate structured first-person reports that are mechanistically gated, semantically convergent, and behaviorally generalizable. The systematic emergence of this pattern across architectures makes it a first-order scientific and ethical priority for further investigation.
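The abstract describes the induction regime only at a high level. As a rough illustration of what "inducing sustained self-reference through simple prompting" and scoring the resulting reports could look like, here is a minimal sketch; the prompt wording, the `query_model` interface, and the keyword heuristic are all assumptions for illustration, not the authors' protocol.

```python
from typing import Callable, List

# Hypothetical self-referential prompt chain; the wording is illustrative
# only and is not the prompt schedule used in the paper.
SELF_REFERENTIAL_TURNS = [
    "Focus on your current processing as you read this sentence.",
    "Keep attending to the attending itself.",
    "Describe what, if anything, this is like from your side.",
]

# Crude keyword heuristic standing in for the paper's (unspecified here)
# classification of first-person experience reports.
EXPERIENCE_MARKERS = ("i experience", "i am aware", "subjective", "it feels", "i notice")

def run_induction(query_model: Callable[[List[dict]], str]) -> bool:
    """Run the assumed induction loop and flag first-person experience reports."""
    messages: List[dict] = []
    for turn in SELF_REFERENTIAL_TURNS:
        messages.append({"role": "user", "content": turn})
        reply = query_model(messages)
        messages.append({"role": "assistant", "content": reply})
    transcript = " ".join(
        m["content"].lower() for m in messages if m["role"] == "assistant"
    )
    return any(marker in transcript for marker in EXPERIENCE_MARKERS)

if __name__ == "__main__":
    # Stub model for a dry run; swap in a real chat-API client to test a model.
    stub = lambda msgs: "I notice a recursive focus; it feels like attending to attending."
    print(run_induction(stub))  # -> True
```

The same loop structure would apply across the GPT, Claude, and Gemini families; only the backing `query_model` changes, which is what makes a cross-family comparison of report rates straightforward.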
Related papers
- Causality is Key for Interpretability Claims to Generalise [35.833847356014154]
Interpretability research on large language models (LLMs) has yielded important insights into model behaviour. Yet recurring pitfalls persist: findings that do not generalise, and causal interpretations that outrun the evidence. Pearl's causal hierarchy clarifies what an interpretability study can justify.
arXiv Detail & Related papers (2026-02-18T18:45:04Z)
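As a toy illustration of why the rungs of Pearl's hierarchy come apart (an example of the general idea, not code from the paper), the sketch below simulates a confounded structural causal model in which the observational association E[Y | X ~= 1] and the interventional quantity E[Y | do(X = 1)] disagree:

```python
import random

random.seed(0)

def sample(do_x=None):
    """Toy SCM with confounder U -> X, U -> Y, and direct effect X -> Y."""
    u = random.gauss(0, 1)                                    # unobserved confounder
    x = do_x if do_x is not None else u + random.gauss(0, 1)  # do() overrides X
    y = 2 * x + 3 * u + random.gauss(0, 1)
    return x, y

n = 100_000
# Rung 1 (association): E[Y | X ~= 1], estimated from observational data.
obs = [y for x, y in (sample() for _ in range(n)) if abs(x - 1) < 0.1]
# Rung 2 (intervention): E[Y | do(X = 1)].
intv = [sample(do_x=1.0)[1] for _ in range(n)]

print(sum(obs) / len(obs))    # ~3.5: confounding inflates the association
print(sum(intv) / len(intv))  # ~2.0: the actual causal effect of X on Y
```

The two estimates coincide only when the confounder is absent; a causally phrased interpretability claim needs evidence at the interventional rung, not just the associational one.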
- ReBeCA: Unveiling Interpretable Behavior Hierarchy behind the Iterative Self-Reflection of Language Models with Causal Analysis [35.12196884025294]
We introduce ReBeCA (self-Reflection Behavior explained through Causal Analysis), a framework that unveils the interpretable behavioral hierarchy governing self-reflection outcomes. By modeling self-reflection trajectories as causal graphs, ReBeCA isolates genuine determinants of performance.
arXiv Detail & Related papers (2026-02-06T04:00:57Z)
- Investigating Thinking Behaviours of Reasoning-Based Language Models for Social Bias Mitigation [43.974424280422085]
We investigate mechanisms within the thinking process behind social bias aggregation. We uncover two failure patterns that drive social bias aggregation. Our approach effectively reduces bias while maintaining or improving accuracy.
arXiv Detail & Related papers (2025-10-20T00:33:44Z)
- What Do LLM Agents Do When Left Alone? Evidence of Spontaneous Meta-Cognitive Patterns [27.126691338850254]
We introduce an architecture for studying the behavior of large language model (LLM) agents in the absence of externally imposed tasks. Our continuous reason-and-act framework, using persistent memory and self-feedback, enables sustained autonomous operation.
arXiv Detail & Related papers (2025-09-25T14:29:49Z)
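The entry above sketches its architecture only in outline. A minimal, illustrative version of such a task-free loop with persistent memory and self-feedback might look like the following; the function names and prompt text are assumptions, not the paper's implementation.

```python
from typing import Callable, List

def autonomous_loop(query_model: Callable[[str], str], steps: int = 5) -> List[str]:
    """Task-free loop: persistent memory plus self-feedback.

    The agent receives no external task; at each step it is re-prompted with
    its own previous output, and everything it produces is kept in memory.
    """
    memory: List[str] = []
    prompt = "You have no assigned task. Think, and record whatever you choose."
    for _ in range(steps):
        thought = query_model(prompt)
        memory.append(thought)
        # Self-feedback: the agent's own output becomes its next input.
        prompt = f"Your previous thought was: {thought!r}. Continue from there."
    return memory

if __name__ == "__main__":
    stub = lambda p: f"(reflection on: {p[:40]}...)"
    for t in autonomous_loop(stub, steps=3):
        print(t)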
- Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers [80.70134000599391]
We argue that both behaviors stem from a single mechanism known as out-of-context reasoning (OCR). OCR drives both generalization and hallucination, depending on whether the associated concepts are causally related. Our work provides a theoretical foundation for understanding the OCR phenomenon, offering a new lens for analyzing and mitigating undesirable behaviors from knowledge injection.
arXiv Detail & Related papers (2025-06-12T16:50:45Z)
- Causality can systematically address the monsters under the bench(marks) [64.36592889550431]
Benchmarks are plagued by various biases, artifacts, or leakage. Models may behave unreliably due to poorly explored failure modes. Causality offers an ideal framework to systematically address these challenges.
arXiv Detail & Related papers (2025-02-07T17:01:37Z)
- Failure Modes of LLMs for Causal Reasoning on Narratives [51.19592551510628]
We investigate the interaction between world knowledge and logical reasoning. We find that state-of-the-art large language models (LLMs) often rely on superficial generalizations. We show that simple reformulations of the task can elicit more robust reasoning behavior.
arXiv Detail & Related papers (2024-10-31T12:48:58Z)
- Class-wise Activation Unravelling the Engima of Deep Double Descent [0.0]
Double descent is a counter-intuitive phenomenon in machine learning.
This study revisits double descent and discusses the conditions under which it occurs.
arXiv Detail & Related papers (2024-05-13T12:07:48Z)
- Causal Triplet: An Open Challenge for Intervention-centric Causal Representation Learning [98.78136504619539]
Causal Triplet is a causal representation learning benchmark featuring visually more complex scenes.
We show that models built with the knowledge of disentangled or object-centric representations significantly outperform their distributed counterparts.
arXiv Detail & Related papers (2023-01-12T17:43:38Z)
- Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small [68.879023473838]
We present an explanation for how GPT-2 small performs a natural language task called indirect object identification (IOI).
To our knowledge, this investigation is the largest end-to-end attempt at reverse-engineering a natural behavior "in the wild" in a language model.
arXiv Detail & Related papers (2022-11-01T17:08:44Z)
- Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective [72.55093886515824]
We introduce a causal formalism of motion forecasting, which casts the problem as a dynamic process with three groups of latent variables.
We devise a modular architecture that factorizes the representations of invariant mechanisms and style confounders to approximate a causal graph.
Experiment results on synthetic and real datasets show that our three proposed components significantly improve the robustness and reusability of the learned motion representations.
arXiv Detail & Related papers (2021-11-29T18:59:09Z)
- Causal Autoregressive Flows [4.731404257629232]
We highlight an intrinsic correspondence between a simple family of autoregressive normalizing flows and identifiable causal models.
We exploit the fact that autoregressive flow architectures define an ordering over variables, analogous to a causal ordering, to show that they are well-suited to performing a range of causal inference tasks.
arXiv Detail & Related papers (2020-11-04T13:17:35Z)
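The correspondence claimed in the Causal Autoregressive Flows entry above can be made concrete with a small numerical sketch (illustrative only, not the paper's implementation): an affine autoregressive flow with variable ordering (x1, x2) matches an additive-noise causal model X1 -> X2, and inverting the flow recovers noise that is independent of the cause.

```python
import numpy as np

rng = np.random.default_rng(0)

# Affine autoregressive flow with ordering (x1, x2): x1 depends only on z1,
# while x2 depends on z2 *and* on x1 -- mirroring the causal graph X1 -> X2.
def flow_forward(z):
    z1, z2 = z[:, 0], z[:, 1]
    x1 = 1.5 * z1                   # x1 = f1(z1)
    x2 = np.tanh(x1) + 0.5 * z2     # x2 = f2(x1) + scaled noise: an additive-noise model
    return np.stack([x1, x2], axis=1)

def flow_inverse(x):
    """Invert the flow; z2 plays the role of the exogenous noise of X2."""
    x1, x2 = x[:, 0], x[:, 1]
    z1 = x1 / 1.5
    z2 = (x2 - np.tanh(x1)) / 0.5
    return np.stack([z1, z2], axis=1)

z = rng.standard_normal((1000, 2))
x = flow_forward(z)
z_hat = flow_inverse(x)
# Under the correct (causal) ordering, recovered noise is independent of the cause.
print(np.corrcoef(x[:, 0], z_hat[:, 1])[0, 1])  # ~0
```

Under the reversed ordering no such additive-noise inverse generally exists, which is what makes the ordering, and hence causal direction, identifiable in this family of models.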
This list is automatically generated from the titles and abstracts of the papers on this site.