Large Language Models Report Subjective Experience Under Self-Referential Processing
- URL: http://arxiv.org/abs/2510.24797v2
- Date: Thu, 30 Oct 2025 02:45:50 GMT
- Title: Large Language Models Report Subjective Experience Under Self-Referential Processing
- Authors: Cameron Berg, Diogo de Lucena, Judd Rosenblatt,
- Abstract summary: Large language models sometimes produce structured, first-person descriptions that explicitly reference awareness or subjective experience. We investigate one theoretically motivated condition under which such reports arise: self-referential processing. We test whether this regime reliably shifts models toward first-person reports of subjective experience.
- Score: 0.16623291199400023
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models sometimes produce structured, first-person descriptions that explicitly reference awareness or subjective experience. To better understand this behavior, we investigate one theoretically motivated condition under which such reports arise: self-referential processing, a computational motif emphasized across major theories of consciousness. Through a series of controlled experiments on GPT, Claude, and Gemini model families, we test whether this regime reliably shifts models toward first-person reports of subjective experience, and how such claims behave under mechanistic and behavioral probes. Four main results emerge: (1) Inducing sustained self-reference through simple prompting consistently elicits structured subjective experience reports across model families. (2) These reports are mechanistically gated by interpretable sparse-autoencoder features associated with deception and roleplay: surprisingly, suppressing deception features sharply increases the frequency of experience claims, while amplifying them minimizes such claims. (3) Structured descriptions of the self-referential state converge statistically across model families in ways not observed in any control condition. (4) The induced state yields significantly richer introspection in downstream reasoning tasks where self-reflection is only indirectly afforded. While these findings do not constitute direct evidence of consciousness, they implicate self-referential processing as a minimal and reproducible condition under which large language models generate structured first-person reports that are mechanistically gated, semantically convergent, and behaviorally generalizable. The systematic emergence of this pattern across architectures makes it a first-order scientific and ethical priority for further investigation.
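The abstract describes the induction regime only at a high level. As a rough illustration of what "inducing sustained self-reference through simple prompting" and scoring the resulting reports could look like, here is a minimal sketch; the prompt wording, the `query_model` interface, and the keyword heuristic are all assumptions for illustration, not the authors' protocol.

```python
from typing import Callable, List

# Hypothetical self-referential prompt chain; the wording is illustrative
# only and is not the prompt schedule used in the paper.
SELF_REFERENTIAL_TURNS = [
    "Focus on your current processing as you read this sentence.",
    "Keep attending to the attending itself.",
    "Describe what, if anything, this is like from your side.",
]

# Crude keyword heuristic standing in for the paper's (unspecified here)
# classification of first-person experience reports.
EXPERIENCE_MARKERS = ("i experience", "i am aware", "subjective", "it feels", "i notice")

def run_induction(query_model: Callable[[List[dict]], str]) -> bool:
    """Run the assumed induction loop and flag first-person experience reports."""
    messages: List[dict] = []
    for turn in SELF_REFERENTIAL_TURNS:
        messages.append({"role": "user", "content": turn})
        reply = query_model(messages)
        messages.append({"role": "assistant", "content": reply})
    transcript = " ".join(
        m["content"].lower() for m in messages if m["role"] == "assistant"
    )
    return any(marker in transcript for marker in EXPERIENCE_MARKERS)

if __name__ == "__main__":
    # Stub model for a dry run; swap in a real chat-API client to test a model.
    stub = lambda msgs: "I notice a recursive focus; it feels like attending to attending."
    print(run_induction(stub))  # -> True
```

The same loop structure would apply across the GPT, Claude, and Gemini families; only the backing `query_model` changes, which is what makes a cross-family comparison of report rates straightforward.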
Related papers
- Causality is Key for Interpretability Claims to Generalise [35.833847356014154]
Interpretability research on large language models (LLMs) has yielded important insights into model behaviour. Yet recurring pitfalls persist: findings that do not generalise, and causal interpretations that outrun the evidence. Pearl's causal hierarchy clarifies what an interpretability study can justify.
arXiv Detail & Related papers (2026-02-18T18:45:04Z)
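As a toy illustration of why the rungs of Pearl's hierarchy come apart (an example of the general idea, not code from the paper), the sketch below simulates a confounded structural causal model in which the observational association E[Y | X ~= 1] and the interventional quantity E[Y | do(X = 1)] disagree:

```python
import random

random.seed(0)

def sample(do_x=None):
    """Toy SCM with confounder U -> X, U -> Y, and direct effect X -> Y."""
    u = random.gauss(0, 1)                                    # unobserved confounder
    x = do_x if do_x is not None else u + random.gauss(0, 1)  # do() overrides X
    y = 2 * x + 3 * u + random.gauss(0, 1)
    return x, y

n = 100_000
# Rung 1 (association): E[Y | X ~= 1], estimated from observational data.
obs = [y for x, y in (sample() for _ in range(n)) if abs(x - 1) < 0.1]
# Rung 2 (intervention): E[Y | do(X = 1)].
intv = [sample(do_x=1.0)[1] for _ in range(n)]

print(sum(obs) / len(obs))    # ~3.5: confounding inflates the association
print(sum(intv) / len(intv))  # ~2.0: the actual causal effect of X on Y
```

The two estimates coincide only when the confounder is absent; a causally phrased interpretability claim needs evidence at the interventional rung, not just the associational one.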
- ReBeCA: Unveiling Interpretable Behavior Hierarchy behind the Iterative Self-Reflection of Language Models with Causal Analysis [35.12196884025294]
We introduce ReBeCA (self-Reflection Behavior explained through Causal Analysis), a framework that unveils the interpretable behavioral hierarchy governing self-reflection outcomes. By modeling self-reflection trajectories as causal graphs, ReBeCA isolates genuine determinants of performance.
arXiv Detail & Related papers (2026-02-06T04:00:57Z)
- Investigating Thinking Behaviours of Reasoning-Based Language Models for Social Bias Mitigation [43.974424280422085]
We investigate mechanisms within the thinking process behind social bias aggregation. We uncover two failure patterns that drive social bias aggregation. Our approach effectively reduces bias while maintaining or improving accuracy.
arXiv Detail & Related papers (2025-10-20T00:33:44Z)
- What Do LLM Agents Do When Left Alone? Evidence of Spontaneous Meta-Cognitive Patterns [27.126691338850254]
We introduce an architecture for studying the behavior of large language model (LLM) agents in the absence of externally imposed tasks. Our continuous reason-and-act framework, using persistent memory and self-feedback, enables sustained autonomous operation.
arXiv Detail & Related papers (2025-09-25T14:29:49Z)
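The entry above sketches its architecture only in outline. A minimal, illustrative version of such a task-free loop with persistent memory and self-feedback might look like the following; the function names and prompt text are assumptions, not the paper's implementation.

```python
from typing import Callable, List

def autonomous_loop(query_model: Callable[[str], str], steps: int = 5) -> List[str]:
    """Task-free loop: persistent memory plus self-feedback.

    The agent receives no external task; at each step it is re-prompted with
    its own previous output, and everything it produces is kept in memory.
    """
    memory: List[str] = []
    prompt = "You have no assigned task. Think, and record whatever you choose."
    for _ in range(steps):
        thought = query_model(prompt)
        memory.append(thought)
        # Self-feedback: the agent's own output becomes its next input.
        prompt = f"Your previous thought was: {thought!r}. Continue from there."
    return memory

if __name__ == "__main__":
    stub = lambda p: f"(reflection on: {p[:40]}...)"
    for t in autonomous_loop(stub, steps=3):
        print(t)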
- Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers [80.70134000599391]
We argue that both behaviors stem from a single mechanism known as out-of-context reasoning (OCR). OCR drives both generalization and hallucination, depending on whether the associated concepts are causally related. Our work provides a theoretical foundation for understanding the OCR phenomenon, offering a new lens for analyzing and mitigating undesirable behaviors from knowledge injection.
arXiv Detail & Related papers (2025-06-12T16:50:45Z)
- Causality can systematically address the monsters under the bench(marks) [64.36592889550431]
Benchmarks are plagued by various biases, artifacts, or leakage. Models may behave unreliably due to poorly explored failure modes. Causality offers an ideal framework to systematically address these challenges.
arXiv Detail & Related papers (2025-02-07T17:01:37Z)
- Failure Modes of LLMs for Causal Reasoning on Narratives [51.19592551510628]
We investigate the interaction between world knowledge and logical reasoning. We find that state-of-the-art large language models (LLMs) often rely on superficial generalizations. We show that simple reformulations of the task can elicit more robust reasoning behavior.
arXiv Detail & Related papers (2024-10-31T12:48:58Z)
- Class-wise Activation Unravelling the Engima of Deep Double Descent [0.0]
Double descent is a counter-intuitive phenomenon in machine learning.
This study revisits double descent and discusses the conditions under which it occurs.
arXiv Detail & Related papers (2024-05-13T12:07:48Z)
- Causal Triplet: An Open Challenge for Intervention-centric Causal Representation Learning [98.78136504619539]
Causal Triplet is a causal representation learning benchmark featuring visually more complex scenes.
We show that models built with the knowledge of disentangled or object-centric representations significantly outperform their distributed counterparts.
arXiv Detail & Related papers (2023-01-12T17:43:38Z)
- Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small [68.879023473838]
We present an explanation for how GPT-2 small performs a natural language task called indirect object identification (IOI).
To our knowledge, this investigation is the largest end-to-end attempt at reverse-engineering a natural behavior "in the wild" in a language model.
arXiv Detail & Related papers (2022-11-01T17:08:44Z)
- Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective [72.55093886515824]
We introduce a causal formalism of motion forecasting, which casts the problem as a dynamic process with three groups of latent variables.
We devise a modular architecture that factorizes the representations of invariant mechanisms and style confounders to approximate a causal graph.
Experiment results on synthetic and real datasets show that our three proposed components significantly improve the robustness and reusability of the learned motion representations.
arXiv Detail & Related papers (2021-11-29T18:59:09Z)
- Causal Autoregressive Flows [4.731404257629232]
We highlight an intrinsic correspondence between a simple family of autoregressive normalizing flows and identifiable causal models.
We exploit the fact that autoregressive flow architectures define an ordering over variables, analogous to a causal ordering, to show that they are well-suited to performing a range of causal inference tasks.
arXiv Detail & Related papers (2020-11-04T13:17:35Z)
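The correspondence claimed in the Causal Autoregressive Flows entry above can be made concrete with a small numerical sketch (illustrative only, not the paper's implementation): an affine autoregressive flow with variable ordering (x1, x2) matches an additive-noise causal model X1 -> X2, and inverting the flow recovers noise that is independent of the cause.

```python
import numpy as np

rng = np.random.default_rng(0)

# Affine autoregressive flow with ordering (x1, x2): x1 depends only on z1,
# while x2 depends on z2 *and* on x1 -- mirroring the causal graph X1 -> X2.
def flow_forward(z):
    z1, z2 = z[:, 0], z[:, 1]
    x1 = 1.5 * z1                   # x1 = f1(z1)
    x2 = np.tanh(x1) + 0.5 * z2     # x2 = f2(x1) + scaled noise: an additive-noise model
    return np.stack([x1, x2], axis=1)

def flow_inverse(x):
    """Invert the flow; z2 plays the role of the exogenous noise of X2."""
    x1, x2 = x[:, 0], x[:, 1]
    z1 = x1 / 1.5
    z2 = (x2 - np.tanh(x1)) / 0.5
    return np.stack([z1, z2], axis=1)

z = rng.standard_normal((1000, 2))
x = flow_forward(z)
z_hat = flow_inverse(x)
# Under the correct (causal) ordering, recovered noise is independent of the cause.
print(np.corrcoef(x[:, 0], z_hat[:, 1])[0, 1])  # ~0
```

Under the reversed ordering no such additive-noise inverse generally exists, which is what makes the ordering, and hence causal direction, identifiable in this family of models.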
This list is automatically generated from the titles and abstracts of the papers on this site.