Fugu-MT paper translation (summary): Representation Without Control: Testing the Realization Effect in Language Models

Paper summary: Representation Without Control: Testing the Realization Effect in Language Models

arxiv url: http://arxiv.org/abs/2605.25151v1
Date: Sun, 24 May 2026 16:07:34 GMT
Status: Translation complete
Last updated in system: 2026-05-26 19:50:18.860035
Title: Representation Without Control: Testing the Realization Effect in Language Models
Authors: Ciarán Walsh, Emilio Barkett
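Abstract summary: Large language models are increasingly used as behavioral simulators, but it is unclear whether their outputs reflect human-like cognitive mechanisms. This study examines the question through the realization effect, a finding in behavioral economics in which risk-taking differs systematically after paper versus realized gains and losses. LLM behavior is evaluated at three levels: prompt-only behavioral sensitivity, linear readout of internal representations, and causal control via activation steering.
Reference score (attention score computed in-house): 0.0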
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models are increasingly used as behavioral simulators, but it remains unclear when their outputs reflect human-like cognitive mechanisms rather than prompt-sensitive surface patterns. We study this question through the realization effect, a well-characterized finding in behavioral economics in which risk-taking differs systematically after paper versus realized gains and losses. We evaluate LLM behavior at three levels: prompt-only behavioral sensitivity, linear readout of internal representations, and causal control via activation steering. Prompt-only results show systematic condition sensitivity, but the directional pattern does not reproduce human realization-effect predictions. Gemma's residual stream contains a linearly decodable realization-status signal at layer 18 that generalizes to held-out prompts. Steering along this direction does not, however, reliably shift downstream risk choices, a null result that holds across positive scales and in a negative sign-symmetry run. Behavioral sensitivity, latent readout, and causal control are three distinct properties that do not automatically co-occur, and successful latent readout is insufficient evidence that a model behaviorally relies on a representation during downstream decision-making.
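The abstract's distinction between linear readout and causal control can be illustrated with a toy NumPy sketch. Everything in it is an assumption made for illustration, not the authors' pipeline: the activations are synthetic Gaussian vectors rather than Gemma's residual stream, and the 64-dimensional size, the planted `real_dir` direction, the probe settings, the steering scale `alpha`, and the hypothetical risk-choice head `choice_w` are all invented. The choice head is deliberately built to ignore the planted direction, so the sketch only shows how a probe can reach high held-out accuracy while steering along its direction barely moves a downstream choice.

```python
import numpy as np

# Toy illustration on synthetic data (hypothetical, not the paper's pipeline):
# a feature can be linearly decodable yet have little causal effect on a
# downstream "choice" if the choice computation never reads it.

rng = np.random.default_rng(0)
D, N = 64, 400  # hypothetical hidden size and number of prompts per split

# Planted "realization-status" direction (assumption for the toy).
real_dir = rng.normal(size=D)
real_dir /= np.linalg.norm(real_dir)

# Hypothetical downstream risk-choice readout, made orthogonal to real_dir:
# the toy "model" is assumed not to use realization status when choosing.
choice_w = rng.normal(size=D)
choice_w -= (choice_w @ real_dir) * real_dir
choice_w /= np.linalg.norm(choice_w)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def make_split(n):
    """Synthetic activations; label 1 = realized, 0 = paper gain/loss."""
    y = rng.integers(0, 2, size=n)
    x = rng.normal(size=(n, D)) + 1.5 * (2 * y - 1)[:, None] * real_dir
    return x, y


def train_probe(x, y, lr=0.1, steps=500):
    """Logistic-regression linear probe fit by full-batch gradient descent."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(steps):
        err = sigmoid(x @ w + b) - y
        w -= lr * x.T @ err / len(y)
        b -= lr * err.mean()
    return w, b


x_train, y_train = make_split(N)
x_test, y_test = make_split(N)  # held-out prompts
w, b = train_probe(x_train, y_train)
readout_acc = np.mean((x_test @ w + b > 0) == y_test)

# Activation steering: add alpha times the unit probe direction.
steer_dir = w / np.linalg.norm(w)
alpha = 3.0
x_steered = x_test + alpha * steer_dir

probe_shift = sigmoid(x_steered @ w + b).mean() - sigmoid(x_test @ w + b).mean()
choice_shift = (sigmoid(x_steered @ choice_w).mean()
                - sigmoid(x_test @ choice_w).mean())

print(f"held-out readout accuracy: {readout_acc:.2f}")
print(f"shift in P(realized) under steering: {probe_shift:+.2f}")
print(f"shift in P(risky choice) under steering: {choice_shift:+.2f}")
```

Because `choice_w` is orthogonal to the planted direction by construction, any shift in P(risky choice) here comes only from estimation noise in the probe weights. In a real model, whether downstream computation reads a decodable direction is an empirical question rather than a design choice, which is the gap the abstract's steering experiments probe.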

