Fugu-MT Paper Translation (Abstract): Do Language Models Track Entities Across State Changes?

Paper summary: Do Language Models Track Entities Across State Changes?

arxiv url: http://arxiv.org/abs/2605.30233v1
Date: Thu, 28 May 2026 17:03:42 GMT
Status: Translation complete
Last updated in system: 2026-05-30 02:45:56.568017
Title: Do Language Models Track Entities Across State Changes?
Title (reference translation): Do Language Models Track Entities Across State Changes?
Authors: Zilu Tang, Qiao Zhao, Gabriel Franco, Derry Wijaya, Aaron Mueller, Sebastian Schuster, Najoung Kim,
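Abstract summary: Rather than incrementally tracking world states across tokens or query-relevant states across layers, LMs aggregate relevant information in parallel at the last token once the query becomes evident. Surprisingly, LMs implement the $\texttt{REMOVE}$ operation with a fragile global suppression tag, and this global removal mechanism predicts various failure modes that are confirmed behaviorally.
Reference score (independently computed attention score): 25.16942913524478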
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Entity tracking (ET), the ability to keep track of states, is a fundamental skill that underlies complex reasoning. An increasing amount of work investigates how transformer language models (LMs) solve entity binding $\textit{without}$ state changes. However, there is limited understanding of how non-toy LMs address ET problems of realistic difficulties expressed in natural language. To this end, we investigate the mechanisms underlying ET in more complex scenarios featuring multiple state-changing operations. We find that LMs do not incrementally track world states across tokens or query-relevant states across layers, but simply aggregate relevant information in parallel at the last token when the query becomes evident. We further investigate mechanisms of individual operations ($\texttt{PUT}$, $\texttt{REMOVE}$, $\texttt{MOVE}$) to characterize this non-incremental ET mechanism. Surprisingly, LMs implement the $\texttt{REMOVE}$ operation with a fragile global suppression tag; this global removal mechanism predicts various failure modes that we confirm behaviorally. We provide a mechanistic solution of nullifying this tag to partially address this issue. Overall, our findings reveal that LMs solve a fundamentally sequential task using a non-sequential strategy. More broadly, our work illustrates how behavioral and mechanistic analyses can fruitfully interact. Behavioral results inform mechanistic hypotheses, and insights from mechanistic analyses help build stronger behavioral evaluations by predicting failure modes missing from existing evaluations.
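Abstract (reference translation): Entity tracking (ET), the ability to keep track of states, is a fundamental skill underlying complex reasoning. A growing body of work investigates how transformer language models (LMs) solve entity binding $\textit{without}$ state changes. However, there is limited understanding of how non-toy LMs handle ET problems of realistic difficulty expressed in natural language. We therefore investigate the mechanisms underlying ET in more complex scenarios featuring multiple state-changing operations. We find that LMs do not incrementally track world states across tokens or query-relevant states across layers; instead, they aggregate relevant information in parallel at the last token once the query becomes evident. To characterize this non-incremental ET mechanism, we also examine the mechanisms of the individual operations ($\texttt{PUT}$, $\texttt{REMOVE}$, $\texttt{MOVE}$). Surprisingly, LMs implement the $\texttt{REMOVE}$ operation with a fragile global suppression tag, and this global removal mechanism predicts various failure modes that we confirm behaviorally. To partially address this issue, we provide a mechanistic solution that nullifies this tag. Overall, our results show that LMs solve a fundamentally sequential task using a non-sequential strategy. More broadly, our work illustrates how behavioral and mechanistic analyses can interact fruitfully: behavioral results inform mechanistic hypotheses, and insights from mechanistic analyses help build stronger behavioral evaluations by predicting failure modes missing from existing evaluations.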
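
To make the task concrete, the following is a minimal sketch of what a ground-truth, strictly sequential entity tracker for the three operations named in the abstract could look like. It is an illustration only, not the authors' code or dataset format: the tuple encoding of operations, the `track` function, and the box and item names are all hypothetical. The semantics assumed here are that `PUT` adds an item to a box, `REMOVE` takes it out of a box, and `MOVE` transfers it between boxes.

```python
from collections import defaultdict


def track(ops):
    """Apply operations in order and return the final contents of each box.

    Each operation is a tuple (hypothetical encoding, not from the paper):
      ("PUT", item, box), ("REMOVE", item, box), ("MOVE", item, src, dst)
    """
    boxes = defaultdict(set)
    for op in ops:
        kind = op[0]
        if kind == "PUT":
            _, item, box = op
            boxes[box].add(item)
        elif kind == "REMOVE":
            _, item, box = op
            # discard() is a no-op if the item is not in the box
            boxes[box].discard(item)
        elif kind == "MOVE":
            _, item, src, dst = op
            # only move the item if it is actually in the source box
            if item in boxes[src]:
                boxes[src].remove(item)
                boxes[dst].add(item)
        else:
            raise ValueError(f"unknown operation: {kind}")
    return {box: sorted(items) for box, items in boxes.items()}


# Hypothetical scenario: "key" is removed from one box and later put in another,
# so the answer depends on applying the operations in order.
ops = [
    ("PUT", "apple", "Box A"),
    ("PUT", "key", "Box A"),
    ("MOVE", "apple", "Box A", "Box B"),
    ("REMOVE", "key", "Box A"),
    ("PUT", "key", "Box B"),
]
final_state = track(ops)
print(final_state)  # {'Box A': [], 'Box B': ['apple', 'key']}
```

The loop updates the world state after every operation, which is the incremental strategy the abstract reports LMs do not use; according to the abstract, LMs instead aggregate the relevant information at the last token once the query is known.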

Related papers