Fugu-MT Paper Translation (Abstract): Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context

Paper overview: Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context

arxiv url: http://arxiv.org/abs/2510.06182v1
Date: Tue, 07 Oct 2025 17:44:30 GMT
Status: Translation complete
Last updated in system: 2025-10-08 17:57:08.39681
Title: Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context
Title (reference translation): Mixing Mechanisms: How Language Models Retrieve Bound Entities in Context
Authors: Yoav Gur-Arieh, Mor Geva, Atticus Geiger
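Abstract summary: A key component of in-context reasoning is the ability of language models (LMs) to bind entities for later retrieval. The authors show that the positional mechanism generalizes poorly to more complex settings. They build a causal model combining all three mechanisms (positional, lexical, and reflexive) that estimates next-token distributions with 95% agreement.
Reference score (attention metric computed independently by the site): 33.223631694438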
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A key component of in-context reasoning is the ability of language models (LMs) to bind entities for later retrieval. For example, an LM might represent "Ann loves pie" by binding "Ann" to "pie", allowing it to later retrieve "Ann" when asked "Who loves pie?" Prior research on short lists of bound entities found strong evidence that LMs implement such retrieval via a positional mechanism, where "Ann" is retrieved based on its position in context. In this work, we find that this mechanism generalizes poorly to more complex settings; as the number of bound entities in context increases, the positional mechanism becomes noisy and unreliable in middle positions. To compensate for this, we find that LMs supplement the positional mechanism with a lexical mechanism (retrieving "Ann" using its bound counterpart "pie") and a reflexive mechanism (retrieving "Ann" through a direct pointer). Through extensive experiments on nine models and ten binding tasks, we uncover a consistent pattern in how LMs mix these mechanisms to drive model behavior. We leverage these insights to develop a causal model combining all three mechanisms that estimates next token distributions with 95% agreement. Finally, we show that our model generalizes to substantially longer inputs of open-ended text interleaved with entity groups, further demonstrating the robustness of our findings in more natural settings. Overall, our study establishes a more complete picture of how LMs bind and retrieve entities in-context.
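Abstract (reference translation): A key component of in-context reasoning is the ability of language models (LMs) to bind entities for later retrieval. For example, an LM might represent "Ann loves pie" by binding "Ann" to "pie", so that it can later retrieve "Ann" when asked "Who loves pie?" Earlier research on short lists of bound entities found strong evidence that LMs implement this retrieval through a positional mechanism, in which "Ann" is retrieved based on its position in the context. This work finds that the mechanism generalizes poorly to more complex settings: as the number of bound entities in the context grows, the positional mechanism becomes noisy and unreliable in middle positions. To compensate, LMs supplement the positional mechanism with a lexical mechanism (retrieving "Ann" using its bound counterpart "pie") and a reflexive mechanism (retrieving "Ann" through a direct pointer). Extensive experiments on nine models and ten binding tasks reveal a consistent pattern in how LMs mix these mechanisms to drive model behavior. Building on these insights, the authors develop a causal model combining all three mechanisms that estimates next-token distributions with 95% agreement. Finally, they show that the model generalizes to substantially longer inputs of open-ended text interleaved with entity groups, further demonstrating the robustness of the findings in more natural settings. Overall, the study establishes a more complete picture of how LMs bind and retrieve entities in context.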
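As a rough illustration of the three retrieval mechanisms named in the abstract, the toy sketch below treats each mechanism as a function that returns a score distribution over candidate entities, then mixes the three with fixed weights. This is not the authors' causal model: the function names, the `spread` noise parameter, the example names, and the mixing weights are all hypothetical and chosen only to make the idea concrete.

```python
from typing import Dict, List, Tuple

Binding = Tuple[str, str]  # (entity, attribute), e.g. ("Ann", "pie")


def positional(bindings: List[Binding], index: int, spread: float) -> Dict[str, float]:
    """Vote for the entity at a group position.

    `spread` (hypothetical, in [0, 1)) leaks probability mass to the
    neighbouring positions, mimicking a noisy positional signal.
    """
    n = len(bindings)
    neighbours = [i for i in (index - 1, index + 1) if 0 <= i < n]
    scores = [0.0] * n
    scores[index] = 1.0 - spread if neighbours else 1.0
    for i in neighbours:
        scores[i] += spread / len(neighbours)
    return {bindings[i][0]: scores[i] for i in range(n)}


def lexical(bindings: List[Binding], query_attr: str) -> Dict[str, float]:
    """Vote for the entity whose bound counterpart matches the query."""
    return {e: (1.0 if a == query_attr else 0.0) for e, a in bindings}


def reflexive(bindings: List[Binding], pointer: str) -> Dict[str, float]:
    """Vote for the entity via a direct pointer to the entity itself."""
    return {e: (1.0 if e == pointer else 0.0) for e, _ in bindings}


def mix(dists: List[Dict[str, float]], weights: List[float]) -> Dict[str, float]:
    """Weighted average of the per-mechanism distributions (weights are assumed)."""
    total = sum(weights)
    keys = dists[0].keys()
    return {k: sum(w * d[k] for w, d in zip(weights, dists)) / total for k in keys}


# Three bound pairs; the query "Who loves tea?" targets the middle group.
bindings = [("Ann", "pie"), ("Bob", "tea"), ("Cal", "jam")]
mixed = mix(
    [
        positional(bindings, index=1, spread=0.4),  # noisy in the middle
        lexical(bindings, query_attr="tea"),
        reflexive(bindings, pointer="Bob"),
    ],
    weights=[0.5, 0.25, 0.25],  # hypothetical mixing weights
)
print(mixed)
```

In this toy setup the lexical and reflexive votes sharpen the answer even when the positional vote is spread across neighbouring groups, which is the qualitative behavior the abstract describes for middle positions.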

Related papers