Fugu-MT paper translation (abstract): Geometric Factual Recall in Transformers

Paper abstract: Geometric Factual Recall in Transformers

arxiv url: http://arxiv.org/abs/2605.12426v1
Date: Tue, 12 May 2026 17:22:22 GMT
Status: translation complete
System update date: 2026-05-13 21:48:57.047661
Title: Geometric Factual Recall in Transformers
Title (reference translation): Geometric Factual Recall in Transformers
Authors: Shauli Ravfogel, Gilad Yehudai, Joan Bruna, Alberto Bietti
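Abstract summary: A common view casts internal weight matrices as associative memories over pairs of embeddings, requiring parameter counts that scale linearly with the number of facts. We develop a theoretical and empirical account of an alternative, geometric form of memorization in which learned embeddings encode relational structure directly. In a controlled setting where a single-layer transformer must memorize random bijections from subjects to a shared attribute set, we show that a logarithmic embedding dimension suffices. We extend these results to the multi-hop setting, providing constructions with and without chain-of-thought.
Reference score (independently computed attention measure): 57.48371649045765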
License: http://creativecommons.org/licenses/by/4.0/
Abstract: How do transformer language models memorize factual associations? A common view casts internal weight matrices as associative memories over pairs of embeddings, requiring parameter counts that scale linearly with the number of facts. We develop a theoretical and empirical account of an alternative, \emph{geometric} form of memorization in which learned embeddings encode relational structure directly, and the MLP plays a qualitatively different role. In a controlled setting where a single-layer transformer must memorize random bijections from subjects to a shared attribute set, we prove that a logarithmic embedding dimension suffices: subject embeddings encode \emph{linear superpositions} of their associated attribute vectors, and a small MLP acts as a relation-conditioned selector that extracts the relevant attribute via ReLU gating, and not as an associative key-value mapping. We extend these results to the multi-hop setting -- chains of relational queries such as ``Who is the mother of the wife of $x$?'' -- providing constructions with and without chain-of-thought that exhibit a provable capacity-depth tradeoff, complemented by a matching information-theoretic lower bound. Empirically, gradient descent discovers solutions with precisely the predicted structure. Once trained, the MLP transfers zero-shot to entirely new bijections when subject embeddings are appropriately re-initialized, revealing that it has learned a generic selection mechanism rather than memorized any particular set of facts.
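Abstract (reference translation): How do transformer language models memorize factual associations? A common view casts internal weight matrices as associative memories over pairs of embeddings, requiring parameter counts that scale linearly with the number of facts. We develop a theoretical and empirical account of an alternative, \emph{geometric} form of memorization in which learned embeddings encode relational structure directly and the MLP plays a qualitatively different role. In a controlled setting where a single-layer transformer must memorize random bijections from subjects to a shared attribute set, subject embeddings encode \emph{linear superpositions} of their associated attribute vectors, and a small MLP acts as a relation-conditioned selector that extracts the relevant attribute via ReLU gating. We extend these results to the multi-hop setting, that is, chains of relational queries such as ``Who is the mother of the wife of $x$?'', providing constructions with and without chain-of-thought that exhibit a provable capacity-depth tradeoff, complemented by a matching information-theoretic lower bound. Empirically, gradient descent discovers solutions with precisely the predicted structure. Once trained, the MLP transfers zero-shot to entirely new bijections when subject embeddings are appropriately re-initialized, revealing that it has learned a generic selection mechanism rather than memorized any particular set of facts.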
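The mechanism described in the abstract (subject embeddings as linear superpositions of attribute vectors, read out by a relation-conditioned ReLU selector) can be illustrated with a small toy. The sketch below is not the paper's construction: the random sign masks per relation, the hand-set selector weights, the dot-product readout, and all names and sizes (`build_toy`, `relu_selector`, `n=40`, `d=256`) are assumptions chosen only to make the idea concrete in plain Python.

```python
import random


def relu(z):
    return max(z, 0)


def rand_signs(rng, d):
    """Random vector in {-1, +1}^d."""
    return [rng.choice((-1, 1)) for _ in range(d)]


def build_toy(n=40, num_rel=3, d=256, seed=0):
    """Build a toy instance: n subjects, num_rel random bijections onto n shared attributes."""
    rng = random.Random(seed)
    attrs = [rand_signs(rng, d) for _ in range(n)]        # shared attribute vectors
    masks = [rand_signs(rng, d) for _ in range(num_rel)]  # hypothetical per-relation sign masks
    perms = []
    for _ in range(num_rel):
        p = list(range(n))
        rng.shuffle(p)                                    # random bijection subject -> attribute
        perms.append(p)
    # Each subject embedding superposes one masked attribute vector per relation.
    subj = []
    for s in range(n):
        e = [0] * d
        for r in range(num_rel):
            a, m = attrs[perms[r][s]], masks[r]
            for i in range(d):
                e[i] += m[i] * a[i]
        subj.append(e)
    return attrs, masks, perms, subj


def relu_selector(e, rel, masks):
    """Hand-set two-layer ReLU selector: returns masks[rel][i] * e[i] per coordinate.

    For each relation r, a pair of ReLU units computes gate_r * m_r[i] * e[i],
    where gate_r is the one-hot relation input. The bias -bound switches off
    every pair whose gate is 0, because |e[i]| <= bound.
    """
    bound = len(masks)
    out = []
    for i, x in enumerate(e):
        total = 0
        for r, m in enumerate(masks):
            gate = 1 if r == rel else 0
            shift = bound * gate - bound
            total += relu(m[i] * x + shift) - relu(-m[i] * x + shift)
        out.append(total)
    return out


def recall(e, rel, masks, attrs):
    """Select the relation's component, then pick the best-matching attribute."""
    u = relu_selector(e, rel, masks)
    scores = [sum(ui * ai for ui, ai in zip(u, a)) for a in attrs]
    return max(range(len(attrs)), key=scores.__getitem__)


def accuracy(attrs, masks, perms, subj):
    """Fraction of (subject, relation) queries answered correctly."""
    hits = total = 0
    for r, p in enumerate(perms):
        for s, e in enumerate(subj):
            hits += recall(e, r, masks, attrs) == p[s]
            total += 1
    return hits / total


if __name__ == "__main__":
    attrs, masks, perms, subj = build_toy()
    print("recall accuracy:", accuracy(attrs, masks, perms, subj))
```

In this toy the selector holds no fact-specific weights: all facts live in the subject embeddings, so swapping in new bijections only requires rebuilding `subj`, which loosely mirrors the zero-shot transfer claim. The dimension `d` is set generously here; the toy makes no attempt to reproduce the logarithmic bound stated in the abstract.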

Related papers