Fugu-MT Paper Translation (Abstract): Stochastic Attention via Langevin Dynamics on the Modern Hopfield Energy

Paper overview: Stochastic Attention via Langevin Dynamics on the Modern Hopfield Energy

arxiv url: http://arxiv.org/abs/2603.06875v2
Date: Tue, 10 Mar 2026 23:55:26 GMT
Status: Translation complete
Last updated in system: 2026-03-12 14:12:44.053943
Title: Stochastic Attention via Langevin Dynamics on the Modern Hopfield Energy
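Title (reference translation): Stochastic Attention via Langevin Dynamics on the Modern Hopfield Energy
Authors: Abdulrahman Alswaidan, Jeffrey D. Varner
Abstract summary: This work shows that Langevin sampling yields stochastic attention, a training-free sampler controlled by a single temperature. Because the energy gradient equals the attention map, no score network, training loop, or learned model is required.
Reference score (independently computed attention score): 0.0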
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Attention heads retrieve: given a query, they return a softmax-weighted average of stored values. We show that this computation is one step of gradient descent on a classical energy function, and that Langevin sampling from the corresponding distribution yields stochastic attention: a training-free sampler controlled by a single temperature. Lowering the temperature gives exact retrieval; raising it gives open-ended generation. Because the energy gradient equals the attention map, no score network, training loop, or learned model is required. We derive a closed-form entropy inflection condition that identifies the retrieval-to-generation transition temperature for any memory geometry, with a scaling law $\beta^* \sim \sqrt{d}$ for random patterns. We validate on five domains (64 to 4,096 dimensions). On MNIST digit images, stochastic attention is $2.6\times$ more novel and $2.0\times$ more diverse than the best learned baseline (a VAE trained on the same patterns), while matching a Metropolis-corrected gold standard. On protein sequences from the Pfam RRM family, the generation regime achieves $6.9\times$ lower amino acid composition divergence than the VAE (KL $= 0.060$ vs. $0.416$) at matched novelty, demonstrating that the training-free score function preserves family-level fidelity that learned models lose. A denoising diffusion baseline (DDPM) fails across all memory sizes tested ($K = 100$ to $3{,}500$), producing samples indistinguishable from isotropic noise. The approach requires no architectural changes to the underlying attention mechanism.
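Abstract (reference translation): Attention heads perform retrieval: given a query, they return the softmax-weighted average of the stored values. The paper shows that this computation is a single gradient descent step on a classical energy function, and that Langevin sampling from the corresponding distribution gives stochastic attention, a training-free sampler controlled by a single temperature. Lowering the temperature gives exact retrieval, and raising it gives open-ended generation. Because the energy gradient equals the attention map, no score network, training loop, or learned model is needed. The authors derive a closed-form entropy inflection condition that identifies the retrieval-to-generation transition temperature for any memory geometry, with a scaling law $\beta^* \sim \sqrt{d}$ for random patterns. The method is validated on five domains (64 to 4,096 dimensions). On MNIST digit images, stochastic attention is $2.6\times$ more novel and $2.0\times$ more diverse than the best learned baseline (a VAE trained on the same patterns), while matching a Metropolis-corrected gold standard. On protein sequences from the Pfam RRM family, the generation regime achieves $6.9\times$ lower amino acid composition divergence than the VAE (KL $= 0.060$ vs. $0.416$) at matched novelty, showing that the training-free score function preserves family-level fidelity that learned models lose. A denoising diffusion baseline (DDPM) fails at every memory size tested ($K = 100$ to $3{,}500$), producing samples indistinguishable from isotropic noise. The approach requires no architectural changes to the underlying attention mechanism.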
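The abstract's central claim, that an attention read-out is one gradient step on an energy and that adding Langevin noise turns it into a sampler, can be illustrated with a small NumPy sketch. This is not the authors' code. It assumes the commonly used modern Hopfield energy E(xi) = -(1/beta) * logsumexp(beta * X @ xi) + 0.5 * ||xi||^2 with the stored patterns as rows of X, and it assumes the noise scale sqrt(2 * step / beta) so that a single inverse temperature beta controls the sampler. The function names, step size, and number of steps are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D score vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention_retrieve(X, xi, beta):
    # Softmax-weighted average of the stored patterns (rows of X) for query xi.
    return X.T @ softmax(beta * (X @ xi))

def energy_grad(X, xi, beta):
    # Gradient of the assumed energy
    #   E(xi) = -(1/beta) * logsumexp(beta * X @ xi) + 0.5 * ||xi||^2,
    # which is xi minus the attention read-out.
    return xi - attention_retrieve(X, xi, beta)

def langevin_step(X, xi, beta, step, noise):
    # One unadjusted Langevin update (assumed form): gradient step plus
    # Gaussian noise scaled by the temperature 1/beta.
    return xi - step * energy_grad(X, xi, beta) + np.sqrt(2.0 * step / beta) * noise

def stochastic_attention(X, xi0, beta, step=0.1, n_steps=200, rng=None):
    # Iterate Langevin updates from the initial query xi0.
    rng = np.random.default_rng() if rng is None else rng
    xi = np.array(xi0, dtype=float)
    for _ in range(n_steps):
        xi = langevin_step(X, xi, beta, step, rng.standard_normal(xi.shape))
    return xi
```

With `step=1.0` and zero noise, `langevin_step` reduces to `attention_retrieve`, which is the "one step of gradient descent" reading of attention. In this sketch a large `beta` (low temperature) keeps samples close to a stored pattern, while a small `beta` lets them wander between patterns.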

Related papers