Fugu-MT paper translation (abstract): Kuramoto Attention: Synchronizing Self-Attention on the Torus

Paper abstract: Kuramoto Attention: Synchronizing Self-Attention on the Torus

arxiv url: http://arxiv.org/abs/2606.11585v1
Date: Wed, 10 Jun 2026 02:24:04 GMT
Status: translation complete
Last updated in system: 2026-06-11 16:42:38.255646
Title: Kuramoto Attention: Synchronizing Self-Attention on the Torus
Title (reference translation): Kuramoto Attention: Synchronizing Self-Attention on the Torus
Authors: Joshua Nunley
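Abstract summary: We introduce Kuramoto attention, a self-attention layer in which each hidden coordinate is an angle. The layer scores tokens by gated cosine similarity, attends over previous phase states, and updates each token by the tangent component of the attention-weighted circular mean. On enwiki8 character-level language modeling, the layer trains as a functional language model.
Reference score (independently computed attention metric): 0.0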
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce Kuramoto attention, a self-attention layer in which each hidden coordinate is an angle. The layer scores tokens by gated cosine similarity, attends over previous phase states, and updates each token by the tangent component of the attention-weighted circular mean. Because the values are the raw phase states, this update is exactly the Kuramoto coupling term $\sum_u A_{t,u}\sin(θ_u-θ_t)$, with the attention matrix acting as an adaptive, content-dependent coupling kernel. Equivalently, the gated score is a learned metric on the torus that selects which tokens couple, and the update pulls each token toward the circular mean of the tokens it selects, tightening their phase agreement. The same two ingredients, an invariant similarity score and an on-manifold mean, define such a layer on any compact group; the torus is the abelian case, where both are closed-form. The softmax weights solve an entropy-regularized phase-retrieval problem, and rotary position enters as a position-dependent phase drift in the score. On enwiki8 character-level language modeling, the layer trains as a functional language model whose bits-per-character stays close to a strong matched RoPE+SwiGLU transformer: within $0.02$ BPC at one million parameters ($1.637\pm0.010$ versus $1.616\pm0.004$) and level on the median at five million ($1.448$ versus $1.452$ over five seeds) with the transformer ahead on the mean ($1.468$ versus $1.456$). These experiments establish that the constrained geometric structure is a viable language model at this scale; the structure itself, and its synchronization reading, is the contribution. Ablations isolate the load-bearing components, and the result gives a compact bridge between self-attention and phase synchronization.
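Abstract (reference translation): We introduce Kuramoto attention, a self-attention layer in which each hidden coordinate is an angle. The layer scores tokens by gated cosine similarity, attends over previous phase states, and updates each token by the tangent component of the attention-weighted circular mean. Because the values are the raw phase states, this update is exactly the Kuramoto coupling term $\sum_u A_{t,u}\sin(θ_u-θ_t)$, with the attention matrix acting as an adaptive, content-dependent coupling kernel. Equivalently, the gated score is a learned metric on the torus that selects which tokens couple, and the update pulls each token toward the circular mean of the tokens it selects, tightening their phase agreement. The same two ingredients, an invariant similarity score and an on-manifold mean, define such a layer on any compact group. The softmax weights solve an entropy-regularized phase-retrieval problem, and rotary position enters as a position-dependent phase drift in the score. On enwiki8 character-level language modeling, the layer trains as a functional language model whose bits-per-character stays close to a strong matched RoPE+SwiGLU transformer: within $0.02$ BPC at one million parameters ($1.637\pm0.010$ versus $1.616\pm0.004$) and level on the median at five million ($1.448$ versus $1.452$ over five seeds), with the transformer ahead on the mean ($1.468$ versus $1.456$). These experiments establish that the constrained geometric structure is a viable language model at this scale. Ablations isolate the load-bearing components, and the result gives a compact bridge between self-attention and phase synchronization.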
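
The update described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the author's implementation. The abstract does not give the exact form of the gate, the masking convention, or the step size, so the following are assumptions made only for illustration: the score is a per-coordinate weighted cosine, sum_d gate[d] * cos(θ_t[d] - θ_u[d]); token t attends to tokens u <= t; and the update is a single explicit step of size `step`. Learned projections and rotary position are omitted. The names `kuramoto_attention_step`, `coupling_term`, `tangent_of_circular_mean`, and `gate` are hypothetical. The sketch also checks numerically that the tangent component of the attention-weighted circular mean equals the Kuramoto coupling sum.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]


def coupling_term(angles, weights, theta_t):
    """Kuramoto coupling: sum_u A[u] * sin(theta_u - theta_t)."""
    return sum(w * math.sin(a - theta_t) for a, w in zip(angles, weights))


def tangent_of_circular_mean(angles, weights, theta_t):
    """Tangent component, at theta_t, of the weighted mean of unit vectors."""
    mx = sum(w * math.cos(a) for a, w in zip(angles, weights))
    my = sum(w * math.sin(a) for a, w in zip(angles, weights))
    # The unit tangent to the circle at theta_t is (-sin, cos).
    return -math.sin(theta_t) * mx + math.cos(theta_t) * my


def kuramoto_attention_step(theta, gate, step=1.0):
    """One causal update; theta[t][d] is the angle of token t, coordinate d.

    Assumed, not taken from the paper: the score form, the u <= t masking,
    and the explicit step of size `step`.
    """
    dims = range(len(gate))
    new_theta, attn = [], []
    for t, th_t in enumerate(theta):
        past = theta[: t + 1]
        # Gated cosine similarity between token t and each attended token u.
        scores = [sum(gate[d] * math.cos(th_t[d] - th_u[d]) for d in dims)
                  for th_u in past]
        a = softmax(scores)
        # Move each coordinate along the coupling term, then wrap to [0, 2*pi).
        new_theta.append([
            (th_t[d] + step * coupling_term([u[d] for u in past], a, th_t[d]))
            % (2 * math.pi)
            for d in dims
        ])
        attn.append(a)
    return new_theta, attn


if __name__ == "__main__":
    theta = [[0.5, 2.0], [2.5, 1.0], [1.5, 2.4]]  # 3 tokens, 2 angle coordinates
    updated, attn = kuramoto_attention_step(theta, gate=[1.0, 0.5])
    print(updated)
    print(attn)
```

In this toy setting the first token attends only to itself and stays fixed, while later tokens move toward the tokens they attend to, so the spread of angles in each coordinate shrinks after one step. That matches the "tightening their phase agreement" reading, under the assumptions stated above.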

Related papers

Wavelet Variance Equipartition as a Threshold for World-Model Quality and Quantum Kernel TN-Simulability [0.0]
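We identify the wavelet scaling exponent as a critical diagnostic. We establish an exponent value of $1/2$ as the sharp transition boundary for classical simulability of amplitude-encoded quantum kernels. The variance scales strictly as $Var[X] = (d-2)$.
Paper reference translation (metadata) (2026-05-12T05:41:12Z)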
The Geometry of Forgetting: Temporal Knowledge Drift as an Independent Axis in LLM Representations [50.43168858368539]
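Large language models confidently produce outdated answers, and existing methods cannot detect this. The failure is structural rather than an engineering failure: temporal drift is encoded geometrically as a direction in the residual stream, separate from both correctness and uncertainty.
Paper reference translation (metadata) (2026-05-09T22:27:31Z)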
Mean-Pooled Cosine Similarity is Not Length-Invariant: Theory and Cross-Domain Evidence for a Length-Invariant Alternative [1.5718921092089344]
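Mean-pooled cosine similarity is the default metric for comparing neural representations across languages, modalities, and tasks. Under the anisotropy that characterizes modern transformer representations, mean-pooled cosine grows monotonically with sequence length. We argue that length-invariant metrics such as Centered Kernel Alignment should be the default for cross-representation comparison.
Paper reference translation (metadata) (2026-05-08T06:48:34Z)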
Coupled Query-Key Dynamics for Attention [6.775853253396773]
We present queries and keys that evolve through shared learned dynamics before scoring. On WikiText-103 at 60M parameters, coupled dynamics achieves perplexities of 22.55--22.62 and 24.22.
Paper reference translation (metadata) (2026-04-02T06:37:05Z)
Token Sample Complexity of Attention [20.89022639697809]
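We estimate the rate at which attention computed with $n$ tokens converges to its infinite-token limit. For compactly supported distributions, our first result shows that the attention map converges uniformly on a ball of radius $R$. We also study the regime in which the attention parameter tends to infinity and softmax approaches hardmax.
Paper reference translation (metadata) (2025-12-11T14:02:34Z)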
Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers [54.20763128054692]
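We study how a two-layer transformer is trained to perform ICL on $n$-gram Markov chain data. We prove that gradient flow on the cross-entropy ICL loss converges to a limiting model.
Paper reference translation (metadata) (2024-09-09T18:10:26Z)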
Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit Feedback and Unknown Transition [71.33787410075577]
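We study reinforcement learning with linear function approximation, unknown transitions, and adversarial losses. We propose a new algorithm that achieves $\widetilde{O}(d\sqrt{HS^3K} + \sqrt{HSAK})$ regret with high probability.
Paper reference translation (metadata) (2024-03-07T15:03:50Z)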
Measurement-induced phase transition for free fermions above one dimension [46.176861415532095]
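We develop a theory of measurement-induced entanglement phase transitions for free-fermion models in $d>1$ dimensions. The critical point bounds a gapless phase with $\ell^{d-1}\ln\ell$ scaling of the second cumulant of the particle number and of the entanglement entropy.
Paper reference translation (metadata) (2023-09-21T18:11:04Z)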
Dynamical Signatures of Chaos to Integrability Crossover in $2\times 2$ Generalized Random Matrix Ensembles [0.0]
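We study energy correlations by computing the density and second moment of the nearest-neighbor spacing (NNS). At large $N$, the second moment of the NNS and the relative depth of the correlation hole exhibit a second-order phase transition at $\gamma=2$.
Paper reference translation (metadata) (2020-10-28T05:02:13Z)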
$O(n)$ Connections are Expressive Enough: Universal Approximability of Sparse Transformers [71.31712741938837]
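We show that sparse transformers with only $O(n)$ connections per attention layer can approximate the same function class as dense models with $n^2$ connections. We also compare different sparsity patterns and levels on standard NLP tasks.
Paper reference translation (metadata) (2020-06-08T18:30:12Z)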

The related papers list is generated automatically from the titles and abstracts of papers on this site.

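The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use (including all information and translations).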