Fugu-MT Paper Translation (Summary): Attention as Frustrated Synchronization

Paper Summary: Attention as Frustrated Synchronization

arxiv url: http://arxiv.org/abs/2606.18694v1
Date: Wed, 17 Jun 2026 05:18:01 GMT
Status: Translation complete
Last updated (in system): 2026-06-18 17:16:51.013931
Title: Attention as Frustrated Synchronization
Authors: Joshua Nunley
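Abstract summary: We introduce the Frustrated Synchronization Network (FSN), whose token states are phases on a torus and whose entire value pathway is one learned complex coupling kernel over harmonics and a one-step delay. The complex phases are static Kuramoto-Sakaguchi frustration angles, the signed harmonics are repulsive Daido components, and the delay term, which couples each token to the successors of the tokens it attends to, is algebraically identical to Kuramoto-Sakaguchi coupling.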
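The summary above reads each kernel component as a frustration. The Python sketch below illustrates that reading under our own assumptions: the names `coupling`, `frustrated_form`, and `phase_velocity`, the attention-weight matrix `attn`, and the form sum_k Im(c_k * exp(i*k*(theta_j - theta_i))) are hypothetical, not the paper's published equations. It rests on the identity Im(c * exp(i*phi)) = |c| * sin(phi + arg c): the phase of each complex coefficient acts as a static Kuramoto-Sakaguchi frustration angle, and a negative real coefficient (arg c = pi) flips the sign of its harmonic, giving a repulsive, Daido-style component.

```python
import cmath
import math

# Hypothetical FSN-style value pathway (illustrative sketch, not the paper's code).
# theta:  token phases in radians (points on the torus)
# attn:   attn[i][j] >= 0, how strongly token i attends to token j
# kernel: {harmonic k: complex coefficient c_k}, standing in for the learned kernel


def coupling(delta, kernel):
    """Sum over harmonics of Im(c_k * exp(i*k*delta)), where delta = theta_j - theta_i."""
    return sum((c * cmath.exp(1j * k * delta)).imag for k, c in kernel.items())


def frustrated_form(delta, kernel):
    """The same quantity as Sakaguchi-style terms |c_k| * sin(k*delta + alpha_k),
    where the frustration angle alpha_k is the phase of c_k. A negative real c_k
    has alpha_k = pi, i.e. a repulsive (Daido-style) harmonic."""
    return sum(abs(c) * math.sin(k * delta + cmath.phase(c)) for k, c in kernel.items())


def phase_velocity(theta, attn, kernel):
    """Attention-weighted coupling drive on each token's phase."""
    n = len(theta)
    return [
        sum(attn[i][j] * coupling(theta[j] - theta[i], kernel) for j in range(n))
        for i in range(n)
    ]


# Usage: two tokens attending to each other.
attn = [[0.0, 1.0], [1.0, 0.0]]
print(phase_velocity([0.0, 0.5], attn, {1: 1.0}))   # attractive: phases pulled together
print(phase_velocity([0.0, 0.5], attn, {1: -1.0}))  # repulsive harmonic: phases pushed apart
print(phase_velocity([0.3, 0.3], attn, {1: 1.0}))   # [0.0, 0.0]: perfect sync, no drive
print(phase_velocity([0.3, 0.3], attn, {1: cmath.exp(0.5j)}))  # frustrated: drive stays nonzero
```

In this sketch, a purely real, positive kernel produces no drive once all phases agree, which matches the abstract's point that a perfectly synchronized network computes nothing further; a complex coefficient keeps the drive nonzero even at full agreement.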
Reference score (site-computed attention metric): 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A network of oscillators that synchronizes perfectly computes nothing further, so an attention architecture built from synchronization must locate its computation in structured departures from agreement. We introduce the Frustrated Synchronization Network (FSN), whose token states are phases on a torus and whose entire value pathway is one learned complex coupling kernel over harmonics and a one-step delay. Each component of the kernel is a frustration in the sense of the synchronization literature. The complex phases are static Kuramoto-Sakaguchi frustration angles, the signed harmonics are repulsive Daido components, and the delay term, which couples each token to the successors of the tokens it attends to, is algebraically identical to Kuramoto-Sakaguchi coupling whose frustration angle is the data's own transition, so next-token prediction is implemented as synchronization frustrated by the data. At matched one-million-parameter and training budgets on character-level text and code, the FSN's validation loss is below a tuned RoPE-SwiGLU transformer's at every epoch measured, and the comparison survives training the baseline to convergence: every thirty-epoch enwik8 seed finishes below the transformer's converged fifty-epoch loss of 1.611, and the FSN's completed fifty-epoch runs converge to 1.5953 +/- 0.0014. A variant with every feed-forward block replaced by mean-field coupling to learned collective modes, leaving no multilayer perceptron in the stack, tracks the transformer. On natural text the unfrustrated base layer falls behind the converged transformer at every copy depth, worst on long-range copy events; the kernel reverses the deficit at every depth of four and beyond. Headline comparisons are at the one-million-parameter scale; a scale ladder is complete through four million parameters with the advantage persisting, and remaining arms are marked as in progress.
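The abstract's claim that the delay term is algebraically identical to Kuramoto-Sakaguchi coupling reduces to one line: sin(theta_{j+1} - theta_i) = sin(theta_j - theta_i + alpha_j) with alpha_j = theta_{j+1} - theta_j, so coupling token i to the successor of token j is the same as coupling it to token j with a frustration angle equal to the data's own transition. The Python sketch below checks this numerically. The function names, the causal restriction to j < i, and the sign convention for alpha are our assumptions, not taken from the paper.

```python
import math

# Hypothetical sketch of the FSN delay term (illustrative, not the paper's code).
# theta[j]: phase of token j; attn[i][j]: weight token i places on earlier token j.
# Assumption: only j < i is used, so the successor j + 1 is never a future token.


def delay_term(theta, attn):
    """Couple token i to the successor j+1 of every token j it attends to."""
    return [
        sum(attn[i][j] * math.sin(theta[j + 1] - theta[i]) for j in range(i))
        for i in range(len(theta))
    ]


def sakaguchi_form(theta, attn):
    """The same drive written as Kuramoto-Sakaguchi coupling to token j itself,
    sin(theta_j - theta_i + alpha_j), whose frustration angle alpha_j is the data's
    own transition theta_{j+1} - theta_j (sign conventions for alpha vary)."""
    out = []
    for i in range(len(theta)):
        drive = 0.0
        for j in range(i):
            alpha_j = theta[j + 1] - theta[j]  # frustration angle set by the data
            drive += attn[i][j] * math.sin(theta[j] - theta[i] + alpha_j)
        out.append(drive)
    return out


theta = [0.1, 1.3, 2.0, 0.4]
# Row i has weight 1/i on each earlier token, so the causal weights of token i sum to 1.
attn = [[1.0 / max(i, 1)] * len(theta) for i in range(len(theta))]
print(all(math.isclose(a, b, abs_tol=1e-12)
          for a, b in zip(delay_term(theta, attn), sakaguchi_form(theta, attn))))  # True
```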
