Fugu-MT Paper Translation (Summary): Anti Mode-Collapse in Mean-Field Transformer via Auxiliary Variables

arxiv url: http://arxiv.org/abs/2605.30229v1
Date: Thu, 28 May 2026 16:59:42 GMT
Status: Translation complete
Last updated in system: 2026-05-30 02:45:56.564375
Title: Anti Mode-Collapse in Mean-Field Transformer via Auxiliary Variables
Authors: Masaaki Imaizumi, Masanori Koyama, Noboru Isobe, Kohei Hayashi,
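Abstract summary: Using a mean-field-based transformer model, we theoretically investigate how auxiliary variables such as positional encoding prevent mode collapse of the self-attention mechanism.
Reference score (attention score computed by this site): 9.32548799357705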
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We use a mean-field-based transformer model to theoretically investigate how auxiliary variables, such as positional encoding, prevent mode collapse of self-attention mechanisms. The use of mean-field transformers to analyze the properties of self-attention mechanisms has garnered significant attention in recent years due to their ability to comprehensively analyze token interactions. However, analysis of this simple model suggests that mode collapse, where token distributions degenerate to a single point, occurs during long inferences (i.e., many layers), indicating a discrepancy with reality. This study investigates this mean-field transformer model and demonstrates that the introduction of auxiliary variables, such as positional encoding, acts as a counterforce against theoretical mode collapse. Specifically, we show that in the theoretical scheme, the energy-maximizing distribution does not degenerate to a single point; instead, it is characterized by a pushforward of the auxiliary variable distribution, thereby avoiding concentration in the Dirac measure. Our main examples are the positional encoding and the fixed prompt insertion treated as a parallel auxiliary-variable mechanism. Furthermore, we demonstrate that positional encoding and prompt insertion possess universality of representation in the limit, meaning that the limit distribution of inference can exactly represent a wide class of distributions. We also analyze several key properties of positional encoding and metastability, and validate our theoretical results through mathematical experiments.
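The mode collapse described above can be reproduced in a small simulation. This is a minimal sketch assuming a standard toy form of mean-field self-attention (tokens on the unit sphere, softmax attention with inverse temperature `beta`, values equal to the tokens, and no auxiliary variables); the paper's exact model is not given in this summary, so the function names and parameters below are hypothetical. Each Euler step stands in for one layer, and after many steps the minimum pairwise cosine similarity approaches 1, meaning the empirical token distribution concentrates near a single point (a Dirac measure).

```python
import numpy as np


def simulate_mean_field_attention(n_tokens=16, dim=3, beta=1.0, dt=0.1,
                                  n_steps=2000, seed=0):
    """Toy mean-field self-attention dynamics on the unit sphere (assumed form).

    Each token x_i moves toward the softmax-weighted average of all tokens,
    projected onto the tangent space of the sphere at x_i. One Euler step plays
    the role of one layer. Returns the initial and final token positions.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_tokens, dim))
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # uniform random points on the sphere
    x0 = x.copy()
    for _ in range(n_steps):
        logits = beta * x @ x.T                        # attention scores beta * <x_i, x_j>
        logits -= logits.max(axis=1, keepdims=True)    # numerical stability for exp
        weights = np.exp(logits)
        weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
        drift = weights @ x                            # attention-weighted average of tokens
        # Keep only the component tangent to the sphere at each x_i.
        drift -= np.sum(drift * x, axis=1, keepdims=True) * x
        x = x + dt * drift
        x /= np.linalg.norm(x, axis=1, keepdims=True)  # project back onto the sphere
    return x0, x


def min_pairwise_cosine(x):
    """Smallest cosine similarity over all token pairs (1.0 means one point)."""
    return float((x @ x.T).min())


if __name__ == "__main__":
    x0, x = simulate_mean_field_attention()
    print(f"min pairwise cosine, layer 0:    {min_pairwise_cosine(x0):.3f}")
    print(f"min pairwise cosine, layer 2000: {min_pairwise_cosine(x):.6f}")
```

The first printed value is well below 1 for random tokens, while the second should be close to 1. The sketch deliberately omits auxiliary variables such as positional encoding, which the paper argues act as a counterforce against this collapse.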

Related papers

Kinetic theory for Transformers and the lost-in-the-middle phenomenon [1.0705399532413615]
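We study causal self-attention dynamics, a toy model of decoder transformers. For uniformly distributed tokens, the limiting correlation equation can be solved in closed form.
Paper reference translation (metadata) (2026-05-09T23:16:19Z)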
TensorLens: End-to-End Transformer Analysis via High-Order Attention Tensors [53.891337639229285]
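We introduce TensorLens, a new formulation that captures the entire transformer as an input-dependent linear operator expressed through high-order attention-interaction connections. Our experiments demonstrate that the attention tensor provides a strong foundation for developing tools aimed at interpretability and model understanding.
Paper reference translation (metadata) (2026-01-25T19:21:25Z)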
Random-Matrix-Induced Simplicity Bias in Over-parameterized Variational Quantum Circuits [72.0643009153473]
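We show that expressive variational ansätze fall into a Haar-type universality class in which both observable expectation values and parameter gradients concentrate exponentially in the system size. As a result, the hypothesis class induced by such circuits collapses with high probability to a narrow family of nearly constant functions. Tensor-structured VQCs, including tensor-network-based and tensor-hypernetwork parameterizations, lie outside the Haar-type universality class.
Paper reference translation (metadata) (2026-01-05T08:04:33Z)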
Multivariate Bernoulli Hoeffding Decomposition: From Theory to Sensitivity Analysis [2.762021507766656]
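This work focuses on the case of Bernoulli inputs and provides a complete analytical characterization of the decomposition. In this discrete setting, the relevant subspaces are one-dimensional, and we show that the decomposition admits a closed-form expression. The paper concludes with perspectives on extending the methodology to high-dimensional settings and to models whose inputs have finite, non-binary support.
Paper reference translation (metadata) (2025-10-08T14:46:20Z)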
Transformers Learn Faster with Semantic Focus [57.97235825738412]
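We study sparse transformers from the perspective of learnability and generalization. Input-dependent sparse attention models appear to converge faster and generalize better than standard attention models.
Paper reference translation (metadata) (2025-06-17T01:19:28Z)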
An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models [64.87562101662952]
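We show that input tokens are often exchangeable, since they already include positional encoding. We establish the existence of a sufficient and minimal representation of the input tokens. We prove that attention with the desired parameters infers the latent posterior up to an approximation error.
Paper reference translation (metadata) (2022-12-30T17:59:01Z)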
Covariate Shift in High-Dimensional Random Feature Regression [44.13449065077103]
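Covariate shift is a major obstacle to developing robust machine learning models. We present a theoretical understanding of it in the context of modern machine learning.
Paper reference translation (metadata) (2021-11-16T05:23:28Z)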
Few-shot Domain Adaptation by Causal Mechanism Transfer [107.08605582020866]
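We study few-shot supervised domain adaptation (DA) for regression problems where only a small amount of labeled target-domain data and a large amount of labeled source-domain data are available. Many current DA methods rely on transfer assumptions based on either parametric distribution shift or apparent distribution similarity. We propose mechanism transfer, a meta-distributional scenario in which the data-generating mechanism is invariant across domains.
Paper reference translation (metadata) (2020-02-10T02:16:53Z)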

The related-paper list is generated automatically from the titles and abstracts of papers on this site.

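This page shows information about the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from the use of this site (including all information and translations).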