Fugu-MT Paper Translation (Summary): Theoretical Constraints on the Expressive Power of $\mathsf{RoPE}$-based Tensor Attention Transformers

Paper summary: Theoretical Constraints on the Expressive Power of $\mathsf{RoPE}$-based Tensor Attention Transformers

arxiv url: http://arxiv.org/abs/2412.18040v1
Date: Mon, 23 Dec 2024 23:26:07 GMT
Status: Translation complete
Last updated in system: 2024-12-25 19:23:17.596689
Title: Theoretical Constraints on the Expressive Power of $\mathsf{RoPE}$-based Tensor Attention Transformers
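Title (reference translation): Theoretical constraints on the expressive power of $\mathsf{RoPE}$-based Tensor Attention Transformers
Authors: Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Mingda Wan
Abstract summary: This study analyzes the circuit complexity of Tensor Attention and $\mathsf{RoPE}$-based Tensor Attention and shows that they cannot solve fixed membership problems or $(A_{F,r})^*$ closure problems. These results highlight a gap between the empirical performance and the theoretical constraints of Tensor Attention and $\mathsf{RoPE}$-based Tensor Attention Transformers.
Reference score (site-computed attention metric): 23.991344681741058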
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Tensor Attention extends traditional attention mechanisms by capturing high-order correlations across multiple modalities, addressing the limitations of classical matrix-based attention. Meanwhile, Rotary Position Embedding ($\mathsf{RoPE}$) has shown superior performance in encoding positional information in long-context scenarios, significantly enhancing transformer models' expressiveness. Despite these empirical successes, the theoretical limitations of these technologies remain underexplored. In this study, we analyze the circuit complexity of Tensor Attention and $\mathsf{RoPE}$-based Tensor Attention, showing that with polynomial precision, constant-depth layers, and linear or sublinear hidden dimension, they cannot solve fixed membership problems or $(A_{F,r})^*$ closure problems, under the assumption that $\mathsf{TC}^0 \neq \mathsf{NC}^1$. These findings highlight a gap between the empirical performance and theoretical constraints of Tensor Attention and $\mathsf{RoPE}$-based Tensor Attention Transformers, offering insights that could guide the development of more theoretically grounded approaches to Transformer model design and scaling.
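Abstract (reference translation): Tensor Attention extends traditional attention mechanisms by capturing high-order correlations across multiple modalities and addressing the limitations of classical matrix-based attention. Meanwhile, Rotary Position Embedding ($\mathsf{RoPE}$) has shown strong performance in encoding positional information in long-context scenarios, markedly improving the expressiveness of transformer models. Despite these empirical successes, the theoretical limits of these techniques remain underexplored. This study analyzes the circuit complexity of Tensor Attention and $\mathsf{RoPE}$-based Tensor Attention and shows that, with polynomial precision, constant-depth layers, and linear or sublinear hidden dimension, they cannot solve fixed membership problems or $(A_{F,r})^*$ closure problems, assuming $\mathsf{TC}^0 \neq \mathsf{NC}^1$. These results highlight the gap between the empirical performance and the theoretical constraints of Tensor Attention and $\mathsf{RoPE}$-based Tensor Attention Transformers.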
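
The abstract names $\mathsf{RoPE}$ without defining it. As a rough illustration only, the sketch below follows the commonly used rotary formulation (rotating consecutive coordinate pairs by position-dependent angles), not the paper's tensor-attention variant. The function names `rope_rotate` and `rope_score` and the default `base=10000.0` are assumptions introduced for this example. It shows the property that makes RoPE a relative position encoding: the score between a rotated query and a rotated key depends only on the difference of their positions.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Rotate each pair (vec[2i], vec[2i+1]) by the angle position * base**(-2i/d).

    Hypothetical helper for illustration; assumes an even-length list of floats.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even dimension")
    out = []
    for i in range(d // 2):
        theta = position * base ** (-2.0 * i / d)
        x, y = vec[2 * i], vec[2 * i + 1]
        # Standard 2D rotation applied to one coordinate pair.
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out


def rope_score(q, k, q_pos, k_pos):
    """Unnormalized attention score between a rotated query and a rotated key."""
    rq = rope_rotate(q, q_pos)
    rk = rope_rotate(k, k_pos)
    return sum(a * b for a, b in zip(rq, rk))


if __name__ == "__main__":
    q = [1.0, 2.0, -0.5, 0.3]
    k = [0.7, -1.1, 0.4, 2.0]
    # Both pairs of positions have the same offset (3), so the scores should agree
    # up to floating-point error.
    print(rope_score(q, k, 5, 2), rope_score(q, k, 13, 10))
```

Because each rotation is orthogonal, vector norms are preserved and only the relative offset between positions enters the score.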

Related papers

Born a Transformer -- Always a Transformer? [57.37263095476691]
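We study a family of $\textit{retrieval}$ and $\textit{copying}$ tasks inspired by Liu et al. We observe an $\textit{induction-versus-anti-induction}$ asymmetry, in which pretrained models are better at retrieving tokens to the right (induction) of a query token than to the left (anti-induction). A mechanistic analysis reveals that this asymmetry is linked to differences in the strength of the induction and anti-induction circuits within pretrained transformers.
Paper reference translation (metadata) (2025-05-27T21:36:50Z)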
Tensor Convolutional Network for Higher-Order Interaction Prediction in Sparse Tensors [74.31355755781343]
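We propose TCN, an accurate and compatible tensor convolutional network that integrates seamlessly with TF methods to predict top-k interactions. We show that TCN integrated with a TF method outperforms competitors, including TF methods and hyperedge prediction methods.
Paper reference translation (metadata) (2025-03-14T18:22:20Z)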
TensorGRaD: Tensor Gradient Robust Decomposition for Memory-Efficient Neural Operator Training [91.8932638236073]
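TensorGRaD is a novel method that directly addresses the memory problems associated with the weights. We show that SparseGRaD reduces total memory usage by more than 50% while also improving accuracy.
Paper reference translation (metadata) (2025-01-04T20:51:51Z)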
Circuit Complexity Bounds for RoPE-based Transformer Architecture [25.2590541420499]
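Empirical evidence indicates that $\mathsf{RoPE}$-based Transformer architectures show better generalization ability than conventional Transformer models. We show that unless $\mathsf{TC}^0 = \mathsf{NC}^1$, a $\mathsf{RoPE}$-based Transformer with $\mathrm{poly}(n)$-precision, $O(1)$ layers, and hidden dimension $d \leq O(n)$ cannot solve the arithmetic formula evaluation problem.
Paper reference translation (metadata) (2024-11-12T07:24:41Z)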
On the Role of Depth and Looping for In-Context Learning with Task Diversity [69.4145579827826]
We study in-context learning for linear regression with diverse tasks. We show that the multilayer Transformer is not robust to distributional shifts even as small as $O(e^{-L})$ in Wasserstein distance.
Paper reference translation (metadata) (2024-10-29T03:27:56Z)
Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers [54.20763128054692]
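We study how a two-layer transformer is trained to perform ICL on $n$-gram Markov chain data. We prove that the gradient flow with respect to the cross-entropy ICL loss converges to a limiting model.
Paper reference translation (metadata) (2024-09-09T18:10:26Z)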
Aligning Transformers with Weisfeiler-Leman [5.0452971570315235]
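Graph neural network architectures aligned with the $k$-WL hierarchy offer theoretically well-understood expressive power. We develop a theoretical framework that enables the study of established positional encodings such as Laplacian PE and SPE. We evaluate our transformers on the large-scale PCQM4Mv2 dataset and show predictive performance competitive with the state of the art.
Paper reference translation (metadata) (2024-06-05T11:06:33Z)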
Tensor Attention Training: Provably Efficient Learning of Higher-order Transformers [18.331374727331077]
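The time complexity of tensor attention is a significant obstacle to its use in transformers. We prove that the backward gradient of tensor attention training can be computed in almost linear time.
Paper reference translation (metadata) (2024-05-26T02:59:13Z)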
Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers [14.59741397670484]
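We consider a deep multi-head self-attention network that is closely related to Transformers. We develop a statistical mechanics theory of Bayesian learning in this model. We confirm our findings on synthetic and real-world sequence classification tasks.
Paper reference translation (metadata) (2024-05-24T20:34:18Z)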
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems [57.58801785642868]
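Chain of thought (CoT) is a highly effective method for improving the accuracy of large language models (LLMs) on arithmetic and symbolic reasoning tasks. This work provides a theoretical understanding of the power of CoT for decoder-only transformers through the lens of expressiveness.
Paper reference translation (metadata) (2024-02-20T10:11:03Z)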
On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
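We introduce the unhinged loss, a concise loss function that offers mathematical opportunities to analyze closed-form dynamics. The unhinged loss also allows more practical techniques to be considered, such as time-varying learning rates and feature normalization.
Paper reference translation (metadata) (2023-12-13T02:11:07Z)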
On the Convergence of Encoder-only Shallow Transformers [62.639819460956176]
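We build a global convergence theory for encoder-only shallow Transformers under realistic conditions. Our results can pave the way for a better understanding of modern Transformers, particularly their training dynamics.
Paper reference translation (metadata) (2023-11-02T20:03:05Z)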

The related papers list is generated automatically from the titles and abstracts of papers on this site.
