Fugu-MT paper translation (abstract): Design Principles for Sequence Models via Coefficient Dynamics

Paper overview: Design Principles for Sequence Models via Coefficient Dynamics

arxiv url: http://arxiv.org/abs/2510.09389v1
Date: Fri, 10 Oct 2025 13:42:31 GMT
Status: translation complete
Last updated in system: 2025-10-14 00:38:49.189639
Title: Design Principles for Sequence Models via Coefficient Dynamics
Title (reference translation): Design Principles for Sequence Models via Coefficient Dynamics
Authors: Jerome Sieber, Antonio Orvieto, Melanie N. Zeilinger, Carmen Amo Alonso
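Abstract summary: The paper develops a unified framework that makes explicit how deep sequence models compute outputs as linear combinations of past value vectors, by casting the linear combination coefficients as the outputs of autonomous linear dynamical systems driven by impulse inputs. This viewpoint differs substantially from approaches that focus on connecting linear RNNs with linear attention, and it reveals a common mathematical theme across diverse architectures. The framework identifies tradeoffs between expressivity and efficient implementation, geometric constraints on input selectivity, and stability conditions for numerically stable training and information retention.
Reference score (attention score computed independently by the site): 20.14360019974826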
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep sequence models, ranging from Transformers and State Space Models (SSMs) to more recent approaches such as gated linear RNNs, fundamentally compute outputs as linear combinations of past value vectors. To draw insights and systematically compare such architectures, we develop a unified framework that makes this output operation explicit, by casting the linear combination coefficients as the outputs of autonomous linear dynamical systems driven by impulse inputs. This viewpoint, in spirit substantially different from approaches focusing on connecting linear RNNs with linear attention, reveals a common mathematical theme across diverse architectures and crucially captures softmax attention, on top of RNNs, SSMs, and related models. In contrast to new model proposals that are commonly evaluated on benchmarks, we derive design principles linking architectural choices to model properties, thereby identifying tradeoffs between expressivity and efficient implementation, geometric constraints on input selectivity, and stability conditions for numerically stable training and information retention. By connecting several insights and observations from recent literature, the framework both explains empirical successes of recent designs and provides guiding principles for systematically designing new sequence model architectures.
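Abstract (reference translation): Deep sequence models, from Transformers and State Space Models (SSMs) to more recent approaches such as gated linear RNNs, fundamentally compute their outputs as linear combinations of past value vectors. To draw insights and compare these architectures systematically, the authors cast the linear combination coefficients as the outputs of autonomous linear dynamical systems driven by impulse inputs. This viewpoint differs substantially from approaches that focus on connecting linear RNNs with linear attention; it reveals a common mathematical theme across diverse architectures and, crucially, captures softmax attention in addition to RNNs, SSMs, and related models. In contrast to new model proposals, which are commonly evaluated on benchmarks, the paper derives design principles that link architectural choices to model properties. These principles identify tradeoffs between expressivity and efficient implementation, geometric constraints on input selectivity, and stability conditions for numerically stable training and information retention. By connecting several insights and observations from recent literature, the framework both explains the empirical successes of recent designs and provides guiding principles for systematically designing new sequence model architectures.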
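
Illustrative sketch (not taken from the paper): the abstract's central claim, that outputs are linear combinations of past value vectors whose coefficients come from an autonomous linear system driven by an impulse, can be checked on a toy model. The code below assumes a gated linear recurrence with scalar gates, h_t = a_t * h_{t-1} + b_t * v_t and y_t = c_t * h_t. This parameterization and the names `recurrent_outputs`, `coefficient_matrix`, and `demo` are assumptions chosen for illustration; the paper's actual formulation may differ, and softmax attention is not covered here.

```python
import numpy as np


def recurrent_outputs(a, b, c, v):
    """Run h_t = a_t * h_{t-1} + b_t * v_t and y_t = c_t * h_t (toy model)."""
    T, d = v.shape
    h = np.zeros(d)
    y = np.zeros((T, d))
    for t in range(T):
        h = a[t] * h + b[t] * v[t]
        y[t] = c[t] * h
    return y


def coefficient_matrix(a, b, c):
    """Return alpha with alpha[t, j] = c_t * (a_{j+1} * ... * a_t) * b_j for j <= t.

    Column j is produced by simulating the autonomous system x_t = a_t * x_{t-1},
    which receives a single impulse of size b_j at time j and no input afterwards.
    """
    T = len(a)
    alpha = np.zeros((T, T))
    for j in range(T):
        x = b[j]                      # impulse enters at time j
        alpha[j, j] = c[j] * x
        for t in range(j + 1, T):
            x = a[t] * x              # autonomous evolution, no further input
            alpha[t, j] = c[t] * x
    return alpha


def demo(T=6, d=3, seed=0):
    """Compare the recurrent computation with the explicit linear combination."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(0.5, 0.99, size=T)   # hypothetical decay gates with |a_t| < 1
    b = rng.normal(size=T)               # hypothetical input gates
    c = rng.normal(size=T)               # hypothetical output gates
    v = rng.normal(size=(T, d))          # value vectors
    y_rec = recurrent_outputs(a, b, c, v)
    alpha = coefficient_matrix(a, b, c)
    y_lin = alpha @ v                    # y_t = sum over j <= t of alpha[t, j] * v_j
    return y_rec, y_lin, alpha


if __name__ == "__main__":
    y_rec, y_lin, alpha = demo()
    print("max abs difference:", np.max(np.abs(y_rec - y_lin)))
```

In this toy setting the two computations agree up to floating-point error, and `alpha` is lower triangular because each output depends only on past and current values. With |a_t| < 1 the factor multiplying b_j shrinks as t - j grows, which gives a simple picture of how a stability condition interacts with information retention.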

Related papers