Fugu-MT Paper Translation (Summary): A Comparative Analysis of Contextual Representation Flow in State-Space and Transformer Architectures

Paper summary: A Comparative Analysis of Contextual Representation Flow in State-Space and Transformer Architectures

arxiv url: http://arxiv.org/abs/2510.06640v1
Date: Wed, 08 Oct 2025 04:46:11 GMT
Status: Translation complete
Last updated in system: 2025-10-09 16:41:20.301074
Title: A Comparative Analysis of Contextual Representation Flow in State-Space and Transformer Architectures
Authors: Nhat M. Hoang, Do Xuan Long, Cong-Duy Nguyen, Min-Yen Kan, Luu Anh Tuan
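Abstract summary: State Space Models (SSMs) have emerged as efficient alternatives to Transformer-Based Models (TBMs) for long-sequence processing. This paper presents the first unified, token- and layer-level analysis of representation propagation in SSMs and TBMs. TBMs rapidly homogenize token representations, with diversity reemerging only in later layers, while SSMs preserve token uniqueness early but converge to homogenization deeper.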
Reference score (independently calculated attention level): 27.45316137669387
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: State Space Models (SSMs) have recently emerged as efficient alternatives to Transformer-Based Models (TBMs) for long-sequence processing, offering linear scaling and lower memory use. Yet, how contextual information flows across layers and tokens in these architectures remains understudied. We present the first unified, token- and layer-level analysis of representation propagation in SSMs and TBMs. Using centered kernel alignment, stability metrics, and probing, we characterize how representations evolve within and across layers. We find a key divergence: TBMs rapidly homogenize token representations, with diversity reemerging only in later layers, while SSMs preserve token uniqueness early but converge to homogenization deeper. Theoretical analysis and parameter randomization further reveal that oversmoothing in TBMs stems from architectural design, whereas in SSMs it arises mainly from training dynamics. These insights clarify the inductive biases of both architectures and inform future model and training designs for long-context reasoning.
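The abstract names centered kernel alignment (CKA) as one of its tools but does not say which variant is used or how token homogenization is scored. The following is a minimal sketch under two assumptions: the linear form of CKA, and mean pairwise cosine similarity between token vectors as a simple homogenization proxy. The function names `linear_cka` and `mean_pairwise_cosine` are illustrative and do not come from the paper.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two representation matrices.

    x has shape (n_samples, d1) and y has shape (n_samples, d2);
    row i of each matrix must describe the same token or example.
    """
    # Center every feature column so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def mean_pairwise_cosine(h: np.ndarray) -> float:
    """Average cosine similarity over all distinct token pairs.

    h has shape (n_tokens, d) with n_tokens >= 2. Values close to 1
    mean the token vectors point in nearly the same direction, which
    is the assumed proxy for homogenization (oversmoothing) here.
    """
    unit = h / np.linalg.norm(h, axis=1, keepdims=True)
    sim = unit @ unit.T
    n = h.shape[0]
    # Exclude the diagonal, which holds each token's similarity to itself.
    return float((sim.sum() - np.trace(sim)) / (n * (n - 1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in activations: 64 tokens, 16 hidden units, two "layers".
    layer_a = rng.standard_normal((64, 16))
    layer_b = layer_a + 0.1 * rng.standard_normal((64, 16))
    print("CKA(a, b):", linear_cka(layer_a, layer_b))
    print("mean cosine(a):", mean_pairwise_cosine(layer_a))
```

Applied to real models, `linear_cka` would compare hidden states from two layers over the same tokens, and `mean_pairwise_cosine` would be tracked layer by layer to see where token representations become more alike. The random arrays above only stand in for actual activations.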

Related papers