Fugu-MT Paper Translation (Abstract): Can Mamba Learn In Context with Outliers? A Theoretical Generalization Analysis

Paper overview: Can Mamba Learn In Context with Outliers? A Theoretical Generalization Analysis

arxiv url: http://arxiv.org/abs/2510.00399v1
Date: Wed, 01 Oct 2025 01:25:01 GMT
Status: Translation complete
Last updated in system: 2025-10-03 16:59:20.31931
Title: Can Mamba Learn In Context with Outliers? A Theoretical Generalization Analysis
Title (reference translation): Can Mamba Learn in Context with Outliers? A Theoretical Generalization Analysis
Authors: Hongkang Li, Songtao Lu, Xiaodong Cui, Pin-Yu Chen, Meng Wang
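Abstract summary: The Mamba model has attracted significant attention for its computational advantages over Transformer-based models. This paper presents the first theoretical analysis of the training dynamics of a one-layer Mamba model. Although Mamba may require more training iterations, it maintains accurate predictions even when the proportion of outliers exceeds the threshold that a linear Transformer can tolerate.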
Reference score (independently computed attention score): 88.05636819649804
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The Mamba model has gained significant attention for its computational advantages over Transformer-based models, while achieving comparable performance across a wide range of language tasks. Like Transformers, Mamba exhibits in-context learning (ICL) capabilities, i.e., making predictions for new tasks based on a prompt containing input-label pairs and a query, without requiring fine-tuning. Despite its empirical success, the theoretical understanding of Mamba remains limited, largely due to the nonlinearity introduced by its gating mechanism. To the best of our knowledge, this paper presents the first theoretical analysis of the training dynamics of a one-layer Mamba model, which consists of a linear attention component followed by a nonlinear gating layer, and its ICL generalization on unseen binary classification tasks, even when the prompt includes additive outliers. Our analysis shows that Mamba leverages the linear attention layer to select informative context examples and uses the nonlinear gating layer to suppress the influence of outliers. By establishing and comparing to the analysis of linear Transformers under the same setting, we show that although Mamba may require more training iterations to converge, it maintains accurate predictions even when the proportion of outliers exceeds the threshold that a linear Transformer can tolerate. These theoretical findings are supported by empirical experiments.
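Abstract (reference translation): The Mamba model has attracted significant attention for its computational advantages over Transformer-based models, while achieving comparable performance across a wide range of language tasks. Like Transformers, Mamba exhibits in-context learning (ICL) capabilities, i.e., making predictions for new tasks based on a prompt containing input-label pairs and a query. Despite its empirical success, the theoretical understanding of Mamba remains limited. This work presents the first theoretical analysis of the training dynamics of a one-layer Mamba model consisting of a linear attention component and a nonlinear gating layer. The analysis shows that Mamba uses the linear attention layer to select informative context examples and the nonlinear gating layer to suppress the influence of outliers. By establishing the analysis of linear Transformers under the same setting and comparing against it, the paper shows that although Mamba may require more training iterations to converge, it maintains accurate predictions even when the proportion of outliers exceeds the threshold that a linear Transformer can tolerate. These theoretical findings are supported by empirical experiments.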
