Fugu-MT paper translation (abstract): Mamba-3: Improved Sequence Modeling using State Space Principles

Paper overview: Mamba-3: Improved Sequence Modeling using State Space Principles

arxiv url: http://arxiv.org/abs/2603.15569v1
Date: Mon, 16 Mar 2026 17:30:08 GMT
Status: translation complete
Last updated in system: 2026-03-17 18:28:58.695229
Title: Mamba-3: Improved Sequence Modeling using State Space Principles
Authors: Aakash Lahoti, Kevin Y. Li, Berlin Chen, Caitlin Wang, Aviv Bick, J. Zico Kolter, Tri Dao, Albert Gu
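Abstract summary: The paper introduces three core methodological improvements inspired by the state space model (SSM) viewpoint of linear models. Together with architectural refinements, the Mamba-3 model achieves significant gains across retrieval, state-tracking, and downstream language modeling tasks.
Reference score (site-computed attention score): 74.41028882099846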
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Scaling inference-time compute has emerged as an important driver of LLM performance, making inference efficiency a central focus of model design alongside model quality. While the current Transformer-based models deliver strong model quality, their quadratic compute and linear memory make inference expensive. This has spurred the development of sub-quadratic models with reduced linear compute and constant memory requirements. However, many recent linear models trade off model quality and capability for algorithmic efficiency, failing on tasks such as state tracking. Moreover, their theoretically linear inference remains hardware-inefficient in practice. Guided by an inference-first perspective, we introduce three core methodological improvements inspired by the state space model (SSM) viewpoint of linear models. We combine: (1) a more expressive recurrence derived from SSM discretization, (2) a complex-valued state update rule that enables richer state tracking, and (3) a multi-input, multi-output (MIMO) formulation for better model performance without increasing decode latency. Together with architectural refinements, our Mamba-3 model achieves significant gains across retrieval, state-tracking, and downstream language modeling tasks. At the 1.5B scale, Mamba-3 improves average downstream accuracy by 0.6 percentage points compared to the next best model (Gated DeltaNet), with Mamba-3's MIMO variant further improving accuracy by another 1.2 points for a total 1.8 point gain. Across state-size experiments, Mamba-3 achieves comparable perplexity to Mamba-2 despite using half of its predecessor's state size. Our evaluations demonstrate Mamba-3's ability to advance the performance-efficiency Pareto frontier.
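As a rough illustration of point (2) in the abstract, the sketch below shows why a complex-valued state update can help with state tracking. It is not the Mamba-3 recurrence: the function names `linear_scan` and `parity_via_rotation` and the choice of parity as the toy task are assumptions made here for illustration. The state is a single complex number, so memory stays constant however long the sequence is, and each 1-bit rotates the state by pi, which a recurrence restricted to positive real decay factors cannot do.

```python
import cmath


def linear_scan(decays, inputs, h0=0j):
    """Generic diagonal linear recurrence h_t = a_t * h_{t-1} + b_t.

    Hypothetical helper: one scalar state, constant memory in sequence length.
    """
    h = h0
    for a, b in zip(decays, inputs):
        h = a * h + b
    return h


def parity_via_rotation(bits):
    """Track the parity of a bit sequence with a complex-valued state.

    Each bit multiplies the state by exp(i * pi * bit): a 1-bit flips the
    state between +1 and -1, a 0-bit leaves it unchanged.
    """
    decays = [cmath.exp(1j * cmath.pi * bit) for bit in bits]
    zeros = [0j] * len(bits)  # no additive input; only the rotation matters
    h = linear_scan(decays, zeros, h0=1 + 0j)
    return 0 if h.real > 0 else 1
```

For example, `parity_via_rotation([1, 0, 1, 1])` counts three 1-bits and so reports odd parity, while the state itself never grows beyond one complex number.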