Fugu-MT Paper Translation (Abstract): Expressivity-Efficiency Tradeoffs for Hybrid Sequence Models

Paper overview: Expressivity-Efficiency Tradeoffs for Hybrid Sequence Models

arxiv url: http://arxiv.org/abs/2603.08859v1
Date: Mon, 09 Mar 2026 19:20:01 GMT
Status: Translation complete
Last updated in system: 2026-03-11 15:25:23.79521
Title: Expressivity-Efficiency Tradeoffs for Hybrid Sequence Models
Title (reference translation): Expressivity-Efficiency Tradeoffs for Hybrid Sequence Models
Authors: John Cooper, Ilias Diakonikolas, Mingchen Ma, Frederic Sala
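Abstract summary: We prove the existence of fundamental limitations for non-hybrid models. For selective copying and associative recall, we construct hybrid models of small size and working memory that provably solve these tasks. We additionally show that hybrid models exhibit stronger length generalization and out-of-distribution robustness than non-hybrid models.
Reference score (independently computed attention score): 50.45915413315706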
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Hybrid sequence models--combining Transformer and state-space model layers--seek to gain the expressive versatility of attention as well as the computational efficiency of state-space model layers. Despite burgeoning interest in hybrid models, we lack a basic understanding of the settings where--and underlying mechanisms through which--they offer benefits over their constituent models. In this paper, we study this question, focusing on a broad family of core synthetic tasks. For this family of tasks, we prove the existence of fundamental limitations for non-hybrid models. Specifically, any Transformer or state-space model that solves the underlying task requires either a large number of parameters or a large working memory. On the other hand, for two prototypical tasks within this family--namely selective copying and associative recall--we construct hybrid models of small size and working memory that provably solve these tasks, thus achieving the best of both worlds. Our experimental evaluation empirically validates our theoretical findings. Importantly, going beyond the settings in our theoretical analysis, we empirically show that learned--rather than constructed--hybrids outperform non-hybrid models with up to 6x as many parameters. We additionally demonstrate that hybrid models exhibit stronger length generalization and out-of-distribution robustness than non-hybrids.
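Abstract (reference translation): Hybrid sequence models, which combine Transformer and state-space model layers, aim to gain the expressive power of attention together with the computational efficiency of state-space model layers. Despite rapidly growing interest in hybrid models, we lack a basic understanding of the settings in which, and the mechanisms through which, they offer benefits over their constituent models. This paper studies that question, focusing on a broad family of core synthetic tasks. For this family of tasks, we prove the existence of fundamental limitations for non-hybrid models. Specifically, any Transformer or state-space model that solves the underlying task requires either a large number of parameters or a large working memory. In contrast, for two prototypical tasks within this family, namely selective copying and associative recall, we construct hybrid models of small size and working memory that provably solve these tasks, achieving the best of both worlds. Our experimental evaluation empirically validates the theoretical findings. Importantly, going beyond the settings of the theoretical analysis, we show empirically that learned, rather than constructed, hybrids outperform non-hybrid models with up to 6x as many parameters. We additionally show that hybrid models exhibit stronger length generalization and out-of-distribution robustness than non-hybrid models.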

Related papers