Fugu-MT Paper Translation (Summary): Simplifying and Understanding State Space Models with Diagonal Linear RNNs

Paper summary: Simplifying and Understanding State Space Models with Diagonal Linear RNNs

arxiv url: http://arxiv.org/abs/2212.00768v1
Date: Thu, 1 Dec 2022 18:53:06 GMT
Status: Translation complete
Last updated (system): 2022-12-02 14:53:09.728454
Title: Simplifying and Understanding State Space Models with Diagonal Linear RNNs
Title (reference translation): Simplifying and understanding state space models with diagonal linear RNNs
Authors: Ankit Gupta, Harsh Mehta, Jonathan Berant
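Abstract summary: This work dispenses with the discretization step and proposes a model based on vanilla diagonal linear RNNs. The results show that $\mathrm{DLR}$ performs as well as previously proposed SSMs in the presence of strong supervision. The work also characterizes the expressivity of SSMs and attention-based models through a suite of synthetic sequence-to-sequence tasks.
Reference score (attention score computed by this site): 43.796336419481676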
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Sequence models based on linear state spaces (SSMs) have recently emerged as a promising choice of architecture for modeling long range dependencies across various modalities. However, they invariably rely on discretization of a continuous state space, which complicates their presentation and understanding. In this work, we dispose of the discretization step, and propose a model based on vanilla Diagonal Linear RNNs ($\mathrm{DLR}$). We empirically show that $\mathrm{DLR}$ is as performant as previously-proposed SSMs in the presence of strong supervision, despite being conceptually much simpler. Moreover, we characterize the expressivity of SSMs (including $\mathrm{DLR}$) and attention-based models via a suite of $13$ synthetic sequence-to-sequence tasks involving interactions over tens of thousands of tokens, ranging from simple operations, such as shifting an input sequence, to detecting co-dependent visual features over long spatial ranges in flattened images. We find that while SSMs report near-perfect performance on tasks that can be modeled via $\textit{few}$ convolutional kernels, they struggle on tasks requiring $\textit{many}$ such kernels and especially when the desired sequence manipulation is $\textit{context-dependent}$. For example, $\mathrm{DLR}$ learns to perfectly shift a $0.5M$-long input by an arbitrary number of positions but fails when the shift size depends on context. Despite these limitations, $\mathrm{DLR}$ reaches high performance on two higher-order reasoning tasks $\mathrm{ListOpsSubTrees}$ and $\mathrm{PathfinderSegmentation}\text{-}\mathrm{256}$ with input lengths $8K$ and $65K$ respectively, and gives encouraging performance on $\mathrm{PathfinderSegmentation}\text{-}\mathrm{512}$ with input length $262K$ for which attention is not a viable choice.
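The abstract's claims about "convolutional kernels" rest on the fact that a diagonal linear RNN unrolls into a causal convolution. The sketch below illustrates this under an assumed form, $x_k = \Lambda x_{k-1} + B u_k$, $y_k = \mathrm{Re}(C x_k)$ with diagonal complex $\Lambda$, whose kernel is $K_j = \mathrm{Re}(\sum_n c_n b_n \lambda_n^j)$; the paper's exact parameterization may differ. The function names and the hand-built shift parameters (roots of unity for $\Lambda$, DFT-style weights for $C$) are hypothetical choices for illustration, not the paper's learned parameters. They show that a fixed shift by $d$ positions is a single kernel (a delta at $j = d$).

```python
import cmath

def dlr_recurrent(lam, b, c, u):
    """Run a diagonal linear RNN over a real input sequence u (assumed form).

    Per diagonal entry n: x_n[k] = lam[n] * x_n[k-1] + b[n] * u[k]
    Output: y[k] = Re(sum_n c[n] * x_n[k])
    """
    x = [0j] * len(lam)
    ys = []
    for u_k in u:
        x = [l * x_n + b_n * u_k for l, x_n, b_n in zip(lam, x, b)]
        ys.append(sum(c_n * x_n for c_n, x_n in zip(c, x)).real)
    return ys

def dlr_kernel(lam, b, c, length):
    """Equivalent convolution kernel K[j] = Re(sum_n c[n] * b[n] * lam[n]**j)."""
    return [sum(c_n * b_n * l ** j for l, b_n, c_n in zip(lam, b, c)).real
            for j in range(length)]

def causal_conv(kernel, u):
    """Causal convolution y[k] = sum_{j<=k} K[j] * u[k-j]."""
    return [sum(kernel[j] * u[k - j] for j in range(k + 1)) for k in range(len(u))]

if __name__ == "__main__":
    # Hypothetical hand-built parameters: with lam[n] = exp(2*pi*i*n/L),
    # b[n] = 1 and c[n] = exp(-2*pi*i*n*d/L) / L, the kernel is a delta at
    # j = d (for 0 <= j < L), so the layer shifts the input right by d.
    L, d = 8, 3
    lam = [cmath.exp(2j * cmath.pi * n / L) for n in range(L)]
    b = [1 + 0j] * L
    c = [cmath.exp(-2j * cmath.pi * n * d / L) / L for n in range(L)]
    u = [float(k + 1) for k in range(L)]
    print(dlr_recurrent(lam, b, c, u))                 # approximately [0, 0, 0, 1, 2, 3, 4, 5]
    print(causal_conv(dlr_kernel(lam, b, c, L), u))   # same values, via the kernel
```

Because the kernel is fixed once the parameters are fixed, such a layer applies the same shift to every input; this offers some intuition for why a shift whose size depends on the input content (the context-dependent case in the abstract) is harder to express.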
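Abstract (reference translation): Sequence models based on linear state spaces (SSMs) have recently emerged as a promising architectural choice for modeling long-range dependencies across various modalities. However, they invariably rely on discretization of a continuous state space, which complicates their presentation and understanding. In this work, we eliminate the discretization step and propose a model based on vanilla diagonal linear RNNs ($\mathrm{DLR}$). We empirically show that, despite being conceptually much simpler, $\mathrm{DLR}$ performs as well as previously proposed SSMs in the presence of strong supervision. Furthermore, we characterize the expressivity of SSMs (including $\mathrm{DLR}$) and attention-based models through a suite of $13$ synthetic sequence-to-sequence tasks involving interactions across tens of thousands of tokens, ranging from simple operations such as shifting an input sequence to detecting co-dependent visual features over long spatial ranges in flattened images. SSMs report near-perfect performance on tasks that can be modeled with a $\textit{few}$ convolutional kernels, but they struggle on tasks requiring $\textit{many}$ such kernels, especially when the desired sequence manipulation is $\textit{context-dependent}$. For example, $\mathrm{DLR}$ learns to perfectly shift a $0.5M$-long input by an arbitrary number of positions but fails when the shift size depends on context. Despite these limitations, $\mathrm{DLR}$ reaches high performance on two higher-order reasoning tasks, $\mathrm{ListOpsSubTrees}$ and $\mathrm{PathfinderSegmentation}\text{-}\mathrm{256}$, with input lengths $8K$ and $65K$ respectively, and gives encouraging performance on $\mathrm{PathfinderSegmentation}\text{-}\mathrm{512}$ with input length $262K$, for which attention is not a viable choice.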

Related papers

Scaling Up Liquid-Resistance Liquid-Capacitance Networks for Efficient Sequence Modeling [53.925413758281096]
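LrcSSM is a $\textit{nonlinear}$ recurrent model that processes long sequences as fast as current linear state-space layers. LrcSSM provides a formal gradient-stability guarantee that other input-varying systems such as Liquid-S4 and Mamba do not provide. We show that LrcSSM outperforms LRU, S5, and Mamba.
Reference translation (metadata) (2025-05-27T20:02:59Z)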
Optimizing High-Dimensional Oblique Splits [0.0]
This paper optimizes high-dimensional $s$-sparse oblique splits from $\{(\vec{w}, \vec{w}^\top \boldsymbol{X}_i) : i \in \{1, \dots, n\}, \vec{w} \in \mathbb{R}^p, \|\vec{w}\|_0 \leq s\}$.
Reference translation (metadata) (2025-03-18T16:14:38Z)
Robust Learning of Multi-index Models via Iterative Subspace Approximation [36.138661719725626]
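We study the task of learning multi-index models (MIMs) with label noise under the Gaussian distribution. We focus on well-behaved MIMs with finite ranges that satisfy certain regularity properties. In the presence of random classification noise, the complexity of our algorithm scales agnostically with $1/\epsilon$.
Reference translation (metadata) (2025-02-13T17:37:42Z)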
Beyond Task Diversity: Provable Representation Transfer for Sequential Multi-Task Linear Bandits [17.970177214029473]
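We study lifelong learning in linear bandits, where a learner interacts with a sequence of linear bandit tasks. The existing literature assumes that these tasks are diverse, e.g., that their parameters uniformly span an $m$-dimensional subspace. We present the first nontrivial results for sequential multi-task linear bandits without assuming task diversity.
Reference translation (metadata) (2025-01-23T05:21:27Z)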
Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations [40.77319247558742]
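We study the computational complexity of learning a target function $f_*:\mathbb{R}^d\to\mathbb{R}$ with additive structure. We show that a large subset of such $f_*$ can be learned efficiently by gradient-based training of two-layer neural networks.
Reference translation (metadata) (2024-06-17T17:59:17Z)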
Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
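This paper considers the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) with smooth Bellman operators. The key to our solution is a novel projection technique based on ideas from harmonic analysis. Our results bridge the gap between two popular but conflicting perspectives on continuous-space MDPs.
Reference translation (metadata) (2024-05-10T09:58:47Z)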
Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning [54.806166861456035]
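We study episodic reinforcement learning (RL) problems modeled by finite-horizon Markov decision processes (MDPs) with a constraint on the number of batches. We design an algorithm that achieves near-optimal regret of $\tilde{O}(\sqrt{SAH^3K\ln(1/\delta)})$ over $K$ episodes, where $\tilde{O}(\cdot)$ hides logarithmic terms in $(S,A,H,K)$. Our technical contributions are twofold: 1) a near-optimal design scheme for exploration
Reference translation (metadata) (2022-10-15T09:22:22Z)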
Reward-Mixing MDPs with a Few Latent Contexts are Learnable [75.17357040707347]
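We study episodic reinforcement learning in reward-mixing Markov decision processes (RMMDPs). Our goal is to learn a near-optimal policy that nearly maximizes the cumulative reward over the time steps in such a model.
Reference translation (metadata) (2022-10-05T22:52:00Z)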
Learning a Latent Simplex in Input-Sparsity Time [58.30321592603066]
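We consider the problem of learning a latent $k$-vertex simplex $K \subset \mathbb{R}^d$, given access to $A \in \mathbb{R}^{d\times n}$. We show that the dependence on $k$ in the running time is unnecessary under a natural assumption on the mass of the top-$k$ singular values of $A$.
Reference translation (metadata) (2021-05-17T16:40:48Z)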
Convergence of Sparse Variational Inference in Gaussian Processes Regression [29.636483122130027]
We show that a method with computational cost $\mathcal{O}(N(\log N)^{2D}(\log\log N)^2)$ can be used for inference.
Reference translation (metadata) (2020-08-01T19:23:34Z)
$Q$-learning with Logarithmic Regret [60.24952657636464]
We show that optimistic $Q$-learning enjoys an $\mathcal{O}\left(\frac{SA\cdot \mathrm{poly}(H)}{\Delta_{\min}}\log(SAT)\right)$ cumulative regret bound, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $T$ is the total number of steps, and $\Delta_{\min}$ is the minimum sub-optimality gap.
Reference translation (metadata) (2020-06-16T13:01:33Z)

The related-paper list is generated automatically from the titles and abstracts of papers on this site.

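The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use (including all information and translations).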