Fugu-MT 論文翻訳(概要): Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation

論文の概要: Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation

arxiv url: http://arxiv.org/abs/2006.06135v1
Date: Thu, 11 Jun 2020 00:55:35 GMT
ステータス: 翻訳完了
システム内更新日: 2022-11-22 09:45:54.906336
Title: Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation
Title（参考訳）: 低ランク行列推定によるサンプル効率強化学習
Authors: Devavrat Shah, Dogyoon Song, Zhi Xu, Yuzhe Yang
Abstract要約: 連続状態と行動空間を用いた強化学習において,Q$関数を効率よく学習する方法を考える。我々は、$epsilon$-Schmidt $Q$-functionと$widetildeO(frac1epsilonmax(d1, d_2)+2)$のサンプル複雑性を求める単純な反復学習アルゴリズムを開発する。
参考スコア（独自算出の注目度）: 30.137884459159107
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider the question of learning $Q$-function in a sample efficient manner for reinforcement learning with continuous state and action spaces under a generative model. If $Q$-function is Lipschitz continuous, then the minimal sample complexity for estimating $\epsilon$-optimal $Q$-function is known to scale as ${\Omega}(\frac{1}{\epsilon^{d_1+d_2 +2}})$ per classical non-parametric learning theory, where $d_1$ and $d_2$ denote the dimensions of the state and action spaces respectively. The $Q$-function, when viewed as a kernel, induces a Hilbert-Schmidt operator and hence possesses square-summable spectrum. This motivates us to consider a parametric class of $Q$-functions parameterized by its "rank" $r$, which contains all Lipschitz $Q$-functions as $r \to \infty$. As our key contribution, we develop a simple, iterative learning algorithm that finds $\epsilon$-optimal $Q$-function with sample complexity of $\widetilde{O}(\frac{1}{\epsilon^{\max(d_1, d_2)+2}})$ when the optimal $Q$-function has low rank $r$ and the discounting factor $\gamma$ is below a certain threshold. Thus, this provides an exponential improvement in sample complexity. To enable our result, we develop a novel Matrix Estimation algorithm that faithfully estimates an unknown low-rank matrix in the $\ell_\infty$ sense even in the presence of arbitrary bounded noise, which might be of interest in its own right. Empirical results on several stochastic control tasks confirm the efficacy of our "low-rank" algorithms.
Abstract（参考訳）: 生成モデルの下で連続状態と動作空間を用いた強化学習をサンプルとして,$q$-function を効率的に学習する問題を考える。もし$q$-関数がリプシッツ連続であるなら、$\epsilon$-optimal $q$-関数を推定する最小サンプル複雑性は、古典的非パラメトリック学習理論に対して${\omega}(\frac{1}{\epsilon^{d_1+d_2+2}})$と計算される。 Q$-函数は核と見なすとヒルベルト・シュミット作用素を誘導し、従って二乗可換スペクトルを持つ。これは、すべてのリプシッツ$Q$-函数を$r \to \infty$として含む「ランク」$r$でパラメトリックな$Q$-函数のパラメトリッククラスを考える動機である。我々の重要な貢献として、最適な$q$-関数がランク$r$で、ディスカウント係数$\gamma$が一定のしきい値以下である場合に、$\widetilde{o}(\frac{1}{\epsilon^{\max(d_1, d_2)+2}})のサンプル複雑性を持つ$\epsilon$-optimal $q$-functionを求める単純な反復学習アルゴリズムを開発した。これにより、サンプルの複雑さが指数関数的に向上する。この結果を実現するために,任意の有界雑音が存在する場合でも,未知の低ランク行列を$\ell_\infty$ senseで忠実に推定する新しい行列推定アルゴリズムを開発した。いくつかの確率的制御タスクにおける実験結果から,我々の「低ランク」アルゴリズムの有効性が確認された。

関連論文リスト

Robust Learning of Multi-index Models via Iterative Subspace Approximation [36.138661719725626]
ガウス分布下でラベルノイズを伴うマルチインデックスモデル(MIM)の学習課題について検討する。一定の正則性特性を満たす有限範囲の良好なMIMに着目する。ランダムな分類ノイズが存在する場合、我々のアルゴリズムの複雑さは1/epsilon$と不可知的にスケールする。
論文参考訳（メタデータ） (2025-02-13T17:37:42Z)
Learning Networks from Wide-Sense Stationary Stochastic Processes [7.59499154221528]
ここでの重要な推論問題は、ノード出力(ポテンシャル)からエッジ接続を学習することである。我々はWhittleの最大可能性推定器(MLE)を用いて時間相関サンプルから$Last$のサポートを学習する。 MLE問題は厳密な凸であり、ユニークな解であることを示す。
論文参考訳（メタデータ） (2024-12-04T23:14:00Z)
Sample and Computationally Efficient Robust Learning of Gaussian Single-Index Models [37.42736399673992]
シングルインデックスモデル (SIM) は $sigma(mathbfwast cdot mathbfx)$ という形式の関数であり、$sigma: mathbbR to mathbbR$ は既知のリンク関数であり、$mathbfwast$ は隠れ単位ベクトルである。適切な学習者が$L2$-error of $O(mathrmOPT)+epsilon$。
論文参考訳（メタデータ） (2024-11-08T17:10:38Z)
Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations [40.77319247558742]
目的関数 $f_*:mathbbRdtomathbbR$ を加法構造で学習する際の計算複雑性について検討する。 2層ニューラルネットワークの勾配学習により,$f_*$の大規模なサブセットを効率的に学習できることを実証した。
論文参考訳（メタデータ） (2024-06-17T17:59:17Z)
Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
正方形損失に関して、標準的なガウス分布の下での$k$ReLU活性化の線形結合をPAC学習する問題をmathbbRd$で検討する。本研究の主な成果は,この学習課題に対して,サンプルおよび計算複雑性が$(dk/epsilon)O(k)$で,epsilon>0$が目標精度である。
論文参考訳（メタデータ） (2023-07-24T14:37:22Z)
Near-Optimal Bounds for Learning Gaussian Halfspaces with Random Classification Noise [50.64137465792738]
この問題に対する効率的なSQアルゴリズムは、少なくとも$Omega(d1/2/(maxp, epsilon)2)$. のサンプル複雑性を必要とする。我々の下限は、この1/epsilon$に対する二次的依存は、効率的なアルゴリズムに固有のものであることを示唆している。
論文参考訳（メタデータ） (2023-07-13T18:59:28Z)
Overcoming the Long Horizon Barrier for Sample-Efficient Reinforcement Learning with Latent Low-Rank Structure [9.759209713196718]
我々は、対応する最適$Q*$関数が低ランクであるMDPのクラスを考える。より強い低階構造仮定の下では、生成モデル(LR-MCPI)と低階経験値イテレーション(LR-EVI)が、ランクに対して$tildeOleft((|S|+|A|)mathrmpoly(d,H)/epsilon2right)$の所望のサンプル複雑性を実現することが示されている。
論文参考訳（メタデータ） (2022-06-07T20:39:51Z)
Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
既知の滑らかな関数 $f$ の値を未知の点 $boldsymbolmu in mathbbRn$ で推定する問題について検討する。我々は、各座標の重要性に応じてサンプルを学習するインスタンス適応アルゴリズムを設計し、少なくとも1-delta$の確率で$epsilon$の正確な推定値である$f(boldsymbolmu)$を返す。
論文参考訳（メタデータ） (2022-03-18T18:50:52Z)
Improved Sample Complexity for Incremental Autonomous Exploration in MDPs [132.88757893161699]
我々は $epsilon$-optimal 目標条件付きポリシーのセットを学び、$ L$ ステップ内で段階的に到達可能なすべての状態を達成します。 DisCoは、コストに敏感な最短経路問題に対して$epsilon/c_min$-optimalポリシーを返すことができる最初のアルゴリズムです。
論文参考訳（メタデータ） (2020-12-29T14:06:09Z)
Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity [59.34067736545355]
S$状態、$A$アクション、割引係数$gamma in (0,1)$、近似しきい値$epsilon > 0$の MDP が与えられた場合、$epsilon$-Optimal Policy を学ぶためのモデルなしアルゴリズムを提供する。十分小さな$epsilon$の場合、サンプルの複雑さで改良されたアルゴリズムを示す。
論文参考訳（メタデータ） (2020-06-06T13:34:41Z)
A Randomized Algorithm to Reduce the Support of Discrete Measures [79.55586575988292]
離散確率測度が$N$原子と$n$実数値関数の集合で成り立つと、元の$N$原子の$n+1$の部分集合で支えられる確率測度が存在する。我々は、負の円錐によるバリセンターの簡単な幾何学的特徴付けを与え、この新しい測度を「グリード幾何学的サンプリング」によって計算するランダム化アルゴリズムを導出する。次に、その性質を研究し、それを合成および実世界のデータにベンチマークして、$Ngg n$ regimeにおいて非常に有益であることを示す。
論文参考訳（メタデータ） (2020-06-02T16:38:36Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。