Fugu-MT Paper Translation (Summary): Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model

Paper summary: Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model

arxiv url: http://arxiv.org/abs/2005.12900v6
Date: Fri, 16 Sep 2022 00:48:47 GMT
Status: Translation complete
Last updated in system: 2022-11-29 00:07:49.607480
Title: Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model
Title (reference translation): Breaking the sample size barrier in model-based reinforcement learning using a generative model
Authors: Gen Li, Yuting Wei, Yuejie Chi, Yuxin Chen
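Abstract summary: This paper studies the sample efficiency of reinforcement learning, assuming access to a generative model (simulator). We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$. For time-inhomogeneous finite-horizon MDPs, we further show that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity for any target accuracy level.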
Reference score (independently computed attention score): 50.38446482252857
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator). We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$. Despite a number of prior works tackling this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy is yet to be determined. In particular, all prior results suffer from a severe sample size barrier, in the sense that their claimed statistical guarantees hold only when the sample size exceeds at least $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^2}$. The current paper overcomes this barrier by certifying the minimax optimality of two algorithms -- a perturbed model-based algorithm and a conservative model-based algorithm -- as soon as the sample size exceeds the order of $\frac{|\mathcal{S}||\mathcal{A}|}{1-\gamma}$ (modulo some log factor). Moving beyond infinite-horizon MDPs, we further study time-inhomogeneous finite-horizon MDPs, and prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level. To the best of our knowledge, this work delivers the first minimax-optimal guarantees that accommodate the entire range of sample sizes (beyond which finding a meaningful policy is information theoretically infeasible).
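Abstract (reference translation): This paper discusses the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator). We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$. Despite many prior works tackling this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy has yet to be determined. In particular, all prior results suffer from a severe sample size barrier, in the sense that their claimed statistical guarantees hold only when the sample size exceeds at least $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^2}$. This paper overcomes this barrier by certifying the minimax optimality of two algorithms, a perturbed model-based algorithm and a conservative model-based algorithm, as soon as the sample size exceeds the order of $\frac{|\mathcal{S}||\mathcal{A}|}{1-\gamma}$ (modulo a log factor). Going beyond infinite-horizon MDPs, we further study time-inhomogeneous finite-horizon MDPs and show that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity for any target accuracy level. To the best of our knowledge, this work provides the first minimax-optimal guarantees that cover the entire range of sample sizes (beyond which finding a meaningful policy is information-theoretically infeasible).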
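The abstract names the algorithms but does not spell them out. The Python sketch below is a rough illustration of a plain model-based approach for a time-inhomogeneous finite-horizon MDP with a generative model. It is not the authors' code. It queries a simulator `n` times for every (step, state, action) triple, builds the empirical transition kernel from those samples, and runs backward induction on the resulting empirical MDP. The helper `sample_size_thresholds` evaluates the two orders of magnitude quoted in the abstract, with constants and log factors dropped. Several pieces are hypothetical and chosen only for illustration: the function names, the toy MDP, and the use of NumPy. The perturbed and conservative algorithms for the infinite-horizon case are not shown.

```python
import numpy as np


def sample_size_thresholds(num_states, num_actions, gamma):
    """Orders of magnitude from the abstract, ignoring constants and log factors."""
    sa = num_states * num_actions
    prior_barrier = sa / (1.0 - gamma) ** 2  # where earlier guarantees start to hold
    new_threshold = sa / (1.0 - gamma)       # where the paper's guarantees start to hold
    return prior_barrier, new_threshold


def estimate_empirical_model(sample_next_state, num_states, num_actions, horizon, n, rng):
    """Hypothetical helper: query a generative model n times per (h, s, a) and
    return the empirical kernel p_hat[h, s, a, s'] (each row sums to 1)."""
    p_hat = np.zeros((horizon, num_states, num_actions, num_states))
    for h in range(horizon):
        for s in range(num_states):
            for a in range(num_actions):
                for _ in range(n):
                    s_next = sample_next_state(h, s, a, rng)
                    p_hat[h, s, a, s_next] += 1.0
    return p_hat / n


def plan_backward(p_hat, rewards):
    """Backward induction on the empirical MDP.
    rewards has shape (horizon, num_states, num_actions); V_{H+1} = 0."""
    horizon, num_states, _, _ = p_hat.shape
    v_next = np.zeros(num_states)
    policy = np.zeros((horizon, num_states), dtype=int)
    values = np.zeros((horizon, num_states))
    for h in reversed(range(horizon)):
        # Q_h(s, a) = r_h(s, a) + sum_{s'} p_hat_h(s' | s, a) * V_{h+1}(s')
        q = rewards[h] + p_hat[h] @ v_next
        policy[h] = np.argmax(q, axis=1)  # ties go to the lowest action index
        values[h] = np.max(q, axis=1)
        v_next = values[h]
    return policy, values


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A, H = 2, 2, 3
    # Hypothetical toy MDP: action 0 stays put, action 1 moves to state 1 w.p. 0.9.
    true_p = np.zeros((H, S, A, S))
    for h in range(H):
        for s in range(S):
            true_p[h, s, 0, s] = 1.0
            true_p[h, s, 1, 1] = 0.9
            true_p[h, s, 1, 0] = 0.1
    rewards = np.zeros((H, S, A))
    rewards[:, 1, :] = 1.0  # reward 1 for being in state 1

    def simulator(h, s, a, rng):
        return rng.choice(S, p=true_p[h, s, a])

    p_hat = estimate_empirical_model(simulator, S, A, H, n=200, rng=rng)
    policy, values = plan_backward(p_hat, rewards)
    print(policy)
    print(sample_size_thresholds(100, 10, 0.99))
```

Take |S| = 100, |A| = 10 and γ = 0.99. The first helper then gives roughly 10^7 samples for the earlier barrier versus 10^5 for the new threshold, a gap of 1/(1-γ) = 100. Backward induction fits the finite-horizon setting because the empirical kernel may differ at every step h.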


List of related papers