Fugu-MT 論文翻訳(概要): On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces

論文の概要: On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces

arxiv url: http://arxiv.org/abs/2011.04622v2
Date: Tue, 29 Dec 2020 17:24:48 GMT
ステータス: 翻訳完了
システム内更新日: 2022-09-28 00:07:49.278946
Title: On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces
Title（参考訳）: 強化学習における関数近似について--大きな状態空間に面した楽観主義
Authors: Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael I. Jordan
Abstract要約: 強化学習のコアにおける探索・探索トレードオフについて検討する。特に、関数クラス $mathcalF$ の複雑さが関数の複雑さを特徴づけていることを証明する。私たちの後悔の限界はエピソードの数とは無関係です。
参考スコア（独自算出の注目度）: 208.67848059021915
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The classical theory of reinforcement learning (RL) has focused on tabular and linear representations of value functions. Further progress hinges on combining RL with modern function approximators such as kernel functions and deep neural networks, and indeed there have been many empirical successes that have exploited such combinations in large-scale applications. There are profound challenges, however, in developing a theory to support this enterprise, most notably the need to take into consideration the exploration-exploitation tradeoff at the core of RL in conjunction with the computational and statistical tradeoffs that arise in modern function-approximation-based learning systems. We approach these challenges by studying an optimistic modification of the least-squares value iteration algorithm, in the context of the action-value function represented by a kernel function or an overparameterized neural network. We establish both polynomial runtime complexity and polynomial sample complexity for this algorithm, without additional assumptions on the data-generating model. In particular, we prove that the algorithm incurs an $\tilde{\mathcal{O}}(\delta_{\mathcal{F}} H^2 \sqrt{T})$ regret, where $\delta_{\mathcal{F}}$ characterizes the intrinsic complexity of the function class $\mathcal{F}$, $H$ is the length of each episode, and $T$ is the total number of episodes. Our regret bounds are independent of the number of states, a result which exhibits clearly the benefit of function approximation in RL.
Abstract（参考訳）: 古典的強化学習理論(RL)は、値関数の表象および線形表現に焦点を当てている。さらに、RLとカーネル関数やディープニューラルネットワークのような現代的な関数近似器を組み合わせることに注力しており、実際、そのような組み合わせを大規模アプリケーションで活用した経験的な成功が数多くある。しかし、この企業を支援するための理論を開発する際には、特に、現代の関数近似に基づく学習システムで発生する計算的および統計的トレードオフとともに、RLの中核における探索・探索的トレードオフを考慮する必要がある。我々は,カーネル関数や過パラメータニューラルネットワークで表される動作値関数の文脈において,最小二乗値反復アルゴリズムを楽観的に修正することで,これらの課題にアプローチする。このアルゴリズムの多項式ランタイム複雑性と多項式サンプル複雑性の両方を,データ生成モデルに対する追加の仮定なしに確立する。特に、アルゴリズムが$\tilde{\mathcal{O}}(\delta_{\mathcal{F}} H^2 \sqrt{T})$ regret, ここで、$\delta_{\mathcal{F}}$は関数クラス$\mathcal{F}$、$H$は各エピソードの長さであり、$T$はエピソードの総数であることを示す。我々の後悔の限界は状態の数とは独立であり、RLにおける関数近似の利点が明らかに示される。

関連論文リスト

Uncertainty-Aware Reward-Free Exploration with General Function Approximation [69.27868448449755]
本稿では、algと呼ばれる報酬のない強化学習アルゴリズムを提案する。私たちのアルゴリズムの背後にある重要なアイデアは、環境を探索する上で不確実性を認識した本質的な報酬である。実験の結果、GFA-RFEは最先端の教師なしRLアルゴリズムよりも優れ、あるいは同等であることがわかった。
論文参考訳（メタデータ） (2024-06-24T01:37:18Z)
Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations [40.77319247558742]
目的関数 $f_*:mathbbRdtomathbbR$ を加法構造で学習する際の計算複雑性について検討する。 2層ニューラルネットワークの勾配学習により,$f_*$の大規模なサブセットを効率的に学習できることを実証した。
論文参考訳（メタデータ） (2024-06-17T17:59:17Z)
A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation [66.26739783789387]
我々は、強化学習のための新しいアルゴリズム、MQL-UCBを用いたモノトニックQ-Learningを提案する。 MQL-UCBは、$tildeO(dsqrtHK)$の最小限の後悔を実現する。本研究は,非線形関数近似を用いたサンプル効率およびデプロイメント効率のよいQ-ラーニングの設計に重点を置いている。
論文参考訳（メタデータ） (2023-11-26T08:31:57Z)
On Representation Complexity of Model-based and Model-free Reinforcement Learning [11.843778337443824]
回路複雑性の文脈におけるモデルベースおよびモデルフリー強化学習(RL)の表現複雑性について検討した。理論的には、その基底となる遷移関数と報酬関数が、大きさの一定深さの回路で表現できるような、幅広い種類のMDPが存在することを証明している。近似誤差に注意を向け、複雑性理論への接続を構築することによって、モデルベースのアルゴリズムが、新しい表現複雑性の観点からモデルフリーアルゴリズムよりも、なぜサンプルの複雑さを楽しむのかというユニークな洞察を提供する。
論文参考訳（メタデータ） (2023-10-03T00:01:58Z)
Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
正方形損失に関して、標準的なガウス分布の下での$k$ReLU活性化の線形結合をPAC学習する問題をmathbbRd$で検討する。本研究の主な成果は,この学習課題に対して,サンプルおよび計算複雑性が$(dk/epsilon)O(k)$で,epsilon>0$が目標精度である。
論文参考訳（メタデータ） (2023-07-24T14:37:22Z)
Optimal approximation using complex-valued neural networks [0.0]
複雑評価ニューラルネットワーク(CVNN)は最近、有望な経験的成功を示している。 CVNNの表現性を近似特性を用いて解析する。
論文参考訳（メタデータ） (2023-03-29T15:56:43Z)
Provably Efficient Reinforcement Learning via Surprise Bound [66.15308700413814]
本稿では,一般値関数近似を用いた効率の良い強化学習アルゴリズムを提案する。本アルゴリズムは, 線形設定と疎高次元線形設定の両方に適用した場合に, 合理的な後悔境界を達成できる。
論文参考訳（メタデータ） (2023-02-22T20:21:25Z)
Sample Complexity of Kernel-Based Q-Learning [11.32718794195643]
任意に大規模に割引されたMDPにおいて,$epsilon$-optimal Policyを求める非パラメトリックQ-ラーニングアルゴリズムを提案する。我々の知る限りでは、このような一般モデルの下では、有限サンプルの複雑さを示す最初の結果である。
論文参考訳（メタデータ） (2023-02-01T19:46:25Z)
Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension [124.7752517531109]
一般値関数近似を用いた効率の良い強化学習アルゴリズムを確立する。我々のアルゴリズムは、$d$が複雑性測度である場合、$widetildeO(mathrmpoly(dH)sqrtT)$の後悔の限界を達成することを示す。我々の理論は線形値関数近似によるRLの最近の進歩を一般化し、環境モデルに対する明示的な仮定をしない。
論文参考訳（メタデータ） (2020-05-21T17:36:09Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。