Fugu-MT Paper Translation (Abstract): Reinforcement Learning with Function Approximation: From Linear to Nonlinear

Paper overview: Reinforcement Learning with Function Approximation: From Linear to Nonlinear

arxiv url: http://arxiv.org/abs/2302.09703v2
Date: Fri, 19 May 2023 01:01:39 GMT
Status: Translation complete
Last updated in system: 2023-05-22 18:55:11.376732
Title: Reinforcement Learning with Function Approximation: From Linear to Nonlinear
Title (reference translation): Reinforcement Learning with Function Approximation: From Linear to Nonlinear
Authors: Jihao Long and Jiequn Han
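Abstract summary: This paper reviews recent results on error analysis for reinforcement learning algorithms in linear or nonlinear approximation settings. It discusses various properties related to approximation error and presents concrete conditions on the transition probability and reward function.
Reference score (attention score computed independently by this site): 4.314956204483073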
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Function approximation has been an indispensable component in modern reinforcement learning algorithms designed to tackle problems with large state spaces in high dimensions. This paper reviews recent results on error analysis for these reinforcement learning algorithms in linear or nonlinear approximation settings, emphasizing approximation error and estimation error/sample complexity. We discuss various properties related to approximation error and present concrete conditions on transition probability and reward function under which these properties hold true. Sample complexity analysis in reinforcement learning is more complicated than in supervised learning, primarily due to the distribution mismatch phenomenon. With assumptions on the linear structure of the problem, numerous algorithms in the literature achieve polynomial sample complexity with respect to the number of features, episode length, and accuracy, although the minimax rate has not been achieved yet. These results rely on the $L^\infty$ and UCB estimation of estimation error, which can handle the distribution mismatch phenomenon. The problem and analysis become substantially more challenging in the setting of nonlinear function approximation, as both $L^\infty$ and UCB estimation are inadequate for bounding the error with a favorable rate in high dimensions. We discuss additional assumptions necessary to address the distribution mismatch and derive meaningful results for nonlinear RL problems.
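Abstract (reference translation): Function approximation is an indispensable component of modern reinforcement learning algorithms designed to tackle problems with large, high-dimensional state spaces. This paper reviews recent results on error analysis for these reinforcement learning algorithms in linear or nonlinear approximation settings, with emphasis on approximation error and estimation error/sample complexity. It discusses various properties related to approximation error and presents concrete conditions on the transition probability and reward function under which these properties hold. Sample complexity analysis in reinforcement learning is more complicated than in supervised learning, primarily because of the distribution mismatch phenomenon. Assuming linear structure in the problem, many algorithms achieve polynomial sample complexity with respect to the number of features, episode length, and accuracy, although the minimax rate has not yet been achieved. These results rely on $L^\infty$ and UCB estimation of the estimation error, which can handle the distribution mismatch phenomenon. In the setting of nonlinear function approximation, the problem and its analysis become substantially more challenging, because both $L^\infty$ and UCB estimation are inadequate for bounding the error with a favorable rate in high dimensions. The paper discusses the additional assumptions needed to address the distribution mismatch and derives meaningful results for nonlinear RL problems.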
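To make the UCB estimation mentioned in the abstract more concrete, here is a minimal sketch of an upper-confidence estimate in a linear setting. It is not taken from the paper. It assumes ridge regression on a feature map plus an elliptical bonus of the form beta * sqrt(phi^T Lambda^{-1} phi), a form commonly used in the linear RL literature. The names `fit_ridge` and `ucb_value` and the parameters `lam` and `beta` are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_ridge(features, targets, lam=1.0):
    """Return ridge-regression weights and the regularized Gram matrix.

    `lam` is an assumed regularization strength, not a value from the paper.
    """
    d = features.shape[1]
    gram = lam * np.eye(d) + features.T @ features
    weights = np.linalg.solve(gram, features.T @ targets)
    return weights, gram


def ucb_value(phi, weights, gram, beta=1.0):
    """Return (optimistic estimate, bonus) for one feature vector `phi`.

    The bonus grows in directions that the data covers poorly; `beta` is an
    assumed confidence-width parameter.
    """
    bonus = beta * np.sqrt(phi @ np.linalg.solve(gram, phi))
    return float(phi @ weights + bonus), float(bonus)


rng = np.random.default_rng(0)

# Synthetic data that only ever excites the first feature direction.
features = np.zeros((50, 2))
features[:, 0] = rng.uniform(0.5, 1.0, size=50)
targets = 2.0 * features[:, 0] + 0.1 * rng.standard_normal(50)

weights, gram = fit_ridge(features, targets)

# Query a direction seen in the data and one never seen.
_, bonus_seen = ucb_value(np.array([1.0, 0.0]), weights, gram)
_, bonus_unseen = ucb_value(np.array([0.0, 1.0]), weights, gram)
print(bonus_seen, bonus_unseen)
```

In this toy setup the bonus for the unseen direction stays at its prior width, while the bonus for the well-covered direction shrinks. This is one way such estimates account for a mismatch between the data distribution and the distribution induced by a new policy.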

Related papers

Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
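We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems. Because it admits larger step sizes, the new normalized error feedback algorithm outperforms its non-normalized counterpart on various tasks.
Paper reference translation (metadata) (2024-10-22T10:19:27Z)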
Inexact subgradient methods for semialgebraic functions [18.293072574300798]
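Motivated by the widespread use of approximate derivatives in machine learning and optimization, we study inexact subgradient methods subject to non-vanishing errors.
Paper reference translation (metadata) (2024-04-30T12:47:42Z)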
Neural Network Approximation for Pessimistic Offline Reinforcement Learning [17.756108291816908]
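We present a non-asymptotic estimation error bound for pessimistic offline RL with general neural network approximation. The results show that the estimation error consists of two parts: the first converges to zero at a desired rate in the sample size with partially controllable concentrability, and the second becomes negligible if the residual constraint is tight.
Paper reference translation (metadata) (2023-12-19T05:17:27Z)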
Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning [53.97335841137496]
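We propose an oracle-efficient algorithm, Pessimistic Nonlinear Least-Squares Value Iteration (PNLSVI), for offline RL with nonlinear function approximation. The algorithm enjoys a regret bound with a tight dependence on the complexity of the function class and achieves minimax optimal instance-dependent regret when specialized to linear function approximation.
Paper reference translation (metadata) (2023-10-02T17:42:01Z)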
Online Regularized Learning Algorithm for Functional Data [2.5382095320488673]
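This paper considers an online regularized learning algorithm in a reproducing kernel Hilbert space. The results show that the convergence rates of the prediction error and estimation error with constant step size are competitive with those in the literature.
Paper reference translation (metadata) (2022-11-24T11:56:10Z)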
Asymptotically Unbiased Instance-wise Regularized Partial AUC Optimization: Theory and Algorithm [101.44676036551537]
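One-way partial AUC (OPAUC) and two-way partial AUC (TPAUC) measure the average performance of a binary classifier. Most existing methods can only optimize PAUC approximately, which leads to uncontrollable bias. This paper presents a simpler reformulation of the PAUC problem via distributionally robust optimization AUC.
Paper reference translation (metadata) (2022-10-08T08:26:22Z)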
Non-Linear Reinforcement Learning in Large Action Spaces: Structural Conditions and Sample-efficiency of Posterior Sampling [28.371541697552928]
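We present the first result for nonlinear function approximation that holds for general action spaces under a linear embeddability condition. In the worst case, the guarantees scale with a rank parameter of the RL problem.
Paper reference translation (metadata) (2022-03-15T20:50:26Z)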
Instance-optimality in optimal value estimation: Adaptivity via variance-reduced Q-learning [99.34907092347733]
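This paper analyzes the problem of estimating the optimal $Q$-value function in a Markov decision process with discrete states and actions. Using a local minimax framework, it shows that an instance-dependent functional appears in lower bounds on the accuracy of any estimation procedure. On the other hand, by analyzing a variance-reduced version of $Q$-learning, it establishes the sharpness of the lower bounds up to factors logarithmic in the sizes of the state and action spaces.
Paper reference translation (metadata) (2021-06-28T00:38:54Z)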
Optimal oracle inequalities for solving projected fixed-point equations [53.31620399640334]
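We study methods that use a collection of stochastic observations to compute approximate solutions by searching over a known low-dimensional subspace of a Hilbert space. We show how the results precisely characterize the error of temporal difference learning methods for policy evaluation with linear function approximation.
Paper reference translation (metadata) (2020-12-09T20:19:32Z)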
Learning Fast Approximations of Sparse Nonlinear Regression [50.00693981886832]
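In this work, we fill the gap by introducing the Nonlinear Learned Iterative Shrinkage Thresholding Algorithm (NLISTA). Experiments on synthetic data corroborate the theoretical results and show that the method outperforms state-of-the-art methods.
Paper reference translation (metadata) (2020-10-26T11:31:08Z)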

The related papers list is generated automatically from the titles and abstracts of papers on this site.
