Fugu-MT Paper Translation (Summary): On Reward-Free Reinforcement Learning with Linear Function Approximation

Paper summary: On Reward-Free Reinforcement Learning with Linear Function Approximation

arxiv url: http://arxiv.org/abs/2006.11274v1
Date: Fri, 19 Jun 2020 17:59:36 GMT
Status: Translation complete
System update date: 2022-11-19 03:20:58.074499
Title: On Reward-Free Reinforcement Learning with Linear Function Approximation
Authors: Ruosong Wang, Simon S. Du, Lin F. Yang, Ruslan Salakhutdinov
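Abstract summary: Reward-free reinforcement learning (RL) is a framework suited both to the batch RL setting and to settings with many reward functions of interest. This work gives both positive and negative results for reward-free RL with linear function approximation.
Reference score (attention score computed by the site): 144.4210285338698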
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reward-free reinforcement learning (RL) is a framework which is suitable for both the batch RL setting and the setting where there are many reward functions of interest. During the exploration phase, an agent collects samples without using a pre-specified reward function. After the exploration phase, a reward function is given, and the agent uses samples collected during the exploration phase to compute a near-optimal policy. Jin et al. [2020] showed that in the tabular setting, the agent only needs to collect a polynomial number of samples (in terms of the number of states, the number of actions, and the planning horizon) for reward-free RL. However, in practice, the number of states and actions can be large, and thus function approximation schemes are required for generalization. In this work, we give both positive and negative results for reward-free RL with linear function approximation. We give an algorithm for reward-free RL in the linear Markov decision process setting where both the transition and the reward admit linear representations. The sample complexity of our algorithm is polynomial in the feature dimension and the planning horizon, and is completely independent of the number of states and actions. We further give an exponential lower bound for reward-free RL in the setting where only the optimal $Q$-function admits a linear representation. Our results imply several interesting exponential separations on the sample complexity of reward-free RL.
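The two-phase protocol described in the abstract (explore without a reward, then plan for any reward revealed later using only the stored samples) can be illustrated with a small sketch. This is not the authors' algorithm. It assumes a hypothetical 3-state, 2-action deterministic toy MDP, uses uniform-random actions as a stand-in for a real exploration strategy, and plans with plain least-squares value iteration (LSVI) without exploration bonuses. One-hot features are used because any tabular MDP is a linear MDP with feature dimension d = |S||A|; the helper names (`step`, `phi`, `explore`, `plan`) and the ridge parameter `lam` are illustrative assumptions.

```python
import random

# Hypothetical toy MDP (not from the paper): 3 states, 2 actions, horizon 3.
# Action 1 moves one state "right" (capped at state 2); action 0 resets to state 0.
S, A, H = 3, 2, 3

def step(s, a):
    return min(s + 1, S - 1) if a == 1 else 0

def phi(s, a):
    # One-hot features: a tabular MDP is a linear MDP with d = S * A.
    v = [0.0] * (S * A)
    v[s * A + a] = 1.0
    return v

def solve(M, b):
    # Gauss-Jordan elimination with partial pivoting for a small d x d system.
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= f * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def explore(num_episodes, seed=0):
    # Exploration phase: collect (s, a, s') transitions with NO reward signal.
    # Uniform-random actions are an assumed stand-in for a real exploration strategy.
    rng = random.Random(seed)
    data = []
    for _ in range(num_episodes):
        s = 0
        for _ in range(H):
            a = rng.randrange(A)
            s_next = step(s, a)
            data.append((s, a, s_next))
            s = s_next
    return data

def plan(data, reward, lam=1.0):
    # Planning phase: LSVI for a reward function reward(h, s, a) given only now.
    # Data is pooled across steps because this toy MDP is time-homogeneous.
    d = S * A
    gram = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    for s, a, _ in data:
        f = phi(s, a)
        for i in range(d):
            for j in range(d):
                gram[i][j] += f[i] * f[j]
    V_next = [0.0] * S  # V_H = 0
    policy = [[0] * S for _ in range(H)]
    for h in reversed(range(H)):
        # Ridge regression of V_{h+1}(s') onto phi(s, a).
        rhs = [0.0] * d
        for s, a, s_next in data:
            f = phi(s, a)
            for i in range(d):
                rhs[i] += f[i] * V_next[s_next]
        w = solve(gram, rhs)
        V = [0.0] * S
        for s in range(S):
            # Q_h(s, a) = r_h(s, a) + w_h . phi(s, a); act greedily.
            q = [reward(h, s, a) + sum(wi * fi for wi, fi in zip(w, phi(s, a)))
                 for a in range(A)]
            policy[h][s] = max(range(A), key=lambda a: q[a])
            V[s] = q[policy[h][s]]
        V_next = V
    return policy, V_next  # V_next now holds the estimate of V_0

data = explore(500)
# Two rewards, revealed only after exploration, planned from the same data.
reach_right = lambda h, s, a: 1.0 if (s, a) == (2, 1) else 0.0
stay_home = lambda h, s, a: 1.0 if (s, a) == (0, 0) else 0.0
pi_a, v_a = plan(data, reach_right)
pi_b, v_b = plan(data, stay_home)
print(pi_a[0][0], pi_a[1][1], pi_a[2][2])  # 1 1 1
print(pi_b[0][0])  # 0
```

Both rewards are planned from the same exploration dataset, which is the point of the reward-free setting: no new samples are collected once a reward is revealed. The ridge term `lam` slightly shrinks the value estimates toward zero, so `v_a[0]` comes out a little below the true optimal value of 1.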

Related papers