Fugu-MT paper translation (abstract): More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling

Paper overview: More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling

arxiv url: http://arxiv.org/abs/2406.12241v1
Date: Tue, 18 Jun 2024 03:32:10 GMT
Status: translation complete
Last updated in system: 2024-06-19 22:49:04.382514
Title: More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling
Title (reference translation): More efficient randomized exploration for reinforcement learning via approximate sampling
Authors: Haque Ishfaq, Yixin Tan, Yu Yang, Qingfeng Lan, Jianfeng Lu, A. Rupam Mahmood, Doina Precup, Pan Xu
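Abstract summary: We propose an algorithmic framework that incorporates various approximate sampling methods with the recently proposed Feel-Good Thompson Sampling (FGTS) approach. When applied to linear MDPs, our regret analysis yields the best known dependency of regret on dimensionality, surpassing existing randomized algorithms. On several challenging games from the Atari 57 suite, our algorithms achieve performance that is either better than or on par with other strong baselines from the deep RL literature.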
Reference score (independently computed attention score): 41.21199687865359
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Thompson sampling (TS) is one of the most popular exploration techniques in reinforcement learning (RL). However, most TS algorithms with theoretical guarantees are difficult to implement and not generalizable to Deep RL. While the emerging approximate sampling-based exploration schemes are promising, most existing algorithms are specific to linear Markov Decision Processes (MDP) with suboptimal regret bounds, or only use the most basic samplers such as Langevin Monte Carlo. In this work, we propose an algorithmic framework that incorporates different approximate sampling methods with the recently proposed Feel-Good Thompson Sampling (FGTS) approach (Zhang, 2022; Dann et al., 2021), which was previously known to be computationally intractable in general. When applied to linear MDPs, our regret analysis yields the best known dependency of regret on dimensionality, surpassing existing randomized algorithms. Additionally, we provide explicit sampling complexity for each employed sampler. Empirically, we show that in tasks where deep exploration is necessary, our proposed algorithms that combine FGTS and approximate sampling perform significantly better compared to other strong baselines. On several challenging games from the Atari 57 suite, our algorithms achieve performance that is either better than or on par with other strong baselines from the deep RL literature.
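Abstract (reference translation): Thompson sampling (TS) is one of the most popular exploration techniques in reinforcement learning (RL). However, most TS algorithms with theoretical guarantees are difficult to implement and do not generalize to deep RL. Emerging approximate sampling-based exploration schemes are promising, but most existing algorithms are either specific to linear Markov decision processes (MDPs) with suboptimal regret bounds or use only the most basic samplers, such as Langevin Monte Carlo. In this work, we propose an algorithmic framework that incorporates various approximate sampling methods with the recently proposed Feel-Good Thompson Sampling (FGTS) approach (Zhang, 2022; Dann et al., 2021), which was previously known to be computationally intractable in general. When applied to linear MDPs, our regret analysis yields the best known dependency of regret on dimensionality, surpassing existing randomized algorithms. In addition, we provide an explicit sampling complexity for each sampler used. Experiments show that, in tasks where deep exploration is necessary, the proposed algorithms combining FGTS and approximate sampling perform significantly better than other strong baselines. On several challenging games from the Atari 57 suite, our algorithms achieve performance that is better than or on par with other strong baselines from the deep RL literature.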
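
The abstract names Langevin Monte Carlo as the most basic approximate sampler used for Thompson-sampling-style exploration. As a rough illustration of that idea only, the sketch below draws approximate posterior samples with unadjusted Langevin updates and uses them to pick an arm in a toy Gaussian bandit. It is not the paper's FGTS algorithm: the feel-good term, the linear MDP setting, and the deep RL variants are all omitted, and the reward model, the step-size rule, and the names `lmc_posterior_sample` and `thompson_select` are assumptions made for this example.

```python
import math
import random


def lmc_posterior_sample(rewards, num_steps=100, prior_precision=1.0,
                         rng=None, init=0.0):
    """Approximately sample one arm's mean reward with Langevin Monte Carlo.

    Assumed model (not from the paper): rewards ~ N(theta, 1) with prior
    theta ~ N(0, 1 / prior_precision), so the exact posterior is Gaussian
    with precision prior_precision + len(rewards).
    """
    rng = rng or random.Random()
    precision = prior_precision + len(rewards)
    total = sum(rewards)
    # Assumed step-size rule: scale by the curvature so the update is stable.
    step_size = 0.1 / precision
    theta = init
    for _ in range(num_steps):
        # Gradient of the negative log posterior U(theta).
        grad = precision * theta - total
        # Unadjusted Langevin update: gradient step plus Gaussian noise.
        noise = rng.gauss(0.0, 1.0)
        theta = theta - step_size * grad + math.sqrt(2.0 * step_size) * noise
    return theta


def thompson_select(history, rng=None):
    """Pick the arm whose approximate posterior sample is largest.

    `history` is a list with one list of observed rewards per arm.
    """
    rng = rng or random.Random()
    samples = [lmc_posterior_sample(rewards, rng=rng) for rewards in history]
    return max(range(len(samples)), key=samples.__getitem__)


if __name__ == "__main__":
    # Toy two-armed bandit with hypothetical true means.
    rng = random.Random(0)
    true_means = [0.2, 0.8]
    history = [[] for _ in true_means]
    for _ in range(200):
        arm = thompson_select(history, rng=rng)
        history[arm].append(rng.gauss(true_means[arm], 1.0))
    print("pulls per arm:", [len(rewards) for rewards in history])
```

In this conjugate toy model the posterior could be sampled exactly, which makes the Langevin output easy to check; the approximate sampler is only worth its cost when no closed-form posterior is available, which is the setting the abstract targets.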


Related papers