Fugu-MT Paper Translation (Abstract): Provably Efficient Offline Multi-agent Reinforcement Learning via Strategy-wise Bonus

Paper overview: Provably Efficient Offline Multi-agent Reinforcement Learning via Strategy-wise Bonus

arxiv url: http://arxiv.org/abs/2206.00159v1
Date: Wed, 1 Jun 2022 00:18:15 GMT
Status: Translation complete
Last updated in system: 2022-06-02 15:57:14.684593
Title: Provably Efficient Offline Multi-agent Reinforcement Learning via Strategy-wise Bonus
Title (reference translation): Provably efficient offline multi-agent reinforcement learning via a strategy-wise bonus
Authors: Qiwen Cui and Simon S. Du
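Abstract summary: This paper proposes the strategy-wise concentration principle, which builds a confidence interval for the joint strategy. For two-player zero-sum Markov games, it exploits the convexity of the strategy-wise bonus to obtain a computationally efficient algorithm. All of the algorithms can take a pre-specified strategy class $\Pi$ as input and output a strategy that is close to the best strategy in $\Pi$.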
Reference score (independently computed attention score): 48.34563955829649
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper considers offline multi-agent reinforcement learning. We propose the strategy-wise concentration principle, which directly builds a confidence interval for the joint strategy, in contrast to the point-wise concentration principle, which builds a confidence interval for each point in the joint action space. For two-player zero-sum Markov games, by exploiting the convexity of the strategy-wise bonus, we propose a computationally efficient algorithm whose sample complexity has a better dependence on the number of actions than prior methods based on the point-wise bonus. Furthermore, for offline multi-agent general-sum Markov games, based on the strategy-wise bonus and a novel surrogate function, we give the first algorithm whose sample complexity scales only with $\sum_{i=1}^m A_i$, where $A_i$ is the action size of the $i$-th player and $m$ is the number of players. In sharp contrast, the sample complexity of methods based on the point-wise bonus would scale with the size of the joint action space $\prod_{i=1}^m A_i$ due to the curse of multiagents. Lastly, all of our algorithms can naturally take a pre-specified strategy class $\Pi$ as input and output a strategy that is close to the best strategy in $\Pi$. In this setting, the sample complexity scales only with $\log |\Pi|$ instead of $\sum_{i=1}^m A_i$.
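Abstract (reference translation): This paper studies offline multi-agent reinforcement learning. It proposes the strategy-wise concentration principle, which directly builds a confidence interval for the joint strategy, in contrast to the point-wise concentration principle, which builds a confidence interval for each point in the joint action space. For two-player zero-sum Markov games, it exploits the convexity of the strategy-wise bonus to propose a computationally efficient algorithm whose sample complexity depends more favorably on the number of actions than prior methods based on the point-wise bonus. Furthermore, for offline multi-agent general-sum Markov games, it uses the strategy-wise bonus and a novel surrogate function to give the first algorithm whose sample complexity scales only with $\sum_{i=1}^m A_i$, where $A_i$ is the action size of the $i$-th player and $m$ is the number of players. By contrast, the sample complexity of methods based on the point-wise bonus scales with the size of the joint action space $\prod_{i=1}^m A_i$ because of the curse of multiagents. Finally, all of the algorithms can naturally take a pre-specified strategy class $\Pi$ as input and output a strategy that is close to the best strategy in $\Pi$. In this setting, the sample complexity scales only with $\log |\Pi|$ instead of $\sum_{i=1}^m A_i$.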
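The gap between the two scaling terms named in the abstract can be seen with a small arithmetic sketch. This is only an illustration, not the paper's algorithm or its full bound: it compares the sum of the per-player action sizes with the size of the joint action space, and ignores every other factor a sample-complexity bound would contain (horizon, number of states, accuracy, data coverage, constants). The function name `scaling_terms` and the example sizes are hypothetical.

```python
import math

def scaling_terms(action_sizes):
    """Return (sum of A_i, product of A_i) for a list of per-player action sizes.

    Hypothetical helper for illustration only: the sum is the term the
    strategy-wise bonus is said to scale with, and the product is the size
    of the joint action space that point-wise bonuses scale with.
    """
    return sum(action_sizes), math.prod(action_sizes)

# Assumed example: m = 5 players, each with A_i = 10 actions.
action_sizes = [10] * 5
total, joint = scaling_terms(action_sizes)
print(total)  # 50
print(joint)  # 100000

# With a pre-specified finite strategy class Pi, the abstract says the
# relevant term becomes log |Pi|. The class size here is an assumed value.
strategy_class_size = 1000
print(math.log(strategy_class_size))
```

Adding a sixth player with 10 actions raises the sum from 50 to 60, while the joint action space grows tenfold to 1,000,000.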

Related papers