Fugu-MT Paper Translation (Summary): Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games

Paper summary: Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games

arxiv url: http://arxiv.org/abs/2102.04540v1
Date: Mon, 8 Feb 2021 21:45:56 GMT
Status: Translation complete
Last updated in system: 2021-02-10 14:46:56.939659
Title: Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games
Title (reference translation): Last-iterate convergence of decentralized optimistic gradient descent/ascent in infinite-horizon competitive Markov games
Authors: Chen-Yu Wei, Chung-Wei Lee, Mengxiao Zhang, Haipeng Luo
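Abstract summary: We study infinite-horizon discounted two-player zero-sum Markov games. We develop a decentralized algorithm that converges to Nash equilibria under self-play.
Reference score (attention score computed by this site): 37.70703888365849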
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: We study infinite-horizon discounted two-player zero-sum Markov games, and develop a decentralized algorithm that provably converges to the set of Nash equilibria under self-play. Our algorithm is based on running an Optimistic Gradient Descent Ascent algorithm on each state to learn the policies, with a critic that slowly learns the value of each state. To the best of our knowledge, this is the first algorithm in this setting that is simultaneously rational (converging to the opponent's best response when it uses a stationary policy), convergent (converging to the set of Nash equilibria under self-play), agnostic (no need to know the actions played by the opponent), symmetric (players taking symmetric roles in the algorithm), and enjoying a finite-time last-iterate convergence guarantee, all of which are desirable properties of decentralized algorithms.
Abstract (reference translation): We study infinite-horizon discounted two-player zero-sum Markov games and develop a decentralized algorithm that provably converges to the set of Nash equilibria under self-play. The algorithm runs Optimistic Gradient Descent Ascent on each state to learn the policies, together with a critic that slowly learns the value of each state.
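The abstract describes running Optimistic Gradient Descent Ascent (OGDA) on each state. As a rough illustration of that building block only, the sketch below runs projected OGDA on a single zero-sum matrix game, which can be viewed as a one-state game. It is not the paper's algorithm: the critic and the per-state value learning are omitted, exact gradients are used instead of sampled feedback, and all names (`project_simplex`, `ogda_matrix_game`, `duality_gap`), the step size `eta`, and the iteration count are assumptions chosen for illustration.

```python
# Minimal sketch: projected OGDA on a zero-sum matrix game min_x max_y x^T G y.
# Assumption: single state, exact gradients, no critic (unlike the paper's setting).

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        css += uj
        t = (css - 1.0) / j
        if uj - t > 0:          # holds for a prefix of indices; keep the last one
            theta = t
    return [max(vi - theta, 0.0) for vi in v]

def matvec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def ogda_matrix_game(G, x0, y0, eta=0.05, steps=5000):
    """Return the last iterates (x, y); x minimizes and y maximizes x^T G y."""
    Gt = transpose(G)
    x_hat, y_hat = list(x0), list(y0)
    x, y = list(x0), list(y0)
    gx_prev, gy_prev = matvec(G, y), matvec(Gt, x)   # gradients at the start point
    for _ in range(steps):
        # Optimistic step: reuse the previous gradient as a prediction.
        x = project_simplex([a - eta * g for a, g in zip(x_hat, gx_prev)])
        y = project_simplex([b + eta * g for b, g in zip(y_hat, gy_prev)])
        # Gradients at the new iterate update the auxiliary sequence.
        gx, gy = matvec(G, y), matvec(Gt, x)
        x_hat = project_simplex([a - eta * g for a, g in zip(x_hat, gx)])
        y_hat = project_simplex([b + eta * g for b, g in zip(y_hat, gy)])
        gx_prev, gy_prev = gx, gy
    return x, y

def duality_gap(G, x, y):
    """max_y' x^T G y' - min_x' x'^T G y; zero exactly at a Nash equilibrium."""
    return max(matvec(transpose(G), x)) - min(matvec(G, y))

if __name__ == "__main__":
    G = [[1.0, -1.0], [-1.0, 1.0]]   # matching pennies
    x, y = ogda_matrix_game(G, [0.9, 0.1], [0.2, 0.8])
    print(x, y, duality_gap(G, x, y))
```

On matching pennies the last iterates themselves, not just their averages, approach the uniform strategy, which is the game's unique equilibrium. Plain gradient descent/ascent without the optimistic step would instead cycle on this game, which is why the last-iterate guarantee is of interest.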

Related papers

Independent Learning in Constrained Markov Potential Games [19.083595175045073]
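Constrained Markov games offer a formal framework for modeling multi-agent reinforcement learning problems. We propose an independent policy gradient algorithm for learning approximate constrained Nash equilibria.
Paper reference translation (metadata) (2024-02-27T20:57:35Z)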
Learning Nash Equilibria in Zero-Sum Markov Games: A Single Time-scale Algorithm Under Weak Reachability [11.793922711718645]
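We consider decentralized learning in zero-sum games, where players see only their own payoff information and are agnostic to the opponent's actions and payoffs. Previous works showed convergence to a Nash equilibrium in this setting using double time-scale algorithms under strong reachability assumptions. Our contribution is a rational and convergent algorithm that uses Tsallis-entropy regularization in a value-iteration-based algorithm.
Paper reference translation (metadata) (2023-12-13T09:31:30Z)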
Decentralized model-free reinforcement learning in stochastic games with average-reward objective [1.9852463786440127]
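Our algorithm achieves sublinear high-probability regret of order $T^{3/4}$ and sublinear expected regret of order $T^{2/3}$. Compared with previous methods, it has lower computational complexity and requires less memory.
Paper reference translation (metadata) (2023-01-13T15:59:53Z)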
Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions [145.54544979467872]
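We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions. We prove $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-player competitive game scenario. The proposed algorithms combine Upper Confidence Bound (UCB)-type optimism with fictitious play within the scope of simultaneous policy optimization.
Paper reference translation (metadata) (2022-07-25T18:29:16Z)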
Policy Optimization for Markov Games: Unified Framework and Faster Convergence [81.3266426402464]
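We show that the state-wise average policy of this algorithm converges to an approximate Nash equilibrium (NE) of the game. We extend this algorithm to multi-player general-sum Markov games and show a $\widetilde{\mathcal{O}}(T^{-1/2})$ convergence rate to Coarse Correlated Equilibria (CCE).
Paper reference translation (metadata) (2022-06-06T14:23:13Z)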
Towards convergence to Nash equilibria in two-team zero-sum games [17.4461045395989]
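Two-team zero-sum games are defined as multi-player games in which the players are split into two competing sets of agents. We focus on the solution concept of Nash equilibria (NE). We show that computing NE for this class of games is hard for the complexity class $\mathrm{}$ [class name missing in source].
Paper reference translation (metadata) (2021-11-07T21:15:35Z)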
Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation [92.99933928528797]
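We study reinforcement learning for two-player zero-sum Markov games with simultaneous moves. We propose an algorithm, Nash-UCRL-VTR, based on the principle of "optimism in the face of uncertainty". We show that Nash-UCRL-VTR provably achieves $\tilde{O}(dH\sqrt{T})$ regret, where $d$ is the linear function dimension.
Paper reference translation (metadata) (2021-02-15T09:09:16Z)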
Provable Fictitious Play for General Mean-Field Games [111.44976345867005]
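We propose a reinforcement learning algorithm for stationary mean-field games. The goal is to learn a pair of mean-field state and stationary policy that constitutes a Nash equilibrium.
Paper reference translation (metadata) (2020-10-08T18:46:48Z)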
Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium [116.56359444619441]
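We develop efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games. In the offline setting, we control both players and aim to find the Nash equilibrium by minimizing the duality gap. In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
Paper reference translation (metadata) (2020-02-17T17:04:16Z)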

The related papers list is generated automatically from the titles and abstracts of the papers on this site.
