Fugu-MT Paper Translation (Summary): Understanding Value Decomposition Algorithms in Deep Cooperative Multi-Agent Reinforcement Learning

Paper summary: Understanding Value Decomposition Algorithms in Deep Cooperative Multi-Agent Reinforcement Learning

arxiv url: http://arxiv.org/abs/2202.04868v1
Date: Thu, 10 Feb 2022 06:59:08 GMT
Status: Translation complete
Last updated in system: 2022-02-12 03:52:18.885730
Title: Understanding Value Decomposition Algorithms in Deep Cooperative Multi-Agent Reinforcement Learning
Authors: Zehao Dou, Jakub Grudzien Kuba, Yaodong Yang
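Abstract summary: We introduce the set of cooperative games in which value decomposition methods find their validity. In decomposable games, we theoretically prove that applying the multi-agent fitted Q-Iteration algorithm (MA-FQI) yields an optimal Q-function. In non-decomposable games, the Q-function estimated by MA-FQI can still converge to the optimum under the circumstance that the Q-function needs to be projected onto the decomposable function space at each iteration.
Reference score (in-house attention score): 4.127975260293993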
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Value function decomposition is becoming a popular rule of thumb for scaling up multi-agent reinforcement learning (MARL) in cooperative games. For such a decomposition rule to hold, the individual-global max (IGM) principle must be assumed; that is, the local maxima of each agent's decomposed value function must jointly amount to the global maximum of the joint value function. This principle, however, does not hold in general. As a result, the applicability of value decomposition algorithms remains unclear and their convergence properties remain unknown. In this paper, we make the first effort to answer these questions. Specifically, we introduce the set of cooperative games in which value decomposition methods find their validity, which we refer to as decomposable games. In decomposable games, we theoretically prove that applying the multi-agent fitted Q-Iteration algorithm (MA-FQI) leads to an optimal Q-function. In non-decomposable games, the Q-function estimated by MA-FQI can still converge to the optimum under the circumstance that the Q-function needs to be projected onto the decomposable function space at each iteration. In both settings, we consider value functions represented by practical deep neural networks and derive the corresponding convergence rates. To summarize, our results, for the first time, offer theoretical insights for MARL practitioners on when value decomposition algorithms converge and why they perform well.
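The IGM principle described in the abstract (each agent's local maximum must add up to the global maximum of the joint value) can be checked directly on a small finite table. The sketch below is a hypothetical illustration, not code from the paper: it assumes two agents, a joint Q-table stored as a dict keyed by joint actions, and per-agent utility lists; the function names `decentralized_greedy` and `satisfies_igm` are made up for this example. It tests the abstract's phrasing (the per-agent greedy actions attain the global maximum), which is slightly weaker than the argmax-set equality used when ties exist.

```python
from itertools import product

def greedy_joint_action(q_joint):
    """Joint action that maximizes a joint Q-table {(a1, a2): value}."""
    return max(q_joint, key=q_joint.get)

def decentralized_greedy(utilities):
    """Each agent independently takes the argmax of its own utility list."""
    return tuple(max(range(len(u)), key=u.__getitem__) for u in utilities)

def satisfies_igm(q_joint, utilities):
    """True if the per-agent greedy actions jointly attain the global max of q_joint.

    Hypothetical check on one finite table; with ties, the usual IGM definition
    (equality of argmax sets) is stricter than this.
    """
    local = decentralized_greedy(utilities)
    return q_joint[local] == max(q_joint.values())

# Additive decomposition (as in VDN): Q(a1, a2) = u1[a1] + u2[a2].
u1, u2 = [1.0, 3.0], [2.0, 0.5]
additive = {(a1, a2): u1[a1] + u2[a2] for a1, a2 in product(range(2), range(2))}
print(satisfies_igm(additive, [u1, u2]))  # True

# A coordination payoff where these particular utilities point to the wrong joint action.
coordination = {(0, 0): 10.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 5.0}
print(greedy_joint_action(coordination))                      # (0, 0)
print(satisfies_igm(coordination, [[0.0, 1.0], [0.0, 1.0]]))  # False
```

For an additive joint value, IGM holds by construction, since maximizing a sum of independent terms is the same as maximizing each term separately; for a general joint table it has to be checked.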
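The second setting in the abstract, where the Q-function is projected onto the decomposable function space at each iteration, can be illustrated with a deliberately tiny tabular toy. None of its assumptions come from the paper: a single-state repeated game with two agents, exact Bellman targets instead of sampled regression, a uniform-weight least-squares projection onto additive functions q1(a1) + q2(a2), and hypothetical helper names (`project_additive`, `projected_fqi`, `greedy`, `best_joint_action`). The paper's MA-FQI works with deep neural network function classes and comes with its own conditions, so this only sketches the mechanics.

```python
from itertools import product

def project_additive(table):
    """Least-squares projection (uniform weights) of a 2-agent Q-table onto
    functions q1[a1] + q2[a2]: row mean + column mean - grand mean."""
    n1, n2 = len(table), len(table[0])
    row = [sum(r) / n2 for r in table]
    col = [sum(table[i][j] for i in range(n1)) / n1 for j in range(n2)]
    grand = sum(row) / n1
    return [r - grand for r in row], col

def projected_fqi(reward, gamma=0.9, iters=200):
    """Hypothetical toy: exact Bellman targets in a single-state repeated game,
    projected onto the additive class after every iteration."""
    n1, n2 = len(reward), len(reward[0])
    q1, q2 = [0.0] * n1, [0.0] * n2
    for _ in range(iters):
        # For an additive Q, the max over joint actions is the sum of per-agent maxima.
        v_next = max(q1) + max(q2)
        target = [[reward[i][j] + gamma * v_next for j in range(n2)] for i in range(n1)]
        q1, q2 = project_additive(target)
    return q1, q2

def greedy(q1, q2):
    """Decentralized greedy joint action from per-agent utilities."""
    return (max(range(len(q1)), key=q1.__getitem__),
            max(range(len(q2)), key=q2.__getitem__))

def best_joint_action(reward):
    """True optimal joint action of the one-shot reward table."""
    n1, n2 = len(reward), len(reward[0])
    return max(product(range(n1), range(n2)), key=lambda a: reward[a[0]][a[1]])

# Decomposable reward: r(a1, a2) = u[a1] + w[a2].
u, w = [1.0, 3.0], [2.0, 0.5]
decomposable = [[u[i] + w[j] for j in range(2)] for i in range(2)]
print(greedy(*projected_fqi(decomposable)), best_joint_action(decomposable))  # (1, 0) (1, 0)

# Climbing game: a classic non-decomposable cooperative matrix game.
climbing = [[11.0, -30.0, 0.0], [-30.0, 7.0, 6.0], [0.0, 0.0, 5.0]]
print(greedy(*projected_fqi(climbing)), best_joint_action(climbing))  # (2, 2) (0, 0)
```

In this toy, the additive reward is reproduced exactly by the projection, so the decentralized greedy action matches the true optimum. For the climbing game, the greedy action of the projected Q-function is (2, 2), with reward 5, rather than the optimum (0, 0), with reward 11. The paper's convergence result for non-decomposable games applies only under its stated circumstances, which this toy makes no attempt to check.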

Related papers

Reward Adaptation Via Q-Manipulation [3.8065968624597324]
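This paper proposes a new solution to reward adaptation (RA), the problem in which a learning agent adapts to a target reward function based on one or more existing behaviors. Our work represents a new approach to RA via manipulation of the Q-function. We describe a method called Q-Manipulation (Q-M).
Paper reference translation (metadata) (2025-03-17T17:42:54Z)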
QFree: A Universal Value Function Factorization for Multi-Agent Reinforcement Learning [2.287186762346021]
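We propose QFree, a universal value function factorization method for multi-agent reinforcement learning. We show that QFree achieves state-of-the-art performance in a general-purpose, complex MARL benchmark environment.
Paper reference translation (metadata) (2023-11-01T08:07:16Z)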
Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback [75.29048190099523]
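Online gradient descent (OGD) is known to be doubly optimal under strong convexity or monotonicity assumptions. In this paper, we design AdaOGD, a fully adaptive OGD algorithm that requires no prior knowledge of these parameters.
Paper reference translation (metadata) (2023-10-21T18:38:13Z)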
Greedy based Value Representation for Optimal Coordination in Multi-agent Reinforcement Learning [64.05646120624287]
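We derive the joint Q-value functions of LVD and MVD. To ensure optimal consistency, the optimal node must be the unique STN. Our method outperforms state-of-the-art baselines in experiments on various benchmarks.
Paper reference translation (metadata) (2022-11-22T08:14:50Z)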
Non-Linear Coordination Graphs [22.29517436920317]
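Coordination graphs (CGs) represent a higher-order decomposition that incorporates pairwise payoff functions. By extending CG value decomposition beyond the linear case, we propose the first non-linear coordination graph. Our method achieves superior performance on multi-agent coordination tasks such as MACO.
Paper reference translation (metadata) (2022-10-26T18:11:31Z)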
Min-Max Bilevel Multi-objective Optimization with Applications in Machine Learning [30.25074797092709]
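We propose a min-max bilevel multi-objective optimization framework, highlighting applications in representation learning and hyper-objective learning.
Paper reference translation (metadata) (2022-03-03T18:56:13Z)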
Better Regularization for Sequential Decision Spaces: Fast Convergence Rates for Nash, Correlated, and Team Equilibria [121.36609493711292]
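We study the application of iterative first-order methods to the problem of computing equilibria of large two-player extensive-form games. By instantiating first-order methods with our regularizers, we develop the first accelerated first-order methods for computing correlated equilibria and ex-ante coordinated team equilibria.
Paper reference translation (metadata) (2021-05-27T06:10:24Z)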
Understanding and Accelerating EM Algorithm's Convergence by Fair Competition Principle and Rate-Verisimilitude Function [0.40611352512781856]
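This paper uses the "marriage competition" to explain different convergence difficulties and proposes the Fair Competition Principle (FCP). The convergence proof adopts the variational and iterative methods used by Shannon et al. to analyze rate-distortion functions.
Paper reference translation (metadata) (2021-04-21T20:27:25Z)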
Learning Aggregation Functions [78.47770735205134]
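We introduce LAF (Learning Aggregation Functions), a learnable aggregator for sets of arbitrary cardinality. We report experiments on semi-synthetic and real data showing that LAF outperforms state-of-the-art sum-(max-)decomposition architectures.
Paper reference translation (metadata) (2020-12-15T18:28:53Z)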
Multi-agent Policy Optimization with Approximatively Synchronous Advantage Estimation [55.96893934962757]
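In multi-agent systems, the policies of different agents need to be evaluated jointly. In current methods, value functions or advantage functions use counterfactual joint actions that are evaluated asynchronously. In this work, we propose approximatively synchronous advantage estimation.
Paper reference translation (metadata) (2020-12-07T07:29:19Z)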
A Multi-Agent Primal-Dual Strategy for Composite Optimization over Distributed Features [52.856801164425086]
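We study a multi-agent sharing optimization problem whose objective function is the sum of smooth local functions and a convex (possibly non-smooth) coupling function.
Paper reference translation (metadata) (2020-06-15T19:40:24Z)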
