Fugu-MT 論文翻訳(概要): On the Approximation of Cooperative Heterogeneous Multi-Agent Reinforcement Learning (MARL) using Mean Field Control (MFC)

論文の概要: On the Approximation of Cooperative Heterogeneous Multi-Agent Reinforcement Learning (MARL) using Mean Field Control (MFC)

arxiv url: http://arxiv.org/abs/2109.04024v1
Date: Thu, 9 Sep 2021 03:52:49 GMT
ステータス: 翻訳完了
システム内更新日: 2021-09-10 14:26:59.433330
Title: On the Approximation of Cooperative Heterogeneous Multi-Agent Reinforcement Learning (MARL) using Mean Field Control (MFC)
Title（参考訳）: 平均場制御(MFC)を用いた協調的異種マルチエージェント強化学習(MARL)の近似について
Authors: Washim Uddin Mondal, Mridul Agarwal, Vaneet Aggarwal, and Satish V. Ukkusuri
Abstract要約: 平均場制御(MFC)は協調型マルチエージェント強化学習問題の次元性の呪いを軽減する効果的な方法である。この作業では、$N_mathrmpop$ヘテロジニアスエージェントのコレクションを検討し、それを$K$クラスに分離することができる。
参考スコア（独自算出の注目度）: 33.833747074900856
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Mean field control (MFC) is an effective way to mitigate the curse of dimensionality of cooperative multi-agent reinforcement learning (MARL) problems. This work considers a collection of $N_{\mathrm{pop}}$ heterogeneous agents that can be segregated into $K$ classes such that the $k$-th class contains $N_k$ homogeneous agents. We aim to prove approximation guarantees of the MARL problem for this heterogeneous system by its corresponding MFC problem. We consider three scenarios where the reward and transition dynamics of all agents are respectively taken to be functions of $(1)$ joint state and action distributions across all classes, $(2)$ individual distributions of each class, and $(3)$ marginal distributions of the entire population. We show that, in these cases, the $K$-class MARL problem can be approximated by MFC with errors given as $e_1=\mathcal{O}(\frac{\sqrt{|\mathcal{X}||\mathcal{U}|}}{N_{\mathrm{pop}}}\sum_{k}\sqrt{N_k})$, $e_2=\mathcal{O}(\sqrt{|\mathcal{X}||\mathcal{U}|}\sum_{k}\frac{1}{\sqrt{N_k}})$ and $e_3=\mathcal{O}\left(\sqrt{|\mathcal{X}||\mathcal{U}|}\left[\frac{A}{N_{\mathrm{pop}}}\sum_{k\in[K]}\sqrt{N_k}+\frac{B}{\sqrt{N_{\mathrm{pop}}}}\right]\right)$, respectively, where $A, B$ are some constants and $|\mathcal{X}|,|\mathcal{U}|$ are the sizes of state and action spaces of each agent. Finally, we design a Natural Policy Gradient (NPG) based algorithm that, in the three cases stated above, can converge to an optimal MARL policy within $\mathcal{O}(e_j)$ error with a sample complexity of $\mathcal{O}(e_j^{-3})$, $j\in\{1,2,3\}$, respectively.
Abstract（参考訳）: 平均場制御(MFC)は、協調型マルチエージェント強化学習(MARL)問題の次元性の呪いを軽減する効果的な方法である。この研究は、$k$-thクラスが$n_k$ 等質エージェントを含むような$k$クラスに分離できる$n_{\mathrm{pop}}$ヘテロジニアスエージェントの集合を考える。この不均一系に対するMARL問題の近似保証を対応するMFC問題によって証明することを目指している。すべてのエージェントの報酬とトランジションダイナミクスがそれぞれ、すべてのクラスにおける$(1)$合同状態とアクション分布、各クラスの$(2)$個別分布、および$(3)$全人口のマージン分布の関数として取られる3つのシナリオを検討した。 We show that, in these cases, the $K$-class MARL problem can be approximated by MFC with errors given as $e_1=\mathcal{O}(\frac{\sqrt{|\mathcal{X}||\mathcal{U}|}}{N_{\mathrm{pop}}}\sum_{k}\sqrt{N_k})$, $e_2=\mathcal{O}(\sqrt{|\mathcal{X}||\mathcal{U}|}\sum_{k}\frac{1}{\sqrt{N_k}})$ and $e_3=\mathcal{O}\left(\sqrt{|\mathcal{X}||\mathcal{U}|}\left[\frac{A}{N_{\mathrm{pop}}}\sum_{k\in[K]}\sqrt{N_k}+\frac{B}{\sqrt{N_{\mathrm{pop}}}}\right]\right)$, respectively, where $A, B$ are some constants and $|\mathcal{X}|,|\mathcal{U}|$ are the sizes of state and action spaces of each agent. 最後に、上記の3つのケースにおいて、それぞれ$\mathcal{O}(e_j)$エラーと$\mathcal{O}(e_j^{-3})$,$j\in\{1,2,3\}$のサンプル複雑性で最適なMARLポリシーに収束できる自然ポリシー勾配(NPG)ベースのアルゴリズムを設計する。

関連論文リスト

Sharp Gap-Dependent Variance-Aware Regret Bounds for Tabular MDPs [54.28273395444243]
我々は,モノトニック値 Omega (MVP) アルゴリズムが,差分を考慮した差分依存残差境界を$tildeOleft(left(sum_Delta_h(s,a)>0 fracH2 log K land MathttVar_maxtextc$。
論文参考訳（メタデータ） (2025-06-06T20:33:57Z)
On the $O(\rac{\sqrt{d}}{K^{1/4}})$ Convergence Rate of AdamW Measured by $\ell_1$ Norm [54.28350823319057]
本稿では、$ell_$ノルムで測定されたAdamWの収束率$frac1Ksum_k=1KEleft[|nabla f(xk)|_1right]leq O(fracsqrtdCK1/4)を確立する。
論文参考訳（メタデータ） (2025-05-17T05:02:52Z)
The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
この問題は通信複雑性のランダム化を$Omega(frac1kcdot n2log|mathbbF|)$とする。アプリケーションとして、$k$パスを持つ任意のストリーミングアルゴリズムに対して、$Omega(frac1kcdot n2log|mathbbF|)$スペースローバウンドを得る。
論文参考訳（メタデータ） (2024-10-26T06:21:42Z)
Variance-Reduced Fast Krasnoselkii-Mann Methods for Finite-Sum Root-Finding Problems [8.0153031008486]
有限和共役方程式 $Gx = 0$ を解くために, 分散還元を伴う高速クラスクラスKrasnoselkii-Mann 法を提案する。我々のアルゴリズムは単一ループであり、より広範なルートフィンディングアルゴリズムのために特別に設計された、偏りのない分散還元推定器の新たなファミリーを利用する。数値実験は我々のアルゴリズムを検証し、最先端の手法と比較して有望な性能を示す。
論文参考訳（メタデータ） (2024-06-04T15:23:29Z)
Provably learning a multi-head attention layer [55.2904547651831]
マルチヘッドアテンション層は、従来のフィードフォワードモデルとは分離したトランスフォーマーアーキテクチャの重要な構成要素の1つである。本研究では,ランダムな例から多面的注意層を実証的に学習する研究を開始する。最悪の場合、$m$に対する指数的依存は避けられないことを示す。
論文参考訳（メタデータ） (2024-02-06T15:39:09Z)
Mean-Field Control based Approximation of Multi-Agent Reinforcement Learning in Presence of a Non-decomposable Shared Global State [37.63373979256335]
平均場制御(MFC)は、大規模マルチエージェント強化学習(MARL)問題を解決するための強力な近似ツールである。ここでは、エージェントが共通のグローバル状態を共有するMARL設定であっても、MFCは優れた近似ツールとして適用可能であることを実証する。
論文参考訳（メタデータ） (2023-01-13T18:55:58Z)
Additive estimates of the permanent using Gaussian fields [0.0]
本稿では,M$M$実行列$A$から加法誤差までの永久性を推定するランダム化アルゴリズムを提案する。我々は$mathrmperm(A)$を$epsilonbigg(sqrt32Mprod2M_i=1 C_iibigg)$の加算誤差に時間内に見積もることができる。
論文参考訳（メタデータ） (2022-12-20T22:13:42Z)
Near Sample-Optimal Reduction-based Policy Learning for Average Reward MDP [58.13930707612128]
この研究は、平均報酬マルコフ決定過程(AMDP)における$varepsilon$-Optimal Policyを得る際のサンプルの複雑さを考察する。我々は、状態-作用対当たりの$widetilde O(H varepsilon-3 ln frac1delta)$サンプルを証明し、$H := sp(h*)$は任意の最適ポリシーのバイアスのスパンであり、$varepsilon$は精度、$delta$は失敗確率である。
論文参考訳（メタデータ） (2022-12-01T15:57:58Z)
Mean-Field Approximation of Cooperative Constrained Multi-Agent Reinforcement Learning (CMARL) [35.18639326270473]
制約が存在する場合でも, MFC を用いて MARL 問題を近似できることを示す。また、Natural Policy Gradientベースのアルゴリズムを提供し、$mathcalO(e)$の誤差で制限されたMARL問題を、$mathcalO(e-6)$の複雑さで解くことができることを示す。
論文参考訳（メタデータ） (2022-09-15T16:33:38Z)
On the Near-Optimality of Local Policies in Large Cooperative Multi-Agent Reinforcement Learning [37.63373979256335]
協調的な$N$エージェントネットワークでは、エージェントに対してローカルに実行可能なポリシーを設計できることを示す。また,ローカルポリシーを明示的に構築するアルゴリズムも考案した。
論文参考訳（メタデータ） (2022-09-07T23:15:08Z)
Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
モノトン活性化に対する $mathbfxmapstosigma(mathbfwcdotmathbfx)$ の関数について検討する。学習者の目標は仮説ベクトル $mathbfw$ that $F(mathbbw)=C, epsilon$ を高い確率で出力することである。
論文参考訳（メタデータ） (2022-06-17T17:55:43Z)
Can Mean Field Control (MFC) Approximate Cooperative Multi Agent Reinforcement Learning (MARL) with Non-Uniform Interaction? [33.484960394599455]
MFC(Mean-Field Control)は,MARL(Multi-Agent Reinforcement)問題を解決する強力なツールである。本稿では、交換可能性の仮定を緩和し、任意の二重行列を介してエージェント間の相互作用をモデル化する。各エージェントの報酬が、そのエージェントが見た平均場のアフィン関数であるなら、そのような一様でないMARL問題を近似することができる。
論文参考訳（メタデータ） (2022-02-28T19:03:09Z)
Threshold Phenomena in Learning Halfspaces with Massart Noise [56.01192577666607]
ガウス境界の下でのマスアートノイズ付きmathbbRd$におけるPAC学習ハーフスペースの問題について検討する。この結果は,Massartモデルにおける学習ハーフスペースの複雑さを定性的に特徴づけるものである。
論文参考訳（メタデータ） (2021-08-19T16:16:48Z)
Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and ReLUs under Gaussian Marginals [49.60752558064027]
ガウス境界の下では、半空間とReLUを不可知的に学習する基本的な問題について検討する。我々の下限は、これらのタスクの現在の上限が本質的に最良のものであるという強い証拠を与える。
論文参考訳（メタデータ） (2020-06-29T17:10:10Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。