Fugu-MT Paper Translation (Abstract): A Law of Iterated Logarithm for Multi-Agent Reinforcement Learning

Paper summary: A Law of Iterated Logarithm for Multi-Agent Reinforcement Learning

arxiv url: http://arxiv.org/abs/2110.15092v1
Date: Wed, 27 Oct 2021 08:01:17 GMT
Status: Translation complete
Last updated in system: 2021-10-31 08:48:52.709312
Title: A Law of Iterated Logarithm for Multi-Agent Reinforcement Learning
Title (reference translation): A Law of the Iterated Logarithm in Multi-Agent Reinforcement Learning
Authors: Gugan Thoppe, Bhumesh Kumar
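Abstract summary: In Multi-Agent Reinforcement Learning (MARL), multiple agents interact with a common environment to solve a shared problem in sequential decision-making. We derive a novel law of iterated logarithm for a family of distributed nonlinear stochastic approximation schemes that is useful in MARL.
Reference score (independently computed attention measure): 3.655021726150368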
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: In Multi-Agent Reinforcement Learning (MARL), multiple agents interact with a common environment, as well as with each other, to solve a shared problem in sequential decision-making. It has wide-ranging applications in gaming, robotics, finance, etc. In this work, we derive a novel law of iterated logarithm for a family of distributed nonlinear stochastic approximation schemes that is useful in MARL. In particular, our result describes the convergence rate on almost every sample path where the algorithm converges. This result is the first of its kind in the distributed setup and provides deeper insights than the existing ones, which only discuss convergence rates in the expected or the CLT sense. Importantly, our result holds under significantly weaker assumptions: the gossip matrix need not be doubly stochastic, and the stepsizes need not be square summable. As an application, we show that, for the stepsize $n^{-\gamma}$ with $\gamma \in (0, 1)$, the distributed TD(0) algorithm with linear function approximation has a convergence rate of $O(\sqrt{n^{-\gamma} \ln n})$ a.s.; for the $1/n$ type stepsize, the rate is $O(\sqrt{n^{-1} \ln \ln n})$ a.s. These decay rates do not depend on the graph depicting the interactions among the different agents.
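Abstract (reference translation): In Multi-Agent Reinforcement Learning (MARL), multiple agents interact with a common environment and solve a shared problem in sequential decision-making. It is widely applied in fields such as gaming, robotics, and finance. In this work, we derive a novel law of iterated logarithm for a family of distributed nonlinear stochastic approximation schemes that is useful in MARL. In particular, the result describes the convergence rate on almost every sample path where the algorithm converges. This result is the first of its kind in the distributed setup and provides deeper insights than existing ones, which only discuss convergence rates in the expected or the CLT sense. Importantly, our result holds under weaker assumptions: the gossip matrix need not be doubly stochastic, and the stepsizes need not be square summable. As an application, for the stepsize $n^{-\gamma}$ with $\gamma \in (0, 1)$, the distributed TD(0) algorithm with linear function approximation has a convergence rate of $O(\sqrt{n^{-\gamma} \ln n})$ a.s.; for the $1/n$ type stepsize, the rate is $O(\sqrt{n^{-1} \ln \ln n})$ a.s. These decay rates do not depend on the graph describing the interactions among the different agents.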
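The setting in the abstract can be illustrated with a minimal sketch of distributed TD(0) with linear function approximation. Everything concrete here is an assumption for illustration and is not taken from the paper: the toy Markov reward process, the state-aggregation features, the specific update form `theta <- W (theta + alpha_n * delta * phi)`, and all function names are hypothetical. The gossip matrix `W` is row stochastic but not doubly stochastic, and the stepsize is $n^{-\gamma}$ with $\gamma = 0.75$, which is not square summable. Under these assumptions the agents should agree on the TD fixed point for the reward averaged with the left Perron vector of `W`.

```python
import numpy as np

def stationary(M):
    """Left Perron vector pi of a row-stochastic matrix M (pi @ M = pi, sum(pi) = 1)."""
    k = M.shape[0]
    A = np.vstack([M.T - np.eye(k), np.ones((1, k))])
    b = np.append(np.zeros(k), 1.0)
    return np.linalg.lstsq(A, b, rcond=None)[0]

def td_fixed_point(P, Phi, r, disc):
    """Solve Phi^T D (I - disc * P) Phi theta = Phi^T D r, with D = diag(stationary(P))."""
    D = np.diag(stationary(P))
    A = Phi.T @ D @ (np.eye(len(P)) - disc * P) @ Phi
    return np.linalg.solve(A, Phi.T @ D @ r)

def distributed_td0(P, Phi, R, W, disc, gamma, n_steps, rng):
    """Assumed update: each agent takes a local TD(0) step, then the agents gossip via W."""
    m, d = R.shape[0], Phi.shape[1]
    theta = np.zeros((m, d))            # one parameter vector per agent
    s = 0
    for n in range(1, n_steps + 1):
        s_next = rng.choice(len(P), p=P[s])
        # Local TD errors: all agents see the same transition but their own reward.
        delta = R[:, s] + disc * theta @ Phi[s_next] - theta @ Phi[s]
        theta = W @ (theta + n ** (-gamma) * np.outer(delta, Phi[s]))
        s = s_next
    return theta

rng = np.random.default_rng(0)
S, d, m = 6, 3, 4                       # states, features, agents (toy sizes)
disc, gamma = 0.5, 0.75                 # discount factor, stepsize exponent
P = rng.dirichlet(np.ones(S), size=S)   # random transition matrix
Phi = np.eye(d)[np.arange(S) % d]       # state-aggregation features, shape (S, d)
R = rng.uniform(0.0, 1.0, size=(m, S))  # local rewards, one row per agent
W = rng.dirichlet(np.ones(m), size=m)   # row stochastic, not doubly stochastic

pi_W = stationary(W)
theta_star = td_fixed_point(P, Phi, pi_W @ R, disc)

n_steps = 20000
theta = distributed_td0(P, Phi, R, W, disc, gamma, n_steps, rng)
err = np.max(np.linalg.norm(theta - theta_star, axis=1))
rate = np.sqrt(n_steps ** (-gamma) * np.log(n_steps))
print(f"max agent error: {err:.4f}, sqrt(n^-gamma ln n): {rate:.4f}")
```

A single finite run cannot confirm an almost-sure rate; the printout only places the observed error next to the $\sqrt{n^{-\gamma} \ln n}$ scale stated in the abstract.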

Related papers

Near-Optimal Online Learning for Multi-Agent Submodular Coordination: Tight Approximation and Communication Efficiency [52.60557300927007]
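We propose the $\textbf{MA-OSMA}$ algorithm, which optimizes the discrete submodular problem in a continuous domain. We also introduce a projection-free $\textbf{MA-OSEA}$ algorithm, which effectively utilizes the KL divergence by mixing in a uniform distribution. Our algorithms significantly improve the $(\frac{1}{1+c})$-approximation provided by the state-of-the-art OSG algorithm.
Paper reference translation (metadata) (2025-02-07T15:57:56Z)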
Convergence of Unadjusted Langevin in High Dimensions: Delocalization of Bias [13.642712817536072]
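We show that, as the problem dimension $d$ grows, the number of iterations needed to guarantee convergence within a desired error increases. A key technical challenge we address is the lack of one-step contractivity in the $W_{2,\ell^\infty}$ metric used to measure convergence.
Paper reference translation (metadata) (2024-08-20T01:24:54Z)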
Rate Analysis of Coupled Distributed Stochastic Approximation for Misspecified Optimization [0.552480439325792]
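We consider $n$ agents in a distributed optimization problem with imperfect information that is characterized parametrically. We propose a distributed stochastic approximation algorithm in which each agent updates its current belief of the unknown parameter. We quantitatively analyze the factors that affect the algorithm's performance and prove that the mean-squared error of the decision variable is bounded by $\mathcal{O}(\frac{1}{nk})+\mathcal{O}\left(\frac{1}{\sqrt{n}(1-\rho_w)}\right)\frac{1}{k^{1.5}}$.
Paper reference translation (metadata) (2024-04-21T14:18:49Z)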
Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation [53.17668583030862]
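We study infinite-horizon average-reward Markov decision processes (AMDPs) in the context of general function approximation. We propose a novel algorithmic framework called LOOP. We show that LOOP achieves a sublinear $\tilde{\mathcal{O}}(\mathrm{poly}(d, \mathrm{sp}(V^*)) \sqrt{T\beta})$ regret.
Paper reference translation (metadata) (2024-04-19T06:24:22Z)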
Refined Sample Complexity for Markov Games with Independent Linear Function Approximation [49.5660193419984]
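Markov games (MGs) are an important model for multi-agent reinforcement learning (MARL). This paper refines the AVLPR framework of Wang et al. (2023) and designs a pessimistic estimate of the sub-optimality gap. We give the first algorithm that tackles the curse of multi-agents, achieves the optimal $O(T^{-1/2})$ convergence rate, and avoids $\text{poly}(A_{\max})$ dependency at the same time.
Paper reference translation (metadata) (2024-02-11T01:51:15Z)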
Compressed and distributed least-squares regression: convergence rates with applications to Federated Learning [9.31522898261934]
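We study the impact of compression on gradient algorithms for machine learning. We highlight differences in convergence rates among several unbiased compression operators. We extend the results to the federated learning case.
Paper reference translation (metadata) (2023-08-02T18:02:00Z)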
Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning [77.22019100456595]
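We present training algorithms for distributed compute workers with differing communication frequencies. In this work, we obtain a tighter convergence rate of $mathcalO!!(sigma2-2_avg!)$. We also show that the heterogeneity term is affected by the average delay of the workers.
Paper reference translation (metadata) (2022-06-16T17:10:57Z)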
Provably Efficient Convergence of Primal-Dual Actor-Critic with Nonlinear Function Approximation [15.319335698574932]
The first efficient convergence result for primal-dual actor-critic, with a convergence rate of $mathcalOleft ascent(Nright)Nright)$ under Polyian sample. Results on OpenAI Gym continuous control tasks.
Paper reference translation (metadata) (2022-02-28T15:16:23Z)
Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes [91.38793800392108]
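This paper studies reinforcement learning with linear function approximation where the transition probability kernel of the Markov decision process (MDP) is a linear mixture model. For such linear mixture MDPs, we propose a new computationally efficient algorithm with linear function approximation, named $\text{UCRL-VTR}^{+}$. To the best of our knowledge, these are computationally efficient, nearly minimax optimal algorithms for RL with linear function approximation.
Paper reference translation (metadata) (2020-12-15T18:56:46Z)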
Convergence of Sparse Variational Inference in Gaussian Processes Regression [29.636483122130027]
We show that a method with a computational cost of $\mathcal{O}(N(\log N)^{2D}(\log\log N)^{2})$ can be used for inference.
Paper reference translation (metadata) (2020-08-01T19:23:34Z)
Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction [63.41789556777387]
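Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP). The number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2}+ \frac{t_{mix}}{\mu_{\min}(1-\gamma)}$.
Paper reference translation (metadata) (2020-06-04T17:51:00Z)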
Non-asymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling [56.394284787780364]
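This paper provides the first theoretical convergence analysis for two fundamental RL algorithms: policy gradient (PG) and temporal difference (TD) learning. Under general nonlinear function approximation, PG-AMSGrad converges to a neighborhood of a stationary point at a rate of $\mathcal{O}(\log T/\sqrt{T})$. Under linear function approximation, TD-AMSGrad with a constant stepsize converges to a neighborhood of the global optimum at a rate of $\mathcal{O}(\log T/\sqrt{T})$.
Paper reference translation (metadata) (2020-02-15T00:26:49Z)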

The related papers list is generated automatically from the titles and abstracts of the papers on this site.
