Symmetric (Optimistic) Natural Policy Gradient for Multi-agent Learning with Parameter Convergence
- URL: http://arxiv.org/abs/2210.12812v2
- Date: Mon, 20 Mar 2023 13:56:49 GMT
- Title: Symmetric (Optimistic) Natural Policy Gradient for Multi-agent Learning with Parameter Convergence
- Authors: Sarath Pattathil, Kaiqing Zhang, Asuman Ozdaglar
- Abstract summary: We investigate the global convergence of natural policy gradient (NPG) algorithms in multi-agent learning.
We propose NPG variants with global last-iterate parameter convergence guarantees for several standard multi-agent learning scenarios.
- Score: 18.412945308419033
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-agent interactions are increasingly important in the context of
reinforcement learning, and the theoretical foundations of policy gradient
methods have attracted surging research interest. We investigate the global
convergence of natural policy gradient (NPG) algorithms in multi-agent
learning. We first show that vanilla NPG may not have parameter convergence,
i.e., the convergence of the vector that parameterizes the policy, even when
the costs are regularized (which enabled strong convergence guarantees in the
policy space in the literature). This non-convergence of parameters leads to
stability issues in learning, which becomes especially relevant in the function
approximation setting, where we can only operate on low-dimensional parameters,
instead of the high-dimensional policy. We then propose variants of the NPG
algorithm, for several standard multi-agent learning scenarios: two-player
zero-sum matrix and Markov games, and multi-player monotone games, with global
last-iterate parameter convergence guarantees. We also generalize the results
to certain function approximation settings. Note that in our algorithms, the
agents take symmetric roles. Our results might also be of independent interest
for solving nonconvex-nonconcave minimax optimization problems with certain
structures. Simulations are also provided to corroborate our theoretical
findings.
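To make the symmetric, simultaneous update structure described above concrete, the following is a minimal sketch, not the paper's exact algorithm: it assumes softmax policies over logits, a matching-pennies payoff matrix, an entropy-regularized cost, and hand-picked step size eta and regularization weight tau. The multiplicative-weights-style logit update and the extragradient-style "optimistic" extrapolation (as well as the helper names softmax, npg_step, run) are illustrative stand-ins.

# Illustrative sketch (assumptions as stated above): symmetric, simultaneous
# natural-policy-gradient-style updates for an entropy-regularized two-player
# zero-sum matrix game, plus an optimistic (extragradient-style) variant.
import numpy as np

A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])    # row player x minimizes x^T A y, column player y maximizes
eta, tau, T = 0.1, 0.05, 500   # step size, entropy weight, iterations (illustrative values)

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def npg_step(theta_x, theta_y, y_for_x, x_for_y):
    # One multiplicative-weights-style NPG step in logit (parameter) space.
    # Both players update simultaneously (symmetric roles); the opponent
    # policies y_for_x / x_for_y may be the current or extrapolated iterates.
    new_x = (1 - eta * tau) * theta_x - eta * (A @ y_for_x)
    new_y = (1 - eta * tau) * theta_y + eta * (A.T @ x_for_y)
    return new_x, new_y

def run(optimistic):
    theta_x, theta_y = np.array([1.0, -1.0]), np.array([-0.5, 0.5])
    for _ in range(T):
        px, py = softmax(theta_x), softmax(theta_y)
        if optimistic:
            # Extrapolation step, then update against the predicted opponent play.
            hx, hy = npg_step(theta_x, theta_y, py, px)
            theta_x, theta_y = npg_step(theta_x, theta_y, softmax(hy), softmax(hx))
        else:
            theta_x, theta_y = npg_step(theta_x, theta_y, py, px)
    return theta_x, theta_y, softmax(theta_x), softmax(theta_y)

for opt in (False, True):
    tx, ty, px, py = run(opt)
    print("optimistic" if opt else "vanilla   ",
          "logits_x:", np.round(tx, 3), "policy_x:", np.round(px, 3))

Printing the final logits and policies for both runs gives a rough sense of how the two update rules behave in parameter space; the paper's formal guarantees concern last-iterate convergence of the parameters themselves, which this toy script does not attempt to reproduce.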
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- High-probability sample complexities for policy evaluation with linear function approximation [88.87036653258977]
We investigate the sample complexities required to guarantee a predefined estimation error of the best linear coefficients for two widely-used policy evaluation algorithms.
We establish the first sample complexity bound with high-probability convergence guarantee that attains the optimal dependence on the tolerance level.
arXiv Detail & Related papers (2023-05-30T12:58:39Z)
- Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning [139.53668999720605]
We present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.
We prove that with standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2023-05-08T16:20:03Z)
- Smoothing Policy Iteration for Zero-sum Markov Games [9.158672246275348]
We propose the smoothing policy iteration (SPI) algorithm to solve zero-sum Markov games (MGs) approximately.
Specifically, the adversarial policy serves as the weight function to enable efficient sampling over action spaces.
We also propose a model-based algorithm called Smooth adversarial Actor-critic (SaAC) by extending SPI with function approximation.
arXiv Detail & Related papers (2022-12-03T14:39:06Z)
- Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games [63.60117916422867]
This paper focuses on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games.
We propose a single-loop policy optimization method with symmetric updates from both agents, where the policy is updated via the entropy-regularized optimistic multiplicative weights update (OMWU) method.
Our convergence results improve upon the best known complexities, and lead to a better understanding of policy optimization in competitive Markov games.
arXiv Detail & Related papers (2022-10-03T16:05:43Z)
- Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs [21.347689976296834]
We employ the natural policy gradient (NPG) method to solve discounted constrained Markov decision process (MDP) problems.
We also provide convergence and finite-sample guarantees for two sample-based NPG-PD algorithms.
arXiv Detail & Related papers (2022-06-06T04:28:04Z)
- Dimensionality Reduction and Prioritized Exploration for Policy Search [29.310742141970394]
Black-box policy optimization is a class of reinforcement learning algorithms that explores and updates the policies at the parameter level.
We present a novel method to prioritize the exploration of effective parameters and cope with full covariance matrix updates.
Our algorithm learns faster than recent approaches and requires fewer samples to achieve state-of-the-art results.
arXiv Detail & Related papers (2022-03-09T15:17:09Z)
- On the Hidden Biases of Policy Mirror Ascent in Continuous Action Spaces [23.186300629667134]
We study the convergence of policy gradient algorithms under heavy-tailed parameterizations.
Our main theoretical contribution is establishing that this scheme converges with constant step and batch sizes.
arXiv Detail & Related papers (2022-01-28T18:54:30Z)
- A general sample complexity analysis of vanilla policy gradient [101.16957584135767]
Policy gradient (PG) is one of the most popular methods for solving reinforcement learning (RL) problems.
We provide a general sample complexity analysis of the "vanilla" PG method.
arXiv Detail & Related papers (2021-07-23T19:38:17Z)
- Computational Performance of Deep Reinforcement Learning to find Nash Equilibria [0.0]
We use a deep reinforcement learning algorithm to learn Nash equilibria in a setting where firms compete in prices.
Although the algorithm is model-free, it utilizes a large set of parameters in its various steps.
We find parameter choices that can reach convergence rates of up to 99%.
arXiv Detail & Related papers (2021-04-26T22:14:17Z)
- Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field Control/Game in Continuous Time [109.06623773924737]
We study the policy gradient method for the linear-quadratic mean-field control and game.
We show that it converges to the optimal solution at a linear rate, which is verified by a synthetic simulation.
arXiv Detail & Related papers (2020-08-16T06:34:11Z)