Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field
Control/Game in Continuous Time
- URL: http://arxiv.org/abs/2008.06845v1
- Date: Sun, 16 Aug 2020 06:34:11 GMT
- Title: Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field
Control/Game in Continuous Time
- Authors: Weichen Wang, Jiequn Han, Zhuoran Yang and Zhaoran Wang
- Abstract summary: We study the policy gradient method for the linear-quadratic mean-field control and game.
We show that it converges to the optimal solution at a linear rate, which is verified by a synthetic simulation.
- Score: 109.06623773924737
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning is a powerful tool to learn the optimal policy of
possibly multiple agents by interacting with the environment. As the number of
agents grows very large, the system can be approximated by a mean-field
problem, which has motivated new research directions for mean-field control
(MFC) and mean-field game (MFG). In this paper, we study the policy
gradient method for the linear-quadratic mean-field control and game, where we
assume each agent has identical linear state transitions and quadratic cost
functions. While most of the recent works on policy gradient for MFC and MFG
are based on discrete-time models, we focus on continuous-time models, where
some of the analysis techniques may be of interest to readers. For both MFC and
MFG, we provide a policy gradient update and show that it converges to the
optimal solution at a linear rate, as verified by a synthetic simulation.
For MFG, we also provide sufficient conditions for the existence and uniqueness
of the Nash equilibrium.
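As a concrete illustration of the kind of update studied here, the following is a minimal sketch rather than the paper's algorithm: a scalar continuous-time LQ mean-field control problem with made-up constants, a linear feedback policy u = -k*(x - xbar) - l*xbar, Euler-Maruyama simulation of the population, and gradient descent on a Monte Carlo cost estimate, with finite differences over common random numbers standing in for the exact policy gradient formula.

```python
# A minimal sketch (not the paper's algorithm): policy gradient descent for a
# scalar continuous-time LQ mean-field control problem. All model constants
# are made up; finite differences over common random numbers stand in for the
# exact policy gradient formula.
import numpy as np

a, abar, b, sigma = -0.5, 0.2, 1.0, 0.3  # dx = (a*x + abar*xbar + b*u) dt + sigma dW
q, qbar, r = 1.0, 0.5, 0.1               # running cost q*(x-xbar)^2 + qbar*xbar^2 + r*u^2
T, dt, n_paths = 2.0, 0.01, 2000

def cost(k, l, seed=0):
    """Monte Carlo estimate of J(k, l) for the feedback u = -k*(x - xbar) - l*xbar."""
    rng = np.random.default_rng(seed)    # fixed seed gives common random numbers
    x = rng.normal(1.0, 1.0, size=n_paths)
    J = 0.0
    for _ in range(int(T / dt)):
        xbar = x.mean()                  # empirical mean field of the population
        u = -k * (x - xbar) - l * xbar
        J += dt * np.mean(q * (x - xbar) ** 2 + qbar * xbar ** 2 + r * u ** 2)
        x = (x + (a * x + abar * xbar + b * u) * dt
             + sigma * rng.normal(0.0, np.sqrt(dt), n_paths))
    return J

k, l, lr, eps = 0.0, 0.0, 0.05, 1e-3
for it in range(100):
    gk = (cost(k + eps, l, seed=it) - cost(k - eps, l, seed=it)) / (2 * eps)
    gl = (cost(k, l + eps, seed=it) - cost(k, l - eps, seed=it)) / (2 * eps)
    k, l = k - lr * gk, l - lr * gl      # plain gradient descent on (k, l)
print(f"J = {cost(k, l):.4f} at (k, l) = ({k:.3f}, {l:.3f})")
```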
Related papers
- Full error analysis of policy gradient learning algorithms for exploratory linear quadratic mean-field control problem in continuous time with common noise [0.0]
We study policy gradient (PG) learning and first demonstrate convergence in a model-based setting.
We prove the global linear convergence and sample complexity of the PG algorithm with two-point gradient estimates in a model-free setting.
In this setting, the parameterized optimal policies are learned from samples of the states and population distribution.
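The two-point gradient estimates mentioned above admit a generic zeroth-order form, sketched below on an illustrative quadratic objective; the smoothing radius, step size, and cost function are stand-ins, not taken from the paper.

```python
# A generic two-point (zeroth-order) gradient estimator of the kind used in
# model-free policy gradient analyses. J is a black-box cost on the policy
# parameters; the quadratic J below only stands in for a simulated control
# cost, and all constants are illustrative.
import numpy as np

def two_point_grad(J, theta, rng, radius=0.05):
    u = rng.normal(size=theta.shape)
    u /= np.linalg.norm(u)               # random unit direction
    # d/(2r) * (J(theta + r*u) - J(theta - r*u)) * u estimates the gradient
    # of a smoothed version of J
    return theta.size * (J(theta + radius * u) - J(theta - radius * u)) / (2 * radius) * u

rng = np.random.default_rng(0)
J = lambda th: float(th @ th)             # toy cost, minimized at the origin
theta = np.array([1.0, -2.0])
for _ in range(500):
    theta -= 0.01 * two_point_grad(J, theta, rng)
print(theta)                              # drifts toward the origin
```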
arXiv Detail & Related papers (2024-08-05T14:11:51Z)
- A Single Online Agent Can Efficiently Learn Mean Field Games [16.00164239349632]
Mean field games (MFGs) are a promising framework for modeling the behavior of large-population systems.
This paper introduces a novel online single-agent model-free learning scheme, which enables a single agent to learn mean-field Nash equilibria (MFNE) using online samples.
arXiv Detail & Related papers (2024-05-05T16:38:04Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
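A minimal sketch of this learn-stochastic, deploy-deterministic pattern, assuming a PGPE-style Gaussian hyperpolicy over the parameter of a deterministic policy and a toy one-dimensional return (none of these specifics come from the paper):

```python
# A minimal sketch of the learn-stochastic, deploy-deterministic pattern:
# a PGPE-style Gaussian hyperpolicy N(mu, sigma^2) over the parameter of a
# deterministic policy, updated with a score-function gradient plus a
# running-average baseline. The one-dimensional return J is a toy stand-in.
import numpy as np

rng = np.random.default_rng(1)
J = lambda theta: -(theta - 3.0) ** 2        # toy return, maximized at theta = 3
mu, sigma, lr = 0.0, 1.0, 0.05               # sigma sets the exploration level
baseline = 0.0

for _ in range(2000):
    theta = rng.normal(mu, sigma)            # sample a deterministic policy
    ret = J(theta)
    # likelihood-ratio gradient of E[J] with respect to mu
    mu += lr * (ret - baseline) * (theta - mu) / sigma ** 2
    baseline += 0.1 * (ret - baseline)       # variance-reduction baseline
print("deployed deterministic parameter:", mu)   # close to 3.0
```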
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Deep Reinforcement Learning for Infinite Horizon Mean Field Problems in Continuous Spaces [1.4999444543328293]
We present a reinforcement learning (RL) algorithm designed to solve mean field games (MFG) and mean field control (MFC) problems in a unified manner.
The proposed approach pairs the actor-critic (AC) paradigm with a representation of the mean field distribution via a parameterized score function.
A modification of the algorithm allows us to solve mixed mean field control games (MFCGs).
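One ingredient named here, representing a distribution through a parameterized score function, can be sketched in isolation: below, an assumed Gaussian score s(x) = -(x - m)/v is turned back into samples by unadjusted Langevin dynamics. The actor-critic machinery of the paper is omitted, and every constant is illustrative.

```python
# A sketch of one ingredient named above: a distribution represented only
# through a parameterized score function s(x) ~ grad log rho(x), from which
# samples are recovered by unadjusted Langevin dynamics. The Gaussian score
# and all constants are illustrative, not the paper's parameterization.
import numpy as np

rng = np.random.default_rng(2)
m, v = 1.5, 0.64                      # score parameters: mean and variance
score = lambda x: -(x - m) / v        # score of the Gaussian N(m, v)

x = rng.normal(0.0, 1.0, size=5000)   # arbitrary initial particles
eps = 0.01                            # Langevin step size
for _ in range(2000):
    x = x + eps * score(x) + np.sqrt(2 * eps) * rng.normal(size=x.size)
print(x.mean(), x.var())              # approaches (1.5, 0.64)
```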
arXiv Detail & Related papers (2023-09-19T22:37:47Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Regularization of the policy updates for stabilizing Mean Field Games [0.2348805691644085]
This work studies non-cooperative Multi-Agent Reinforcement Learning (MARL), where multiple agents interact in the same environment and each aims to maximize its individual return.
We name our algorithm Mean Field Proximal Policy Optimization (MF-PPO), and we empirically show the effectiveness of our method in the OpenSpiel framework.
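The clipped proximal update behind PPO-style methods such as MF-PPO can be sketched on a toy three-armed bandit with a softmax policy; the advantages, clip range, and step size below are illustrative, and the mean-field coupling and neural policies of the paper are omitted.

```python
# A sketch of the clipped proximal update behind PPO-style methods such as
# MF-PPO, on a toy 3-armed bandit with a softmax policy. All constants are
# illustrative; the mean-field coupling of the paper is omitted.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

adv = np.array([1.0, -0.5, -0.5])    # stand-in advantages, one per action
logits = np.zeros(3)
clip_eps, lr = 0.2, 1.0

for outer in range(20):              # each outer step refreshes the old policy
    pi_old = softmax(logits)
    for _ in range(10):              # several epochs on the same "batch"
        pi = softmax(logits)
        ratio = pi / pi_old
        clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps)
        # the min in the clipped surrogate keeps only unclipped terms trainable
        active = ratio * adv <= clipped * adv
        grad = np.zeros(3)
        for a in range(3):
            if active[a]:
                dpi = pi[a] * ((np.arange(3) == a) - pi)  # d pi[a] / d logits
                grad += dpi * adv[a]
        logits += lr * grad          # ascend the clipped surrogate
print(softmax(logits))               # mass concentrates on the advantaged action
```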
arXiv Detail & Related papers (2023-04-04T05:45:42Z)
- Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games [63.60117916422867]
This paper focuses on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games.
We propose a single-loop policy optimization method with symmetric updates from both agents, where the policy is updated via the entropy-regularized optimistic multiplicative weights update (OMWU) method.
Our convergence results improve upon the best known complexities, and lead to a better understanding of policy optimization in competitive Markov games.
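A stripped-down sketch of the optimistic multiplicative weights update on a 2x2 zero-sum matrix game follows; the entropy regularization and Markov-game structure of the paper are omitted, so this only shows the core symmetric update on matching pennies, whose unique equilibrium is (1/2, 1/2) for both players.

```python
# A stripped-down sketch of the optimistic multiplicative weights update
# (OMWU) on a 2x2 zero-sum matrix game; the entropy regularization and
# Markov-game structure of the paper are omitted.
import numpy as np

A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # matching pennies: row maximizes x^T A y
x = np.array([0.8, 0.2])                  # row player's mixed strategy
y = np.array([0.3, 0.7])                  # column player's mixed strategy
gx_prev, gy_prev = A @ y, A.T @ x         # previous payoff vectors
eta = 0.1

for _ in range(3000):
    gx, gy = A @ y, A.T @ x               # simultaneous payoff evaluation
    # optimism: extrapolate with the most recent payoff difference
    x = x * np.exp(eta * (2 * gx - gx_prev))
    y = y * np.exp(-eta * (2 * gy - gy_prev))
    x, y = x / x.sum(), y / y.sum()
    gx_prev, gy_prev = gx, gy
print(x, y)                               # last iterates approach (0.5, 0.5)
```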
arXiv Detail & Related papers (2022-10-03T16:05:43Z)
- Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDP).
The novelty is to design an embedded product MDP (EP-MDP) between the limit-deterministic generalized Büchi automaton (LDGBA) and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for model-free reinforcement learning (RL) depend only on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
- Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon [3.867363075280544]
We explore reinforcement learning methods for finding the optimal policy in the linear quadratic regulator (LQR) problem.
We produce a global linear convergence guarantee for the setting of finite time horizon and state dynamics under weak assumptions.
We show results both for the case where a model of the underlying dynamics is assumed and for the case where the method is applied directly to data.
arXiv Detail & Related papers (2020-11-20T09:51:49Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.