Policy Optimization for Linear-Quadratic Zero-Sum Mean-Field Type Games
- URL: http://arxiv.org/abs/2009.02146v1
- Date: Wed, 2 Sep 2020 13:49:08 GMT
- Title: Policy Optimization for Linear-Quadratic Zero-Sum Mean-Field Type Games
- Authors: René Carmona, Kenza Hamidouche, Mathieu Laurière, and Zongjun Tan
- Abstract summary: Zero-sum mean-field type games (ZSMFTG) with linear dynamics and quadratic utility are studied.
Two policy optimization methods that rely on policy gradient are proposed.
- Score: 1.1852406625172216
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, zero-sum mean-field type games (ZSMFTG) with linear
dynamics and quadratic utility are studied under an infinite-horizon discounted
utility function. ZSMFTG are a class of games in which two decision makers,
whose utilities sum to zero, compete to influence a large population of agents.
In particular, the case in which the transition and utility functions depend on
the state, the actions of the controllers, and the means of the state and the
actions is investigated. The game is analyzed and explicit expressions for the
Nash equilibrium strategies are derived. Moreover, two policy optimization
methods that rely on policy gradient are proposed for both the model-based and
the sample-based frameworks. In the first case, the gradients are computed
exactly using the model, whereas in the second they are estimated using
Monte-Carlo simulations. Numerical experiments show the convergence of the two
players' controls as well as of the utility function when the two algorithms
are used in different scenarios.
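As a concrete illustration of the sample-based approach described in the abstract, the sketch below approximates the mean-field term with the empirical mean of a simulated population, estimates the discounted quadratic utility by Monte-Carlo rollouts, and updates each controller's feedback gain with a zero-order (finite-difference) gradient estimate. The scalar model coefficients, the feedback gains `theta1`/`theta2`, and the finite-difference estimator are illustrative assumptions rather than the paper's exact model or algorithm; in particular, the paper's policies also depend on the mean of the state, which is omitted here for brevity.

```python
# Hypothetical sketch (not the paper's exact algorithm): sample-based policy
# gradient for a scalar linear-quadratic zero-sum mean-field type game.
# Controller 1 minimizes and controller 2 maximizes a discounted quadratic
# utility; the mean-field term is approximated by the empirical mean of N
# simulated agents, and gradients are estimated by finite differences.
import numpy as np

rng = np.random.default_rng(0)

# Illustrative model coefficients (assumptions, not values from the paper).
A, Abar = 0.8, 0.1           # state and mean-of-state coefficients
B1, B2 = 0.5, 0.3            # action coefficients of controllers 1 and 2
Q, R1, R2 = 1.0, 1.0, 5.0    # quadratic utility weights (R2 penalizes the maximizer)
gamma, N, T = 0.9, 512, 60   # discount factor, population size, truncated horizon

def utility(theta1, theta2):
    """Monte-Carlo estimate of the discounted utility under the linear
    feedback policies u_t = -theta1 * x_t and v_t = -theta2 * x_t."""
    x = rng.normal(size=N)                  # initial population of agents
    total = 0.0
    for t in range(T):
        xbar = x.mean()                     # empirical approximation of E[x_t]
        u, v = -theta1 * x, -theta2 * x     # controllers' actions
        cost = Q * x**2 + R1 * u**2 - R2 * v**2
        total += gamma**t * cost.mean()
        noise = rng.normal(scale=0.1, size=N)
        x = A * x + Abar * xbar + B1 * u + B2 * v + noise
    return total

# Zero-order gradient estimates with simultaneous descent (minimizer) and
# ascent (maximizer) updates of the two feedback gains.
theta1, theta2, lr, eps = 0.0, 0.0, 0.05, 0.05
for _ in range(200):
    g1 = (utility(theta1 + eps, theta2) - utility(theta1 - eps, theta2)) / (2 * eps)
    g2 = (utility(theta1, theta2 + eps) - utility(theta1, theta2 - eps)) / (2 * eps)
    theta1 -= lr * g1   # controller 1 descends on the utility
    theta2 += lr * g2   # controller 2 ascends on the utility
print(f"theta1 = {theta1:.3f}, theta2 = {theta2:.3f}")
```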
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Neural Time-Reversed Generalized Riccati Equation [60.92253836775246]
Hamiltonian equations offer an interpretation of optimality through auxiliary variables known as costates.
This paper introduces a novel neural-based approach to optimal control, with the aim of working forward-in-time.
arXiv Detail & Related papers (2023-12-14T19:29:37Z)
- Dimensionless Policies based on the Buckingham $\pi$ Theorem: Is This a Good Way to Generalize Numerical Results? [66.52698983694613]
This article explores the use of the Buckingham $\pi$ theorem as a tool to encode the control policies of physical systems into a generic form of knowledge.
We show, by restating the solution to a motion control problem using dimensionless variables, that (1) the policy mapping involves a reduced number of parameters and (2) control policies generated numerically for a specific system can be transferred exactly to a subset of dimensionally similar systems by scaling the input and output variables appropriately.
It remains to be seen how practical this approach can be to generalize policies for more complex high-dimensional problems, but the early results show that it is a promising approach.
arXiv Detail & Related papers (2023-07-29T00:51:26Z)
- HSVI can solve zero-sum Partially Observable Stochastic Games [7.293053431456775]
State-of-the-art methods for solving 2-player zero-sum imperfect-information games rely on linear programming or dynamic regret minimization.
We propose a novel family of promising approaches complementing those relying on linear programming or iterative methods.
arXiv Detail & Related papers (2022-10-26T11:41:57Z)
- Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions [145.54544979467872]
We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions.
We prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario.
Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization.
arXiv Detail & Related papers (2022-07-25T18:29:16Z)
- Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms [1.776746672434207]
We study policy gradient (PG) for reinforcement learning in continuous time and space.
We propose two types of actor-critic algorithms for RL, where we learn and update value functions and policies simultaneously and alternatingly.
arXiv Detail & Related papers (2021-11-22T14:27:04Z)
- Provably Efficient Policy Gradient Methods for Two-Player Zero-Sum Markov Games [95.70078702838654]
This paper studies natural extensions of the Natural Policy Gradient algorithm for solving two-player zero-sum games.
We thoroughly characterize the algorithms' performance in terms of the number of samples, number of iterations, concentrability coefficients, and approximation error.
arXiv Detail & Related papers (2021-02-17T17:49:57Z)
- Linear-Quadratic Zero-Sum Mean-Field Type Games: Optimality Conditions and Policy Optimization [1.1852406625172216]
Zero-sum mean-field type games (ZSMFTG) with linear dynamics and quadratic cost are studied.
Two decision makers, whose utilities sum to zero, compete to influence a large population of indistinguishable agents.
The optimality conditions of the game are analysed for both open-loop and closed-loop controls.
arXiv Detail & Related papers (2020-09-01T17:08:24Z)
- Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field Control/Game in Continuous Time [109.06623773924737]
We study the policy gradient method for the linear-quadratic mean-field control and game.
We show that it converges to the optimal solution at a linear rate, which is verified by a synthetic simulation.
arXiv Detail & Related papers (2020-08-16T06:34:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.