Linear-Quadratic Zero-Sum Mean-Field Type Games: Optimality Conditions
and Policy Optimization
- URL: http://arxiv.org/abs/2009.00578v1
- Date: Tue, 1 Sep 2020 17:08:24 GMT
- Title: Linear-Quadratic Zero-Sum Mean-Field Type Games: Optimality Conditions
and Policy Optimization
- Authors: René Carmona and Kenza Hamidouche and Mathieu Laurière and Zongjun Tan
- Abstract summary: Zero-sum mean-field type games (ZSMFTG) with linear dynamics and quadratic cost are studied.
Two decision makers whose utilities sum to zero compete to influence a large population of indistinguishable agents.
Optimality conditions of the game are analysed for both open-loop and closed-loop controls.
- Score: 1.1852406625172216
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, zero-sum mean-field type games (ZSMFTG) with linear dynamics
and quadratic cost are studied under infinite-horizon discounted utility
function. ZSMFTG are a class of games in which two decision makers whose
utilities sum to zero compete to influence a large population of
indistinguishable agents. In particular, the case in which the transition and
utility functions depend on the state, the action of the controllers, and the
mean of the state and the actions, is investigated. The optimality conditions
of the game are analysed for both open-loop and closed-loop controls, and
explicit expressions for the Nash equilibrium strategies are derived. Moreover,
two policy optimization methods that rely on policy gradient are proposed for
both model-based and sample-based frameworks. In the model-based case, the
gradients are computed exactly using the model, whereas they are estimated
using Monte-Carlo simulations in the sample-based case. Numerical experiments
are conducted to show the convergence of the utility function as well as the
two players' controls.
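To make the policy-optimization part of the abstract concrete, below is a minimal sketch of the sample-based approach it describes: each player keeps a linear state-feedback gain, and the gains are updated by gradient descent-ascent, with gradients estimated from Monte-Carlo rollouts of a linear mean-field population. The matrix names (A, Abar, B1, B2, Q, R1, R2), the zeroth-order gradient estimator, the simultaneous descent-ascent rule, and the omission of action-mean coupling and of feedback on the population mean are illustrative assumptions, not the authors' exact construction.

```python
# Hedged sketch of sample-based policy gradient for a linear-quadratic
# zero-sum mean-field type game.  Illustrative only; not the paper's algorithm.
import numpy as np

rng = np.random.default_rng(0)
dx, du, gamma, T = 2, 1, 0.9, 50   # state dim, action dim, discount, truncation horizon

# Linear mean-field dynamics: x' = A x + Abar xbar + B1 u1 + B2 u2 + noise
A, Abar = 0.6 * np.eye(dx), 0.1 * np.eye(dx)
B1 = B2 = np.ones((dx, du))
Q, R1, R2 = np.eye(dx), np.eye(du), np.eye(du)   # quadratic cost weights

def cost_estimate(K1, K2, n_agents=200):
    """Monte-Carlo estimate of the discounted zero-sum cost when both players
    use linear feedback u_i = -K_i x (player 1 minimises, player 2 maximises)."""
    X = rng.normal(size=(n_agents, dx))          # population of indistinguishable agents
    total = 0.0
    for t in range(T):
        xbar = X.mean(axis=0)                    # mean-field term (average state)
        U1, U2 = -X @ K1.T, -X @ K2.T
        total += gamma**t * np.mean(
            np.einsum('ni,ij,nj->n', X, Q, X)
            + np.einsum('ni,ij,nj->n', U1, R1, U1)
            - np.einsum('ni,ij,nj->n', U2, R2, U2))
        X = (X @ A.T + xbar @ Abar.T + U1 @ B1.T + U2 @ B2.T
             + 0.1 * rng.normal(size=X.shape))
    return total

def zo_gradient(f, K, radius=0.05, n_dirs=10):
    """Zeroth-order gradient estimate of f at K from random perturbations."""
    g = np.zeros_like(K)
    for _ in range(n_dirs):
        D = rng.normal(size=K.shape)
        D /= np.linalg.norm(D)
        g += (f(K + radius * D) - f(K - radius * D)) / (2.0 * radius) * D
    return g / n_dirs

# Gradient descent-ascent on the two feedback gains.  In the sample-based
# setting the gradients come from rollouts; in the model-based setting they
# would instead be computed exactly from (A, Abar, B1, B2, Q, R1, R2).
K1, K2 = np.zeros((du, dx)), np.zeros((du, dx))
lr = 1e-3
for it in range(100):
    K1 -= lr * zo_gradient(lambda K: cost_estimate(K, K2), K1)   # player 1: minimise
    K2 += lr * zo_gradient(lambda K: cost_estimate(K1, K), K2)   # player 2: maximise
    if it % 25 == 0:
        print(f"iteration {it}: estimated cost {cost_estimate(K1, K2):.3f}")
```

The rollout-based estimator stands in for the Monte-Carlo simulations mentioned in the abstract; in the model-based case the same descent-ascent loop would use exact gradients derived from the model.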
Related papers
- Stable Inverse Reinforcement Learning: Policies from Control Lyapunov Landscapes [4.229902091180109]
We propose a novel, stability-certified IRL approach to learning control Lyapunov functions from demonstration data.
By exploiting closed-form expressions for associated control policies, we are able to efficiently search the space of CLFs.
We present a theoretical analysis of the optimality properties provided by the CLF and evaluate our approach using both simulated and real-world data.
arXiv Detail & Related papers (2024-05-14T16:40:45Z) - Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z) - Double Duality: Variational Primal-Dual Policy Optimization for
Constrained Reinforcement Learning [132.7040981721302]
We study the constrained convex Markov Decision Process (MDP), where the goal is to minimize a convex functional of the visitation measure.
Designing algorithms for a constrained convex MDP faces several challenges, including handling the large state space.
arXiv Detail & Related papers (2024-02-16T16:35:18Z) - Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games [66.2085181793014]
We show that a model-free stage-based Q-learning algorithm can enjoy the same optimality in the $H$ dependence as model-based algorithms.
Our algorithm features a key novel design of updating the reference value functions as the pair of optimistic and pessimistic value functions.
arXiv Detail & Related papers (2023-08-17T08:34:58Z) - $K$-Nearest-Neighbor Resampling for Off-Policy Evaluation in Stochastic
Control [0.6906005491572401]
We propose a novel $K$-nearest neighbor resampling procedure for estimating the performance of a policy from historical data.
Our analysis allows for the sampling of entire episodes, as is common practice in most applications.
Compared to other OPE methods, our algorithm does not require optimization, can be efficiently implemented via tree-based nearest neighbor search and parallelization, and does not explicitly assume a parametric model for the environment's dynamics.
arXiv Detail & Related papers (2023-06-07T23:55:12Z) - Adaptive LASSO estimation for functional hidden dynamic geostatistical
model [69.10717733870575]
We propose a novel model selection algorithm based on a penalized maximum likelihood estimator (PMLE) for functional hidden dynamic geostatistical models (f-HD).
The algorithm is based on iterative optimisation and uses an adaptive least absolute shrinkage and selection operator (GMSOLAS) penalty function, wherein the weights are obtained by the unpenalised f-HD maximum-likelihood estimators.
arXiv Detail & Related papers (2022-08-10T19:17:45Z) - Stochastic optimal well control in subsurface reservoirs using
reinforcement learning [0.0]
We present a case study of a model-free reinforcement learning framework to solve optimal control for a predefined parameter uncertainty distribution.
In principle, RL algorithms are capable of learning optimal action policies to maximize a numerical reward signal.
We present numerical results using two state-of-the-art RL algorithms, proximal policy optimization (PPO) and advantage actor-critic (A2C) on two subsurface flow test cases.
arXiv Detail & Related papers (2022-07-07T17:34:23Z) - Optimal control of robust team stochastic games [5.425935258756356]
We propose a model of "robust" team games, where players utilize a robust optimization approach to make decisions.
We develop a learning algorithm in the form of a Gauss-Seidel modified policy iteration and prove its convergence.
Some numerical simulations are presented to demonstrate the effectiveness of the algorithm.
arXiv Detail & Related papers (2021-05-16T10:42:09Z) - Provably Efficient Policy Gradient Methods for Two-Player Zero-Sum
Markov Games [95.70078702838654]
This paper studies natural extensions of the Natural Policy Gradient algorithm for solving two-player zero-sum games.
We thoroughly characterize the algorithms' performance in terms of the number of samples, number of iterations, concentrability coefficients, and approximation error.
arXiv Detail & Related papers (2021-02-17T17:49:57Z) - Gaussian Process-based Min-norm Stabilizing Controller for
Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem.
We show that the resulting optimization problem is convex, and we call it the Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP).
arXiv Detail & Related papers (2020-11-14T01:27:32Z) - Policy Optimization for Linear-Quadratic Zero-Sum Mean-Field Type Games [1.1852406625172216]
Zero-sum mean-field type games (ZSMFTG) with linear dynamics and quadratic utility are studied.
Two policy optimization methods that rely on policy gradient are proposed.
arXiv Detail & Related papers (2020-09-02T13:49:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.