Reinforcement Learning in Non-Stationary Discrete-Time Linear-Quadratic
Mean-Field Games
- URL: http://arxiv.org/abs/2009.04350v3
- Date: Thu, 1 Oct 2020 15:41:33 GMT
- Title: Reinforcement Learning in Non-Stationary Discrete-Time Linear-Quadratic
Mean-Field Games
- Authors: Muhammad Aneeq uz Zaman, Kaiqing Zhang, Erik Miehling, and Tamer Başar
- Abstract summary: We study large population multi-agent reinforcement learning (RL) in the context of discrete-time linear-quadratic mean-field games (LQ-MFGs)
Our setting differs from most existing work on RL for MFGs, in that we consider a non-stationary MFG over an infinite horizon.
We propose an actor-critic algorithm to iteratively compute the mean-field equilibrium (MFE) of the LQ-MFG.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study large population multi-agent reinforcement learning
(RL) in the context of discrete-time linear-quadratic mean-field games
(LQ-MFGs). Our setting differs from most existing work on RL for MFGs, in that
we consider a non-stationary MFG over an infinite horizon. We propose an
actor-critic algorithm to iteratively compute the mean-field equilibrium (MFE)
of the LQ-MFG. There are two primary challenges: i) the non-stationarity of the
MFG induces a linear-quadratic tracking problem, which requires solving a
backwards-in-time (non-causal) equation that cannot be solved by standard
(causal) RL algorithms; ii) many RL algorithms assume that the states are
sampled from the stationary distribution of a Markov chain (MC), that is, the
chain is already mixed, an assumption that is not satisfied for real data
sources. We first identify that the mean-field trajectory follows linear
dynamics, allowing the problem to be reformulated as a linear quadratic
Gaussian problem. Under this reformulation, we propose an actor-critic
algorithm that allows samples to be drawn from an unmixed MC. Finite-sample
convergence guarantees for the algorithm are then provided. To characterize the
performance of our algorithm in multi-agent RL, we have developed an error
bound with respect to the Nash equilibrium of the finite-population game.
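For context, a minimal generic formulation of the kind of discrete-time LQ-MFG studied here (the symbols $A, B, Q, R, \beta, \bar{x}_t$ are illustrative and need not match the paper's exact parameterization): a representative agent controls linear dynamics and pays a quadratic cost for deviating from the population mean,
  $x_{t+1} = A x_t + B u_t + w_t$, with $w_t \sim \mathcal{N}(0, \Sigma)$,
  $J(\pi; \bar{x}) = \mathbb{E}\big[\sum_{t \ge 0} \beta^t \big((x_t - \bar{x}_t)^\top Q (x_t - \bar{x}_t) + u_t^\top R u_t\big)\big]$,
and a mean-field equilibrium is a fixed point: $\pi^*$ minimizes $J(\,\cdot\,; \bar{x}^*)$ while $\bar{x}^*_t = \mathbb{E}[x_t \mid \pi^*]$ for every $t$. The abstract's observation that the mean-field trajectory itself evolves linearly ($\bar{x}_{t+1} = \bar{A}\bar{x}_t$ for some matrix $\bar{A}$) is what allows the non-causal tracking problem to be rewritten as a standard LQG problem on an augmented state such as $(x_t, \bar{x}_t)$, which the proposed actor-critic can then solve from samples of an unmixed Markov chain.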
Related papers
- Discrete Probabilistic Inference as Control in Multi-path Environments [84.67055173040107]
We consider the problem of sampling from a discrete and structured distribution as a sequential decision problem.
We show that GFlowNets learn a policy that samples objects proportionally to their reward by enforcing a conservation of flows.
We also prove that some flow-matching objectives found in the GFlowNet literature are in fact equivalent to well-established MaxEnt RL algorithms with a corrected reward.
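For reference, the flow-conservation mechanism mentioned above, in standard GFlowNet notation (not taken from this paper): an edge flow $F$ over a DAG of partially constructed objects is trained so that, at every intermediate state $s$,
  $\sum_{s' : s' \to s} F(s' \to s) = \sum_{s'' : s \to s''} F(s \to s'')$,
with the flow into each terminal object $x$ matched to its reward $R(x)$; when these constraints hold, the policy obtained by normalizing outgoing flows samples complete objects with probability $P(x) \propto R(x)$.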
arXiv Detail & Related papers (2024-02-15T20:20:35Z) - Maximum Causal Entropy Inverse Reinforcement Learning for Mean-Field
Games [3.2228025627337864]
We introduce the maximum causal entropy inverse reinforcement learning (IRL) problem for discrete-time mean-field games (MFGs) under an infinite-horizon discounted-reward optimality criterion.
We formulate the MFG problem as a generalized Nash equilibrium problem (GNEP), which enables computation of the mean-field equilibrium.
This method is employed to produce data for a numerical example.
arXiv Detail & Related papers (2024-01-12T13:22:03Z) - Global Convergence of Online Identification for Mixed Linear Regression [1.9295130374196499]
Mixed linear regression (MLR) is a powerful model for characterizing nonlinear relationships.
This paper investigates the online identification and data clustering problems for two basic classes of MLRs.
It introduces two corresponding new online identification algorithms based on the expectation-maximization principle.
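As a concrete instance of the model and the EM principle invoked here (standard mixture-of-regressions notation, not the paper's specific algorithms): observations follow
  $y_t = \phi_t^\top \theta_{z_t} + \varepsilon_t$, with latent label $z_t \in \{1, \dots, K\}$ and noise $\varepsilon_t \sim \mathcal{N}(0, \sigma^2)$;
an online EM step first computes responsibilities $\gamma_{t,k} \propto \pi_k \exp(-(y_t - \phi_t^\top \theta_k)^2 / 2\sigma^2)$ (E-step) and then nudges each regressor toward the data it explains, e.g. $\theta_k \leftarrow \theta_k + \eta_t \gamma_{t,k} \phi_t (y_t - \phi_t^\top \theta_k)$ (M-step), so that parameter identification and data clustering happen jointly.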
arXiv Detail & Related papers (2023-11-30T12:30:42Z) - Deep Reinforcement Learning for Infinite Horizon Mean Field Problems in Continuous Spaces [1.4999444543328293]
We present a reinforcement learning (RL) algorithm designed to solve mean field games (MFG) and mean field control (MFC) problems in a unified manner.
The proposed approach pairs the actor-critic (AC) paradigm with a representation of the mean field distribution via a parameterized score function.
A modification of the algorithm allows us to solve mixed mean field control games (MFCGs).
arXiv Detail & Related papers (2023-09-19T22:37:47Z) - Constrained Optimization via Exact Augmented Lagrangian and Randomized
Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonconvex optimization problems.
We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.
arXiv Detail & Related papers (2023-05-28T06:33:37Z) - PAPAL: A Provable PArticle-based Primal-Dual ALgorithm for Mixed Nash
Equilibrium [62.51015395213579]
We consider the problem of computing a mixed Nash equilibrium for nonconvex-nonconcave objective functions in two-player zero-sum continuous games.
The proposed algorithm employs the movements of particles to represent the updates of random strategies for the $\epsilon$-mixed Nash equilibrium.
arXiv Detail & Related papers (2023-03-02T05:08:15Z) - Scalable Deep Reinforcement Learning Algorithms for Mean Field Games [60.550128966505625]
Mean Field Games (MFGs) have been introduced to efficiently approximate games with very large populations of strategic agents.
Recently, the question of learning equilibria in MFGs has gained momentum, particularly using model-free reinforcement learning (RL) methods.
Existing algorithms to solve MFGs require the mixing of approximated quantities such as strategies or $q$-values.
We propose two methods to address this shortcoming. The first one learns a mixed strategy from distillation of historical data into a neural network and is applied to the Fictitious Play algorithm.
The second one is an online mixing method based on regularization that does not require memorizing historical data or previous estimates, and is used to extend Online Mirror Descent.
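The mixing referred to above is, in tabular Fictitious Play, simply the running average of past best responses (standard notation, not from the paper):
  $\bar{\pi}_k(a \mid s) = \frac{1}{k} \sum_{i=1}^{k} \pi_i(a \mid s)$.
With deep RL each $\pi_i$ is a neural network, so this average has no closed form; the first method approximates it by distilling historical data into a single network, while the second avoids storing history altogether through regularized online mixing.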
arXiv Detail & Related papers (2022-03-22T18:10:32Z) - Covariance-Free Sparse Bayesian Learning [62.24008859844098]
We introduce a new SBL inference algorithm that avoids explicit inversions of the covariance matrix.
Our method can be up to thousands of times faster than existing baselines.
We showcase how our new algorithm enables SBL to tractably tackle high-dimensional signal recovery problems.
arXiv Detail & Related papers (2021-05-21T16:20:07Z) - A spectral algorithm for robust regression with subgaussian rates [0.0]
We study a new algorithm, with running time between linear and quadratic, for linear regression in the absence of strong assumptions on the underlying distributions of samples.
The goal is to design a procedure which attains the optimal sub-gaussian error bound even though the data have only finite moments.
arXiv Detail & Related papers (2020-07-12T19:33:50Z) - Unified Reinforcement Q-Learning for Mean Field Game and Control
Problems [0.0]
We present a Reinforcement Learning (RL) algorithm to solve infinite horizon Mean Field Game (MFG) and Mean Field Control (MFC) problems.
Our approach can be described as a unified two-timescale Mean Field Q-learning: the same algorithm can learn either the MFG or the MFC solution by simply tuning the ratio of two learning parameters.
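Schematically, the two-timescale updates take the form (generic notation and learning rates $\rho^Q_n$, $\rho^\mu_n$; not the paper's exact equations):
  $Q_{n+1}(x_n, a_n) = Q_n(x_n, a_n) + \rho^Q_n \big[r(x_n, a_n, \mu_n) + \beta \max_{a'} Q_n(x_{n+1}, a') - Q_n(x_n, a_n)\big]$,
  $\mu_{n+1} = \mu_n + \rho^\mu_n (\hat{\mu}_{n+1} - \mu_n)$,
where $\hat{\mu}_{n+1}$ estimates the current state distribution. Intuitively, updating $\mu$ on the slower timescale amounts to best-responding to a quasi-frozen population (the MFG fixed point), while updating it on the faster timescale lets the agent's updates reflect its own effect on the population (the MFC optimum); the "ratio of two learning parameters" selects between these regimes.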
arXiv Detail & Related papers (2020-06-24T17:45:44Z) - Sparsified Linear Programming for Zero-Sum Equilibrium Finding [89.30539368124025]
We present a totally different approach to the problem, which is competitive and often orders of magnitude better than the prior state of the art.
With experiments on poker endgames, we demonstrate, for the first time, that modern linear program solvers are competitive against even game-specific modern variants of CFR.
arXiv Detail & Related papers (2020-06-05T13:48:26Z)