Reinforcement Learning in Non-Stationary Discrete-Time Linear-Quadratic
Mean-Field Games
- URL: http://arxiv.org/abs/2009.04350v3
- Date: Thu, 1 Oct 2020 15:41:33 GMT
- Title: Reinforcement Learning in Non-Stationary Discrete-Time Linear-Quadratic
Mean-Field Games
- Authors: Muhammad Aneeq uz Zaman, Kaiqing Zhang, Erik Miehling, and Tamer Başar
- Abstract summary: We study large population multi-agent reinforcement learning (RL) in the context of discrete-time linear-quadratic mean-field games (LQ-MFGs)
Our setting differs from most existing work on RL for MFGs, in that we consider a non-stationary MFG over an infinite horizon.
We propose an actor-critic algorithm to iteratively compute the mean-field equilibrium (MFE) of the LQ-MFG.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study large population multi-agent reinforcement learning
(RL) in the context of discrete-time linear-quadratic mean-field games
(LQ-MFGs). Our setting differs from most existing work on RL for MFGs, in that
we consider a non-stationary MFG over an infinite horizon. We propose an
actor-critic algorithm to iteratively compute the mean-field equilibrium (MFE)
of the LQ-MFG. There are two primary challenges: i) the non-stationarity of the
MFG induces a linear-quadratic tracking problem, which requires solving a
backwards-in-time (non-causal) equation that cannot be solved by standard
(causal) RL algorithms; ii) many RL algorithms assume that the states are
sampled from the stationary distribution of a Markov chain (MC), that is, the
chain is already mixed, an assumption that is not satisfied for real data
sources. We first identify that the mean-field trajectory follows linear
dynamics, allowing the problem to be reformulated as a linear quadratic
Gaussian problem. Under this reformulation, we propose an actor-critic
algorithm that allows samples to be drawn from an unmixed MC. Finite-sample
convergence guarantees for the algorithm are then provided. To characterize the
performance of our algorithm in multi-agent RL, we have developed an error
bound with respect to the Nash equilibrium of the finite-population game.
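For context, a minimal generic formulation of the kind of discrete-time LQ-MFG studied here (the symbols $A, B, Q, R, \beta, \bar{x}_t$ are illustrative and need not match the paper's exact parameterization): a representative agent controls linear dynamics and pays a quadratic cost for deviating from the population mean,
  $x_{t+1} = A x_t + B u_t + w_t$, with $w_t \sim \mathcal{N}(0, \Sigma)$,
  $J(\pi; \bar{x}) = \mathbb{E}\big[\sum_{t \ge 0} \beta^t \big((x_t - \bar{x}_t)^\top Q (x_t - \bar{x}_t) + u_t^\top R u_t\big)\big]$,
and a mean-field equilibrium is a fixed point: $\pi^*$ minimizes $J(\,\cdot\,; \bar{x}^*)$ while $\bar{x}^*_t = \mathbb{E}[x_t \mid \pi^*]$ for every $t$. The abstract's observation that the mean-field trajectory itself evolves linearly ($\bar{x}_{t+1} = \bar{A}\bar{x}_t$ for some matrix $\bar{A}$) is what allows the non-causal tracking problem to be rewritten as a standard LQG problem on an augmented state such as $(x_t, \bar{x}_t)$, which the proposed actor-critic can then solve from samples of an unmixed Markov chain.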
Related papers
- Discrete Probabilistic Inference as Control in Multi-path Environments [84.67055173040107]
We consider the problem of sampling from a discrete and structured distribution as a sequential decision problem.
We show that GFlowNets learn a policy that samples objects proportionally to their reward by enforcing a conservation of flows.
We also prove that some flow-matching objectives found in the GFlowNet literature are in fact equivalent to well-established MaxEnt RL algorithms with a corrected reward.
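For reference, the flow-conservation mechanism mentioned above, in standard GFlowNet notation (not taken from this paper): an edge flow $F$ over a DAG of partially constructed objects is trained so that, at every intermediate state $s$,
  $\sum_{s' : s' \to s} F(s' \to s) = \sum_{s'' : s \to s''} F(s \to s'')$,
with the flow into each terminal object $x$ matched to its reward $R(x)$; when these constraints hold, the policy obtained by normalizing outgoing flows samples complete objects with probability $P(x) \propto R(x)$.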
arXiv Detail & Related papers (2024-02-15T20:20:35Z) - Maximum Causal Entropy Inverse Reinforcement Learning for Mean-Field
Games [3.2228025627337864]
We introduce the maximum causal entropy inverse reinforcement learning (IRL) problem for discrete-time mean-field games (MFGs) under an infinite-horizon discounted-reward optimality criterion.
We formulate the MFG problem as a generalized Nash equilibrium problem (GNEP), which enables computation of the mean-field equilibrium.
This method is employed to produce data for a numerical example.
arXiv Detail & Related papers (2024-01-12T13:22:03Z) - Global Convergence of Online Identification for Mixed Linear Regression [1.9295130374196499]
Mixed linear regression (MLR) is a powerful model for characterizing nonlinear relationships.
This paper investigates the online identification and data clustering problems for two basic classes of MLRs.
It introduces two corresponding new online identification algorithms based on the expectation-maximization principle.
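As a concrete instance of the model and the EM principle invoked here (standard mixture-of-regressions notation, not the paper's specific algorithms): observations follow
  $y_t = \phi_t^\top \theta_{z_t} + \varepsilon_t$, with latent label $z_t \in \{1, \dots, K\}$ and noise $\varepsilon_t \sim \mathcal{N}(0, \sigma^2)$;
an online EM step first computes responsibilities $\gamma_{t,k} \propto \pi_k \exp(-(y_t - \phi_t^\top \theta_k)^2 / 2\sigma^2)$ (E-step) and then nudges each regressor toward the data it explains, e.g. $\theta_k \leftarrow \theta_k + \eta_t \gamma_{t,k} \phi_t (y_t - \phi_t^\top \theta_k)$ (M-step), so that parameter identification and data clustering happen jointly.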
arXiv Detail & Related papers (2023-11-30T12:30:42Z) - Deep Reinforcement Learning for Infinite Horizon Mean Field Problems in Continuous Spaces [1.4999444543328293]
We present a reinforcement learning (RL) algorithm designed to solve mean field games (MFG) and mean field control (MFC) problems in a unified manner.
The proposed approach pairs the actor-critic (AC) paradigm with a representation of the mean field distribution via a parameterized score function.
A modification of the algorithm allows us to solve mixed mean field control games (MFCGs).
arXiv Detail & Related papers (2023-09-19T22:37:47Z) - Constrained Optimization via Exact Augmented Lagrangian and Randomized
Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonconvex optimization problems.
We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.
arXiv Detail & Related papers (2023-05-28T06:33:37Z) - PAPAL: A Provable PArticle-based Primal-Dual ALgorithm for Mixed Nash
Equilibrium [62.51015395213579]
We consider the problem of computing a mixed Nash equilibrium for nonconvex-nonconcave objective functions in two-player zero-sum continuous games.
The proposed algorithm employs the movements of particles to represent the updates of random strategies for the $\epsilon$-mixed Nash equilibrium.
arXiv Detail & Related papers (2023-03-02T05:08:15Z) - Scalable Deep Reinforcement Learning Algorithms for Mean Field Games [60.550128966505625]
Mean Field Games (MFGs) have been introduced to efficiently approximate games with very large populations of strategic agents.
Recently, the question of learning equilibria in MFGs has gained momentum, particularly using model-free reinforcement learning (RL) methods.
Existing algorithms to solve MFGs require the mixing of approximated quantities such as strategies or $q$-values.
We propose two methods to address this shortcoming. The first one learns a mixed strategy from distillation of historical data into a neural network and is applied to the Fictitious Play algorithm.
The second one is an online mixing method based on regularization that does not require memorizing historical data or previous estimates, and is used to extend Online Mirror Descent.
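The mixing referred to above is, in tabular Fictitious Play, simply the running average of past best responses (standard notation, not from the paper):
  $\bar{\pi}_k(a \mid s) = \frac{1}{k} \sum_{i=1}^{k} \pi_i(a \mid s)$.
With deep RL each $\pi_i$ is a neural network, so this average has no closed form; the first method approximates it by distilling historical data into a single network, while the second avoids storing history altogether through regularized online mixing.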
arXiv Detail & Related papers (2022-03-22T18:10:32Z) - Covariance-Free Sparse Bayesian Learning [62.24008859844098]
We introduce a new SBL inference algorithm that avoids explicit inversions of the covariance matrix.
Our method can be up to thousands of times faster than existing baselines.
We showcase how our new algorithm enables SBL to tractably tackle high-dimensional signal recovery problems.
arXiv Detail & Related papers (2021-05-21T16:20:07Z) - A spectral algorithm for robust regression with subgaussian rates [0.0]
We study a new algorithm, with running time between linear and quadratic, for linear regression in the absence of strong assumptions on the underlying distributions of samples.
The goal is to design a procedure which attains the optimal sub-gaussian error bound even though the data have only finite moments.
arXiv Detail & Related papers (2020-07-12T19:33:50Z) - Unified Reinforcement Q-Learning for Mean Field Game and Control
Problems [0.0]
We present a Reinforcement Learning (RL) algorithm to solve infinite horizon Mean Field Game (MFG) and Mean Field Control (MFC) problems.
Our approach can be described as a unified two-timescale Mean Field Q-learning: the same algorithm can learn either the MFG or the MFC solution by simply tuning the ratio of two learning parameters.
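Schematically, the two-timescale updates take the form (generic notation and learning rates $\rho^Q_n$, $\rho^\mu_n$; not the paper's exact equations):
  $Q_{n+1}(x_n, a_n) = Q_n(x_n, a_n) + \rho^Q_n \big[r(x_n, a_n, \mu_n) + \beta \max_{a'} Q_n(x_{n+1}, a') - Q_n(x_n, a_n)\big]$,
  $\mu_{n+1} = \mu_n + \rho^\mu_n (\hat{\mu}_{n+1} - \mu_n)$,
where $\hat{\mu}_{n+1}$ estimates the current state distribution. Intuitively, updating $\mu$ on the slower timescale amounts to best-responding to a quasi-frozen population (the MFG fixed point), while updating it on the faster timescale lets the agent's updates reflect its own effect on the population (the MFC optimum); the "ratio of two learning parameters" selects between these regimes.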
arXiv Detail & Related papers (2020-06-24T17:45:44Z) - Sparsified Linear Programming for Zero-Sum Equilibrium Finding [89.30539368124025]
We present a totally different approach to the problem, which is competitive and often orders of magnitude better than the prior state of the art.
With experiments on poker endgames, we demonstrate, for the first time, that modern linear program solvers are competitive against even game-specific modern variants of CFR.
arXiv Detail & Related papers (2020-06-05T13:48:26Z)