Maximum Causal Entropy Inverse Reinforcement Learning for Mean-Field
Games
- URL: http://arxiv.org/abs/2401.06566v1
- Date: Fri, 12 Jan 2024 13:22:03 GMT
- Title: Maximum Causal Entropy Inverse Reinforcement Learning for Mean-Field
Games
- Authors: Berkay Anahtarci, Can Deha Kariksiz, Naci Saldi
- Abstract summary: We introduce the maximum causal entropy Inverse Reinforcement Learning (IRL) problem for discrete-time mean-field games (MFGs) under an infinite-horizon discounted-reward optimality criterion.
We present a new algorithm by formulating the MFG problem as a generalized Nash equilibrium problem (GNEP), which is capable of computing the mean-field equilibrium (MFE).
This method is employed to produce data for a numerical example.
- Score: 3.2228025627337864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce the maximum causal entropy Inverse Reinforcement
Learning (IRL) problem for discrete-time mean-field games (MFGs) under an
infinite-horizon discounted-reward optimality criterion. The state space of a
typical agent is finite. Our approach begins with a comprehensive review of the
maximum entropy IRL problem concerning deterministic and stochastic Markov
decision processes (MDPs) in both finite and infinite-horizon scenarios.
Subsequently, we formulate the maximum causal entropy IRL problem for MFGs - a
non-convex optimization problem with respect to policies. Leveraging the linear
programming formulation of MDPs, we restructure this IRL problem into a convex
optimization problem and establish a gradient descent algorithm to compute the
optimal solution with a rate of convergence. Finally, we present a new
algorithm by formulating the MFG problem as a generalized Nash equilibrium
problem (GNEP), which is capable of computing the mean-field equilibrium (MFE)
for the forward RL problem. This method is employed to produce data for a
numerical example. We note that this novel algorithm is also applicable to
general MFE computations.
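The convex reformulation and gradient step described above can be illustrated with a generic, non-mean-field sketch. The display below writes the standard maximum causal entropy IRL primal over discounted occupation measures; the feature map $\phi$, the expert occupation measure $d_E$, and the initial distribution $\nu_0$ are illustrative notation (not the paper's), and the mean-field coupling is omitted. The Bellman-flow and feature-matching constraints are linear and the causal-entropy objective is concave in $d$, which is the sense in which the problem becomes convex:

```latex
% Illustrative sketch only; not the paper's exact MFG formulation.
\begin{align*}
  \max_{d \ge 0}\quad
    & -\sum_{s,a} d(s,a)\,\log\frac{d(s,a)}{\sum_{a'} d(s,a')} \\
  \text{s.t.}\quad
    & \sum_{a} d(s',a)
      = (1-\gamma)\,\nu_0(s') + \gamma \sum_{s,a} P(s'\mid s,a)\,d(s,a)
      \quad \forall s', \\
    & \sum_{s,a} d(s,a)\,\phi(s,a) = \sum_{s,a} d_E(s,a)\,\phi(s,a).
\end{align*}
```

In the same spirit, the following Python sketch runs a plain gradient method on the parameters $\theta$ of a linear reward $r_\theta(s,a)=\theta^\top\phi(s,a)$ for a single finite MDP. It is only a minimal illustration of the gradient-descent idea, assuming soft (entropy-regularized) value iteration for the inner policy computation; it does not implement the paper's MFG algorithm, its GNEP-based forward solver, or its convergence-rate analysis, and all names are hypothetical:

```python
"""Minimal, generic sketch of maximum causal entropy IRL on a finite,
infinite-horizon discounted MDP.  Not the paper's algorithm: the mean-field
coupling and the occupation-measure (LP) reformulation are omitted, and all
names are hypothetical."""
import numpy as np


def soft_policy(P, r, gamma, n_iter=500):
    """Soft (entropy-regularized) value iteration.
    P: (S, A, S) transition tensor, r: (S, A) reward table.
    Returns the softmax-optimal stochastic policy of shape (S, A)."""
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(n_iter):
        Q = r + gamma * (P @ V)                              # (S, A)
        m = Q.max(axis=1)
        V = m + np.log(np.exp(Q - m[:, None]).sum(axis=1))   # log-sum-exp
    return np.exp(Q - V[:, None])                            # softmax policy


def occupation_measure(P, pi, nu0, gamma, n_iter=500):
    """Discounted state-action occupation measure induced by policy pi."""
    d_state = nu0.copy()
    occ = np.zeros_like(pi)
    discount = 1.0
    for _ in range(n_iter):
        occ += discount * d_state[:, None] * pi
        d_state = np.einsum("sa,sat->t", d_state[:, None] * pi, P)
        discount *= gamma
    return (1.0 - gamma) * occ


def max_causal_entropy_irl(P, phi, expert_features, nu0, gamma,
                           lr=0.1, n_steps=200):
    """Gradient ascent on theta so that the learner's discounted feature
    expectations match the expert's (the standard max-ent IRL gradient)."""
    theta = np.zeros(phi.shape[-1])
    for _ in range(n_steps):
        r = phi @ theta                                      # linear reward
        pi = soft_policy(P, r, gamma)
        occ = occupation_measure(P, pi, nu0, gamma)
        learner_features = np.einsum("sa,sak->k", occ, phi)
        theta += lr * (expert_features - learner_features)
    return theta
```

Given a transition tensor `P`, features `phi`, expert feature expectations, an initial distribution `nu0`, and a discount `gamma`, a call such as `max_causal_entropy_irl(P, phi, expert_features, nu0, 0.9)` returns a reward parameter whose soft-optimal policy approximately matches the expert's discounted feature expectations.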
Related papers
- Double Duality: Variational Primal-Dual Policy Optimization for
Constrained Reinforcement Learning [132.7040981721302]
We study the constrained convex Markov decision process (MDP), where the goal is to minimize a convex functional of the visitation measure (a generic form is sketched after this list).
Designing algorithms for a constrained convex MDP faces several challenges, including handling the large state space.
arXiv Detail & Related papers (2024-02-16T16:35:18Z) - Optimizing Solution-Samplers for Combinatorial Problems: The Landscape
of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems, including Max- and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and the Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z) - Constrained Optimization via Exact Augmented Lagrangian and Randomized
Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonconvex optimization problems.
We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.
arXiv Detail & Related papers (2023-05-28T06:33:37Z) - Inverse Reinforcement Learning With Constraint Recovery [3.8073142980732992]
We propose a novel inverse reinforcement learning (IRL) algorithm for constrained Markov decision process (CMDP) problems.
We demonstrate the efficacy of our algorithm for the grid world environment.
arXiv Detail & Related papers (2023-05-14T11:49:37Z) - An Asymptotically Optimal Algorithm for the Convex Hull Membership Problem [21.312152185262]
We study the convex hull membership problem in the pure exploration setting.
We present the first asymptotically optimal algorithm, called Thompson-CHM, whose modular design consists of a stopping rule and a sampling rule.
arXiv Detail & Related papers (2023-02-03T23:41:53Z) - Stochastic Mirror Descent for Large-Scale Sparse Recovery [13.500750042707407]
We discuss an application of Stochastic Approximation to the statistical estimation of high-dimensional sparse parameters.
We show that the proposed algorithm attains the optimal convergence of the estimation error under weak assumptions on the regressor distribution.
arXiv Detail & Related papers (2022-10-23T23:23:23Z) - Regularized Gradient Descent Ascent for Two-Player Zero-Sum Markov Games [16.09467599829253]
We study the problem of finding Nash equilibrium in a two-player zero-sum game.
Our main contribution is to show that, under proper choices of the regularization parameter, gradient descent ascent converges to the Nash equilibrium of the original unregularized problem.
arXiv Detail & Related papers (2022-05-27T03:24:12Z) - Parallel Stochastic Mirror Descent for MDPs [72.75921150912556]
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs).
A variant of Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals.
We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method.
arXiv Detail & Related papers (2021-02-27T19:28:39Z) - Softmax Policy Gradient Methods Can Take Exponential Time to Converge [60.98700344526674]
The softmax policy gradient (PG) method is arguably one of the de facto implementations of policy optimization in modern reinforcement learning.
We demonstrate that softmax PG methods can take exponential time -- in terms of $|\mathcal{S}|$ and $\frac{1}{1-\gamma}$ -- to converge.
arXiv Detail & Related papers (2021-02-22T18:56:26Z) - Amortized Conditional Normalized Maximum Likelihood: Reliable Out of
Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z) - Reinforcement Learning in Non-Stationary Discrete-Time Linear-Quadratic
Mean-Field Games [14.209473797379667]
We study large population multi-agent reinforcement learning (RL) in the context of discrete-time linear-quadratic mean-field games (LQ-MFGs).
Our setting differs from most existing work on RL for MFGs, in that we consider a non-stationary MFG over an infinite horizon.
We propose an actor-critic algorithm to iteratively compute the mean-field equilibrium (MFE) of the LQ-MFG.
arXiv Detail & Related papers (2020-09-09T15:17:52Z)
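For the "Double Duality" entry above, the display below is a generic sketch of the constrained convex MDP it refers to, written over visitation measures; the symbols $\lambda$, $\Lambda$, $f$, and $g$ are illustrative notation rather than that paper's:

```latex
% Illustrative sketch: a constrained convex MDP over visitation measures.
\min_{\lambda \in \Lambda} \; f(\lambda)
\quad \text{subject to} \quad g(\lambda) \le 0,
```

where $\Lambda$ is the set of visitation measures consistent with the Bellman-flow constraints and $f$, $g$ are convex functionals; the standard (linear-objective) MDP is recovered when $f(\lambda)$ is linear in $\lambda$ and the constraint is absent.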