Maximum Causal Entropy Inverse Reinforcement Learning for Mean-Field
  Games
        - URL: http://arxiv.org/abs/2401.06566v1
- Date: Fri, 12 Jan 2024 13:22:03 GMT
- Title: Maximum Causal Entropy Inverse Reinforcement Learning for Mean-Field
  Games
- Authors: Berkay Anahtarci, Can Deha Kariksiz, Naci Saldi
- Abstract summary: We introduce the casual entropy Inverse Reinforcement (IRL) problem for discrete-time mean-field games (MFGs) under an infinite-horizon discounted-reward optimality criterion.
We present by formulating the MFG problem as a generalized Nash equilibrium problem (GN), which is capable of computing the meanfield equilibrium problem.
This method is employed to produce data for a numerical example.
- Score: 3.2228025627337864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   In this paper, we introduce the maximum casual entropy Inverse Reinforcement
Learning (IRL) problem for discrete-time mean-field games (MFGs) under an
infinite-horizon discounted-reward optimality criterion. The state space of a
typical agent is finite. Our approach begins with a comprehensive review of the
maximum entropy IRL problem concerning deterministic and stochastic Markov
decision processes (MDPs) in both finite and infinite-horizon scenarios.
Subsequently, we formulate the maximum casual entropy IRL problem for MFGs - a
non-convex optimization problem with respect to policies. Leveraging the linear
programming formulation of MDPs, we restructure this IRL problem into a convex
optimization problem and establish a gradient descent algorithm to compute the
optimal solution with a rate of convergence. Finally, we present a new
algorithm by formulating the MFG problem as a generalized Nash equilibrium
problem (GNEP), which is capable of computing the mean-field equilibrium (MFE)
for the forward RL problem. This method is employed to produce data for a
numerical example. We note that this novel algorithm is also applicable to
general MFE computations.
 
      
        Related papers
        - Finite-Sample Convergence Bounds for Trust Region Policy Optimization in   Mean-Field Games [14.104031043622351]
 We introduce a novel algorithm designed to compute approximate Nash equilibria for ergodic Mean-Field Games (MFG) in finite state-action spaces.<n>Under standard assumptions in the MFG literature, we provide a rigorous analysis of MF-TRPO, establishing theoretical guarantees on its convergence.<n>This work advances MFG optimization by bridging RL techniques with mean-field decision-making, offering a theoretically grounded approach to solving complex multi-agent problems.
 arXiv  Detail & Related papers  (2025-05-28T18:50:25Z)
- Solving Nonlinear PDEs with Sparse Radial Basis Function Networks [0.0]
 We propose a novel framework for solving nonlinear PDEs using sparse radial basis function (RBF) networks.<n>This work is motivated by longstanding challenges in traditional RBF collocation methods, along with the limitations of physics-informed neural networks (PINNs) and Gaussian process (GP) approaches.
 arXiv  Detail & Related papers  (2025-05-12T17:12:53Z)
- RL-finetuning LLMs from on- and off-policy data with a single algorithm [53.70731390624718]
 We introduce a novel reinforcement learning algorithm (AGRO) for fine-tuning large-language models.<n>AGRO leverages the concept of generation consistency, which states that the optimal policy satisfies the notion of consistency across any possible generation of the model.<n>We derive algorithms that find optimal solutions via the sample-based policy gradient and provide theoretical guarantees on their convergence.
 arXiv  Detail & Related papers  (2025-03-25T12:52:38Z)
- The Distributionally Robust Optimization Model of Sparse Principal   Component Analysis [7.695578200868269]
 We consider sparse principal component analysis (PCA) under a setting where the underlying probability distribution of the random parameter is uncertain.
This problem is formulated as a distributionally robust optimization (DRO) model based on a constructive approach to capturing uncertainty.
We prove that the inner problem admits a closed-form solution, reformulating the original DRO model into an equivalent minimization problem on the Stiefel manifold.
 arXiv  Detail & Related papers  (2025-03-04T11:00:08Z)
- A Quantum Genetic Algorithm Framework for the MaxCut Problem [49.59986385400411]
 The proposed method introduces a Quantum Genetic Algorithm (QGA) using a Grover-based evolutionary framework and divide-and-conquer principles.<n>On complete graphs, the proposed method consistently achieves the true optimal MaxCut values, outperforming the Semidefinite Programming (SDP) approach.<n>On ErdHos-R'enyi random graphs, the QGA demonstrates competitive performance, achieving median solutions within 92-96% of the SDP results.
 arXiv  Detail & Related papers  (2025-01-02T05:06:16Z)
- Performative Reinforcement Learning with Linear Markov Decision Process [14.75815792682734]
 We study the setting of emphperformative reinforcement learning where the deployed policy affects both the reward and the transition of the underlying Markov decision process.
We generalize the results to emphlinear Markov decision processes which is the primary theoretical model of large-scale MDPs.
 arXiv  Detail & Related papers  (2024-11-07T23:04:48Z)
- From Inverse Optimization to Feasibility to ERM [11.731853838892487]
 We study the contextual inverse setting that utilizes additional contextual information to better predict parameters.
We experimentally validate our approach on synthetic and real-world problems and demonstrate improved performance compared to existing methods.
 arXiv  Detail & Related papers  (2024-02-27T21:06:42Z)
- Double Duality: Variational Primal-Dual Policy Optimization for
  Constrained Reinforcement Learning [132.7040981721302]
 We study the Constrained Convex Decision Process (MDP), where the goal is to minimize a convex functional of the visitation measure.
Design algorithms for a constrained convex MDP faces several challenges, including handling the large state space.
 arXiv  Detail & Related papers  (2024-02-16T16:35:18Z)
- Optimizing Solution-Samplers for Combinatorial Problems: The Landscape
  of Policy-Gradient Methods [52.0617030129699]
 We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max-and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
 arXiv  Detail & Related papers  (2023-10-08T23:39:38Z)
- Deep Reinforcement Learning for Infinite Horizon Mean Field Problems in   Continuous Spaces [1.4999444543328293]
 We present a reinforcement learning (RL) algorithm designed to solve mean field games (MFG) and mean field control (MFC) problems in a unified manner.<n>The proposed approach pairs the actor-critic (AC) paradigm with a representation of the mean field distribution via a parameterized score function.<n>A modification of the algorithm allows us to solve mixed mean field control games (MFCGs)
 arXiv  Detail & Related papers  (2023-09-19T22:37:47Z)
- PARL: A Unified Framework for Policy Alignment in Reinforcement Learning   from Human Feedback [106.63518036538163]
 We present a novel unified bilevel optimization-based framework, textsfPARL, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning.
Our framework addressed these concerns by explicitly parameterizing the distribution of the upper alignment objective (reward design) by the lower optimal variable.
Our empirical results substantiate that the proposed textsfPARL can address the alignment concerns in RL by showing significant improvements.
 arXiv  Detail & Related papers  (2023-08-03T18:03:44Z)
- Constrained Optimization via Exact Augmented Lagrangian and Randomized
  Iterative Sketching [55.28394191394675]
 We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonIBS optimization problems.
We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LVM, and a PDE-constrained problem.
 arXiv  Detail & Related papers  (2023-05-28T06:33:37Z)
- Inverse Reinforcement Learning With Constraint Recovery [3.8073142980732992]
 We propose a novel inverse reinforcement learning (IRL) algorithm for constrained decision process (CMDP) problems.
We demonstrate the efficacy of our algorithm for the grid world environment.
 arXiv  Detail & Related papers  (2023-05-14T11:49:37Z)
- An Asymptotically Optimal Algorithm for the Convex Hull Membership   Problem [21.312152185262]
 We study the convex hull membership problem in the pure exploration setting.
We present the firstally optimal algorithm called Thompson-CHM, whose modular design consists of a stopping rule and a sampling rule.
 arXiv  Detail & Related papers  (2023-02-03T23:41:53Z)
- Stochastic Mirror Descent for Large-Scale Sparse Recovery [13.500750042707407]
 We discuss an application of quadratic Approximation to statistical estimation of high-dimensional sparse parameters.
We show that the proposed algorithm attains the optimal convergence of the estimation error under weak assumptions on the regressor distribution.
 arXiv  Detail & Related papers  (2022-10-23T23:23:23Z)
- Parallel Stochastic Mirror Descent for MDPs [72.75921150912556]
 We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs)
Some variant of Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals.
We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method.
 arXiv  Detail & Related papers  (2021-02-27T19:28:39Z)
- Softmax Policy Gradient Methods Can Take Exponential Time to Converge [60.98700344526674]
 The softmax policy gradient (PG) method is arguably one of the de facto implementations of policy optimization in modern reinforcement learning.
We demonstrate that softmax PG methods can take exponential time -- in terms of $mathcalS|$ and $frac11-gamma$ -- to converge.
 arXiv  Detail & Related papers  (2021-02-22T18:56:26Z)
- Gaussian Process-based Min-norm Stabilizing Controller for
  Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
 We propose a novel compound kernel that captures the control-affine nature of the problem.
We show that this resulting optimization problem is convex, and we call it Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP)
 arXiv  Detail & Related papers  (2020-11-14T01:27:32Z)
- Amortized Conditional Normalized Maximum Likelihood: Reliable Out of
  Distribution Uncertainty Estimation [99.92568326314667]
 We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
 arXiv  Detail & Related papers  (2020-11-05T08:04:34Z)
- Reinforcement Learning in Non-Stationary Discrete-Time Linear-Quadratic
  Mean-Field Games [14.209473797379667]
 We study large population multi-agent reinforcement learning (RL) in the context of discrete-time linear-quadratic mean-field games (LQ-MFGs)
Our setting differs from most existing work on RL for MFGs, in that we consider a non-stationary MFG over an infinite horizon.
We propose an actor-critic algorithm to iteratively compute the mean-field equilibrium (MFE) of the LQ-MFG.
 arXiv  Detail & Related papers  (2020-09-09T15:17:52Z)
- A Dynamical Systems Approach for Convergence of the Bayesian EM
  Algorithm [59.99439951055238]
 We show how (discrete-time) Lyapunov stability theory can serve as a powerful tool to aid, or even lead, in the analysis (and potential design) of optimization algorithms that are not necessarily gradient-based.
The particular ML problem that this paper focuses on is that of parameter estimation in an incomplete-data Bayesian framework via the popular optimization algorithm known as maximum a posteriori expectation-maximization (MAP-EM)
We show that fast convergence (linear or quadratic) is achieved, which could have been difficult to unveil without our adopted S&C approach.
 arXiv  Detail & Related papers  (2020-06-23T01:34:18Z)
- Constrained Combinatorial Optimization with Reinforcement Learning [0.30938904602244344]
 This paper presents a framework to tackle constrained optimization problems using deep Reinforcement Learning (RL)
We extend the Neural Combinatorial Optimization (NCO) theory in order to deal with constraints in its formulation.
In that context, the solution is iteratively constructed based on interactions with the environment.
 arXiv  Detail & Related papers  (2020-06-22T03:13:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     
           This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.