Maximum Causal Entropy IRL in Mean-Field Games and GNEP Framework for Forward RL
- URL: http://arxiv.org/abs/2401.06566v2
- Date: Sat, 19 Jul 2025 09:43:59 GMT
- Title: Maximum Causal Entropy IRL in Mean-Field Games and GNEP Framework for Forward RL
- Authors: Berkay Anahtarci, Can Deha Kariksiz, Naci Saldi
- Abstract summary: This paper explores the use of Maximum Causal Entropy Inverse Reinforcement Learning (IRL) within the context of discrete-time stationary Mean-Field Games (MFGs). The non-convex IRL problem is reformulated as a convex program over state-action occupation measures, and the MFG itself is recast as a Generalized Nash Equilibrium Problem (GNEP) for forward RL.
- Score: 2.867517731896504
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper explores the use of Maximum Causal Entropy Inverse Reinforcement Learning (IRL) within the context of discrete-time stationary Mean-Field Games (MFGs) characterized by finite state spaces and an infinite-horizon, discounted-reward setting. Although the resulting optimization problem is non-convex with respect to policies, we reformulate it as a convex optimization problem in terms of state-action occupation measures by leveraging the linear programming framework of Markov Decision Processes. Based on this convex reformulation, we introduce a gradient descent algorithm with a guaranteed convergence rate to efficiently compute the optimal solution. Moreover, we develop a new method that conceptualizes the MFG problem as a Generalized Nash Equilibrium Problem (GNEP), enabling effective computation of the mean-field equilibrium for forward reinforcement learning (RL) problems and marking an advancement in MFG solution techniques. We further illustrate the practical applicability of our GNEP approach by employing this algorithm to generate data for numerical MFG examples.
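The convex reformulation rests on the classical linear-programming view of discounted MDPs, in which a policy is represented by its discounted state-action occupation measure. As a sketch in our own notation (the paper's normalization may differ), the LP reads

    \max_{\mu \ge 0} \ \sum_{s,a} \mu(s,a)\, r(s,a)
    \quad \text{s.t.} \quad
    \sum_{a'} \mu(s',a') \;=\; \rho(s') + \gamma \sum_{s,a} P(s' \mid s,a)\, \mu(s,a) \quad \forall s',

where $\rho$ is the initial state distribution, $P$ the transition kernel, and $\gamma \in (0,1)$ the discount factor. The maximum causal entropy objective adds the conditional entropy of actions given states, which is concave in the joint measure $\mu$; this is why the occupation-measure view convexifies a problem that is non-convex in the policy.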
Related papers
- Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean-Field Games [14.104031043622351]
We introduce a novel algorithm designed to compute approximate Nash equilibria for ergodic Mean-Field Games (MFG) in finite state-action spaces. Under standard assumptions in the MFG literature, we provide a rigorous analysis of MF-TRPO, establishing theoretical guarantees on its convergence. This work advances MFG optimization by bridging RL techniques with mean-field decision-making, offering a theoretically grounded approach to solving complex multi-agent problems.
arXiv Detail & Related papers (2025-05-28T18:50:25Z)
- Solving Nonlinear PDEs with Sparse Radial Basis Function Networks [0.0]
We propose a novel framework for solving nonlinear PDEs using sparse radial basis function (RBF) networks. This work is motivated by longstanding challenges in traditional RBF collocation methods, along with the limitations of physics-informed neural networks (PINNs) and Gaussian process (GP) approaches.
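As background, here is a minimal dense-collocation sketch of the RBF idea for a linear 1D Poisson problem; the paper's method handles nonlinear PDEs and adds sparsity, and all names and parameter values below are our illustrative choices, not the authors':

    # Minimal Gaussian-RBF collocation for u'' = f on [0, 1], u(0) = u(1) = 0.
    # Illustrative sketch only: the paper's sparse RBF networks go further.
    import numpy as np

    eps = 5.0                                # RBF shape parameter (assumed)
    centers = np.linspace(0.0, 1.0, 25)      # RBF centers
    x_int = np.linspace(0.0, 1.0, 60)[1:-1]  # interior collocation points
    x_bc = np.array([0.0, 1.0])              # boundary collocation points

    def phi(x, c):                           # Gaussian RBF exp(-(eps*(x-c))^2)
        return np.exp(-(eps * (x - c)) ** 2)

    def phi_xx(x, c):                        # its second derivative in x
        d = x - c
        return (4 * eps**4 * d**2 - 2 * eps**2) * np.exp(-(eps * d) ** 2)

    f = lambda x: -np.pi**2 * np.sin(np.pi * x)  # manufactured RHS, exact u = sin(pi x)

    # PDE rows (phi'' at interior points) stacked over boundary rows (phi at 0 and 1).
    A = np.vstack([phi_xx(x_int[:, None], centers[None, :]),
                   phi(x_bc[:, None], centers[None, :])])
    b = np.concatenate([f(x_int), np.zeros(2)])

    w, *_ = np.linalg.lstsq(A, b, rcond=None)    # dense least-squares fit

    x_test = np.linspace(0.0, 1.0, 200)
    u_hat = phi(x_test[:, None], centers[None, :]) @ w
    print("max abs error:", np.abs(u_hat - np.sin(np.pi * x_test)).max())

Replacing the least-squares solve with an $\ell_1$-penalized fit is the simplest way to see where sparsity enters.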
arXiv Detail & Related papers (2025-05-12T17:12:53Z)
- RL-finetuning LLMs from on- and off-policy data with a single algorithm [53.70731390624718]
We introduce a novel reinforcement learning algorithm (AGRO) for fine-tuning large language models. AGRO leverages the concept of generation consistency, which states that the optimal policy satisfies the notion of consistency across any possible generation of the model. We derive algorithms that find optimal solutions via the sample-based policy gradient and provide theoretical guarantees on their convergence.
arXiv Detail & Related papers (2025-03-25T12:52:38Z)
- The Distributionally Robust Optimization Model of Sparse Principal Component Analysis [7.695578200868269]
We consider sparse principal component analysis (PCA) under a setting where the underlying probability distribution of the random parameter is uncertain.
This problem is formulated as a distributionally robust optimization (DRO) model based on a constructive approach to capturing uncertainty.
We prove that the inner problem admits a closed-form solution, reformulating the original DRO model into an equivalent minimization problem on the Stiefel manifold.
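One generic way to write such a model (our schematic notation; the paper's ambiguity set and sparsity mechanism differ in detail) is

    \max_{U \in \mathrm{St}(p, r)} \ \min_{Q \in \mathcal{P}} \ \mathbb{E}_{\xi \sim Q}\!\left[ \operatorname{tr}\!\left( U^{\top} \xi \xi^{\top} U \right) \right] - \lambda \, \|U\|_{1},

where $\mathrm{St}(p, r)$ is the Stiefel manifold of $p \times r$ orthonormal frames and $\mathcal{P}$ is an ambiguity set around the empirical distribution; a closed-form inner solution is what collapses such a min-max model to a single optimization problem on the Stiefel manifold.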
arXiv Detail & Related papers (2025-03-04T11:00:08Z)
- A Quantum Genetic Algorithm Framework for the MaxCut Problem [49.59986385400411]
The proposed method introduces a Quantum Genetic Algorithm (QGA) using a Grover-based evolutionary framework and divide-and-conquer principles. On complete graphs, the proposed method consistently achieves the true optimal MaxCut values, outperforming the Semidefinite Programming (SDP) approach. On Erdős-Rényi random graphs, the QGA demonstrates competitive performance, achieving median solutions within 92-96% of the SDP results.
arXiv Detail & Related papers (2025-01-02T05:06:16Z)
- Performative Reinforcement Learning with Linear Markov Decision Process [14.75815792682734]
We study the setting of performative reinforcement learning, where the deployed policy affects both the reward and the transition of the underlying Markov decision process.
We generalize the results to linear Markov decision processes, the primary theoretical model of large-scale MDPs.
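Schematically, borrowing the fixed-point notation of performative prediction (ours, not necessarily the paper's), a performatively stable policy $\pi_{\mathrm{PS}}$ satisfies

    \pi_{\mathrm{PS}} \in \arg\max_{\pi} \ V\big(\pi;\, r_{\pi_{\mathrm{PS}}},\, P_{\pi_{\mathrm{PS}}}\big),

i.e. it is optimal for the reward $r_{\pi_{\mathrm{PS}}}$ and transitions $P_{\pi_{\mathrm{PS}}}$ that its own deployment induces.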
arXiv Detail & Related papers (2024-11-07T23:04:48Z)
- From Inverse Optimization to Feasibility to ERM [11.731853838892487]
We study the contextual inverse optimization setting, which utilizes additional contextual information to better predict the unknown parameters.
We experimentally validate our approach on synthetic and real-world problems and demonstrate improved performance compared to existing methods.
arXiv Detail & Related papers (2024-02-27T21:06:42Z)
- Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning [132.7040981721302]
We study the constrained convex Markov Decision Process (MDP), where the goal is to minimize a convex functional of the visitation measure.
Designing algorithms for a constrained convex MDP faces several challenges, including handling the large state space.
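Schematically (our notation), with $\mathcal{M}$ the polytope of valid occupation measures, the problem is

    \min_{\mu \in \mathcal{M}} \ f(\mu) \quad \text{s.t.} \quad g(\mu) \le 0,

with $f$ and $g$ convex; primal-dual methods then work on the Lagrangian $f(\mu) + \lambda^{\top} g(\mu)$, and the difficulty in large state spaces is that $\mu$ has one coordinate per state-action pair.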
arXiv Detail & Related papers (2024-02-16T16:35:18Z)
- Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max- and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and the Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
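For reference, the vanilla policy-gradient identity that such landscape analyses concern is, in standard notation,

    \nabla_{\theta}\, \mathbb{E}_{x \sim p_{\theta}}\big[ F(x) \big] \;=\; \mathbb{E}_{x \sim p_{\theta}}\big[ F(x)\, \nabla_{\theta} \log p_{\theta}(x) \big],

where $p_{\theta}$ is the solution sampler and $F$ the combinatorial objective; the regularization mentioned above modifies the descent process built on this estimator, not the identity itself.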
arXiv Detail & Related papers (2023-10-08T23:39:38Z)
- Deep Reinforcement Learning for Infinite Horizon Mean Field Problems in Continuous Spaces [1.4999444543328293]
We present a reinforcement learning (RL) algorithm designed to solve mean field games (MFG) and mean field control (MFC) problems in a unified manner. The proposed approach pairs the actor-critic (AC) paradigm with a representation of the mean field distribution via a parameterized score function. A modification of the algorithm allows us to solve mixed mean field control games (MFCGs).
arXiv Detail & Related papers (2023-09-19T22:37:47Z)
- PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback [106.63518036538163]
We present a novel unified bilevel optimization-based framework, PARL, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning.
Our framework addresses these concerns by explicitly parameterizing the distribution of the upper alignment objective (reward design) by the lower-level optimal variable.
Our empirical results substantiate that the proposed PARL can address the alignment concerns in RL by showing significant improvements.
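The generic bilevel shape behind such alignment formulations (our schematic notation, not PARL's exact objective) is

    \min_{\nu} \ F\big(\nu, \pi^{*}(\nu)\big) \quad \text{s.t.} \quad \pi^{*}(\nu) \in \arg\max_{\pi} \ J\big(\pi;\, r_{\nu}\big),

where the upper level designs the reward $r_{\nu}$ and the lower level returns the policy optimal for that reward; parameterizing the upper objective by the lower-level optimal variable is what couples the two levels.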
arXiv Detail & Related papers (2023-08-03T18:03:44Z)
- Constrained Optimization via Exact Augmented Lagrangian and Randomized Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonconvex optimization problems.
We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.
arXiv Detail & Related papers (2023-05-28T06:33:37Z)
- Inverse Reinforcement Learning With Constraint Recovery [3.8073142980732992]
We propose a novel inverse reinforcement learning (IRL) algorithm for constrained Markov decision process (CMDP) problems.
We demonstrate the efficacy of our algorithm for the grid world environment.
arXiv Detail & Related papers (2023-05-14T11:49:37Z)
- An Asymptotically Optimal Algorithm for the Convex Hull Membership Problem [21.312152185262]
We study the convex hull membership problem in the pure exploration setting.
We present the first asymptotically optimal algorithm, called Thompson-CHM, whose modular design consists of a stopping rule and a sampling rule.
arXiv Detail & Related papers (2023-02-03T23:41:53Z)
- Stochastic Mirror Descent for Large-Scale Sparse Recovery [13.500750042707407]
We discuss an application of stochastic approximation to statistical estimation of high-dimensional sparse parameters.
We show that the proposed algorithm attains the optimal rate of convergence of the estimation error under weak assumptions on the regressor distribution.
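For orientation, the stochastic mirror descent update with stepsize $\gamma_t$, stochastic gradient $g_t$, and Bregman divergence $D_{\psi}$ is

    x_{t+1} \;=\; \arg\min_{x \in X} \ \big\{ \langle \gamma_t\, g_t,\, x \rangle + D_{\psi}(x, x_t) \big\};

choosing $\psi = \tfrac{1}{2}\|\cdot\|_2^2$ recovers projected gradient descent, while mirror maps adapted to $\ell_1$ geometry are what make the method attractive for sparse recovery.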
arXiv Detail & Related papers (2022-10-23T23:23:23Z)
- Parallel Stochastic Mirror Descent for MDPs [72.75921150912556]
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs).
A variant of Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals.
We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method.
arXiv Detail & Related papers (2021-02-27T19:28:39Z)
- Softmax Policy Gradient Methods Can Take Exponential Time to Converge [60.98700344526674]
The softmax policy gradient (PG) method is arguably one of the de facto implementations of policy optimization in modern reinforcement learning.
We demonstrate that softmax PG methods can take exponential time -- in terms of $|\mathcal{S}|$ and $\frac{1}{1-\gamma}$ -- to converge.
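Concretely, with the tabular softmax parameterization

    \pi_{\theta}(a \mid s) = \frac{\exp(\theta_{s,a})}{\sum_{a'} \exp(\theta_{s,a'})},

the method runs gradient ascent $\theta \leftarrow \theta + \eta\, \nabla_{\theta} V^{\pi_{\theta}}$ on the discounted value; the lower bound says the number of iterations needed can scale exponentially in $|\mathcal{S}|$ and $\frac{1}{1-\gamma}$ on worst-case MDPs.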
arXiv Detail & Related papers (2021-02-22T18:56:26Z)
- Gaussian Process-based Min-norm Stabilizing Controller for Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem.
We show that the resulting optimization problem is convex, and we call it the Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP).
arXiv Detail & Related papers (2020-11-14T01:27:32Z)
- Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
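For reference, the CNML distribution that ACNML amortizes assigns, for a query $x$ and training set $\mathcal{D}$,

    p_{\mathrm{CNML}}(y \mid x) \;=\; \frac{ p_{\hat{\theta}(x,y)}(y \mid x) }{ \sum_{y'} p_{\hat{\theta}(x,y')}(y' \mid x) }, \qquad \hat{\theta}(x,y) \in \arg\max_{\theta} \ \log p_{\theta}\big( \mathcal{D} \cup \{(x,y)\} \big):

each candidate label is scored by a model re-fit as if that label were true, and the scores are normalized; ACNML's contribution is to approximate the expensive per-query re-fitting in an amortized way (our paraphrase of the construction).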
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
- Reinforcement Learning in Non-Stationary Discrete-Time Linear-Quadratic Mean-Field Games [14.209473797379667]
We study large population multi-agent reinforcement learning (RL) in the context of discrete-time linear-quadratic mean-field games (LQ-MFGs).
Our setting differs from most existing work on RL for MFGs, in that we consider a non-stationary MFG over an infinite horizon.
We propose an actor-critic algorithm to iteratively compute the mean-field equilibrium (MFE) of the LQ-MFG.
arXiv Detail & Related papers (2020-09-09T15:17:52Z)
- A Dynamical Systems Approach for Convergence of the Bayesian EM Algorithm [59.99439951055238]
We show how (discrete-time) Lyapunov stability theory can serve as a powerful tool to aid, or even lead, in the analysis (and potential design) of optimization algorithms that are not necessarily gradient-based.
The particular ML problem that this paper focuses on is that of parameter estimation in an incomplete-data Bayesian framework via the popular optimization algorithm known as maximum a posteriori expectation-maximization (MAP-EM).
We show that fast convergence (linear or quadratic) is achieved, which could have been difficult to unveil without our adopted S&C approach.
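The tool in question is the standard discrete-time Lyapunov argument: if the iteration $x_{k+1} = T(x_k)$ admits a function $V$ with $V(x^{*}) = 0$, $V(x) > 0$ for $x \ne x^{*}$, and

    V\big(T(x)\big) - V(x) < 0 \quad \text{for all } x \ne x^{*},

then, under standard regularity conditions, the iterates converge to $x^{*}$; here $T$ is the MAP-EM update, and the rate (linear or quadratic) follows from how fast $V$ contracts.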
arXiv Detail & Related papers (2020-06-23T01:34:18Z)
- Constrained Combinatorial Optimization with Reinforcement Learning [0.30938904602244344]
This paper presents a framework to tackle constrained combinatorial optimization problems using deep Reinforcement Learning (RL).
We extend the Neural Combinatorial Optimization (NCO) theory in order to deal with constraints in its formulation.
In that context, the solution is iteratively constructed based on interactions with the environment.
arXiv Detail & Related papers (2020-06-22T03:13:07Z)