Inverse Reinforcement Learning With Constraint Recovery
- URL: http://arxiv.org/abs/2305.08130v1
- Date: Sun, 14 May 2023 11:49:37 GMT
- Title: Inverse Reinforcement Learning With Constraint Recovery
- Authors: Nirjhar Das and Arpan Chattopadhyay
- Abstract summary: We propose a novel inverse reinforcement learning (IRL) algorithm for constrained Markov decision process (CMDP) problems.
We demonstrate the efficacy of our algorithm for the grid world environment.
- Score: 3.8073142980732992
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this work, we propose a novel inverse reinforcement learning (IRL)
algorithm for constrained Markov decision process (CMDP) problems. In standard
IRL problems, the inverse learner or agent seeks to recover the reward function
of the MDP, given a set of trajectory demonstrations for the optimal policy. In
this work, we seek to infer not only the reward functions of the CMDP, but also
the constraints. Using the principle of maximum entropy, we show that the IRL
with constraint recovery (IRL-CR) problem can be cast as a constrained
non-convex optimization problem. We reduce it to an alternating constrained
optimization problem whose sub-problems are convex. We use the exponentiated
gradient descent algorithm to solve it. Finally, we demonstrate the efficacy of
our algorithm for the grid world environment.
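The pipeline in the abstract (maximum-entropy objective, alternating convex sub-problems, exponentiated gradient descent) hinges on a multiplicative update that keeps its iterate on the probability simplex. The sketch below shows the generic exponentiated-gradient step, not the authors' implementation; the step size and the toy linear objective are assumptions for illustration.

```python
import numpy as np

def exponentiated_gradient_step(w, grad, eta=0.1):
    """One exponentiated-gradient update: multiplicative in the
    gradient, then renormalized, so w stays on the simplex."""
    w = w * np.exp(-eta * grad)
    return w / w.sum()

# Toy convex sub-problem (assumed): minimize <c, w> over the simplex,
# for which the gradient is the constant vector c.
rng = np.random.default_rng(0)
c = rng.normal(size=4)
w = np.full(4, 0.25)              # uniform starting point
for _ in range(300):
    w = exponentiated_gradient_step(w, c)
print(w.round(3))                 # mass concentrates on argmin_i c_i
```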
Related papers
- Best-of-Both-Worlds Policy Optimization for CMDPs with Bandit Feedback [34.7178680288326]
Stradi et al. (2024) proposed the first best-of-both-worlds algorithm for constrained Markov decision processes.
In this paper, we provide the first best-of-both-worlds algorithm for CMDPs with bandit feedback.
Our algorithm is based on a policy optimization approach, which is much more efficient than occupancy-measure-based methods.
arXiv Detail & Related papers (2024-10-03T07:44:40Z)
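The primal-dual idea underlying Lagrangian policy optimization for CMDPs can be illustrated on a toy constrained bandit: gradient ascent on the policy against a Lagrangian reward, while a multiplier is adjusted to enforce the cost budget. This generic sketch is not the best-of-both-worlds algorithm above; the rewards, costs, budget, and step sizes are assumptions.

```python
import numpy as np

# Toy constrained bandit (all numbers assumed): choose arm
# probabilities pi to maximize expected reward subject to an
# expected-cost budget.
r = np.array([1.0, 0.6])          # expected rewards
g = np.array([1.0, 0.2])          # expected costs
budget = 0.5

theta = np.zeros(2)               # policy logits
lam = 0.0                         # Lagrange multiplier
eta_pi, eta_lam = 0.5, 0.1        # step sizes (assumed)

for _ in range(2000):
    pi = np.exp(theta) / np.exp(theta).sum()
    adv = r - lam * g                                # Lagrangian "advantage"
    theta += eta_pi * pi * (adv - pi @ adv)          # softmax policy gradient ascent
    lam = max(0.0, lam + eta_lam * (pi @ g - budget))  # dual ascent on the multiplier

pi = np.exp(theta) / np.exp(theta).sum()
print(pi.round(3), "expected cost:", (pi @ g).round(3))  # cost ~ budget
```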
- Maximum Causal Entropy Inverse Reinforcement Learning for Mean-Field Games [3.2228025627337864]
We introduce the maximum causal entropy Inverse Reinforcement Learning (IRL) problem for discrete-time mean-field games (MFGs) under an infinite-horizon discounted-reward optimality criterion.
We formulate the MFG problem as a generalized Nash equilibrium problem (GNEP), which makes it possible to compute the mean-field equilibrium.
This method is employed to produce data for a numerical example.
arXiv Detail & Related papers (2024-01-12T13:22:03Z)
- Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of Deep Matching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems, including Max- and Min-Cut, Max-$k$-CSP, Maximum-Weight Bipartite Matching, and the Traveling Salesman Problem.
As a byproduct of our analysis, we introduce a novel regularization process over vanilla gradient descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z)
- Cancellation-Free Regret Bounds for Lagrangian Approaches in Constrained Markov Decision Processes [24.51454563844664]
We propose a novel model-based dual algorithm OptAug-CMDP for finite-horizon CMDPs.
Our algorithm achieves a regret bound without relying on the cancellation of errors.
arXiv Detail & Related papers (2023-06-12T10:10:57Z)
- Constrained Optimization via Exact Augmented Lagrangian and Randomized Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonconvex optimization problems.
We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.
arXiv Detail & Related papers (2023-05-28T06:33:37Z)
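The augmented Lagrangian at the heart of that method turns min f(x) s.t. c(x) = 0 into repeated unconstrained minimizations of f(x) + y c(x) + (rho/2) c(x)^2, followed by multiplier updates. A minimal sketch on an assumed toy problem (no inexact Newton or randomized sketching here):

```python
import numpy as np

# Toy problem (assumed): minimize f(x) = x1^2 + 2*x2^2
# subject to c(x) = x1 + x2 - 1 = 0; exact solution is (2/3, 1/3).
def f_grad(x):
    return np.array([2.0 * x[0], 4.0 * x[1]])

def c(x):
    return x[0] + x[1] - 1.0

c_grad = np.array([1.0, 1.0])     # gradient of the linear constraint

x, y, rho = np.zeros(2), 0.0, 10.0
for _ in range(50):               # outer loop: multiplier updates
    for _ in range(200):          # inner loop: minimize the AL in x
        x -= 0.01 * (f_grad(x) + (y + rho * c(x)) * c_grad)
    y += rho * c(x)               # dual (multiplier) update

print(x.round(4), "residual:", round(c(x), 8))
```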
- Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z)
- Successive Convex Approximation Based Off-Policy Optimization for Constrained Reinforcement Learning [12.523496806744946]
We propose a successive convex approximation based off-policy optimization (SCAOPO) algorithm to solve the general constrained reinforcement learning problem.
In spite of the time-varying state distribution and the bias incurred by off-policy learning, SCAOPO with a feasible initial point can still provably converge to a Karush-Kuhn-Tucker point.
arXiv Detail & Related papers (2021-05-26T13:52:39Z)
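Successive convex approximation solves a nonconvex problem by repeatedly minimizing a convex surrogate built at the current iterate. The one-dimensional sketch below uses a convex-concave split (keep the convex term, linearize the concave one); the objective is an assumption and this is not SCAOPO itself:

```python
import numpy as np

# Nonconvex objective (assumed): f(x) = x^4 - 3x^2, written as a
# convex part x^4 plus a concave part -3x^2. Each step keeps the
# convex part and linearizes the concave part at x_k, which majorizes
# f; the surrogate minimizer solves 4x^3 = 6*x_k in closed form.
x = 2.0                           # starting point (assumed)
for _ in range(30):
    x = (1.5 * x) ** (1.0 / 3.0)  # argmin of the convex surrogate
print(x, np.sqrt(1.5))            # converges to the stationary point
```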
- A Stochastic Composite Augmented Lagrangian Method For Reinforcement Learning [9.204659134755795]
We consider the linear programming (LP) formulation for deep reinforcement learning.
The augmented Lagrangian method suffers from the double-sampling obstacle in solving the LP.
A deep parameterized augmented Lagrangian method is proposed.
arXiv Detail & Related papers (2021-05-20T13:08:06Z)
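The double-sampling obstacle mentioned above arises because objectives like the squared Bellman error target (E[delta])^2, while squaring a single sampled error estimates E[delta^2] = (E[delta])^2 + Var(delta). The toy numbers below (assumed) illustrate the bias and how two independent samples remove it:

```python
import numpy as np

rng = np.random.default_rng(0)
delta = rng.normal(loc=0.5, scale=2.0, size=100_000)   # noisy TD errors

print(delta.mean() ** 2)          # (E[delta])^2 ~ 0.25, the target
print((delta ** 2).mean())        # E[delta^2]   ~ 4.25, biased by Var

# Two independent next-state samples fix it:
# E[delta * delta'] = (E[delta])^2.
delta2 = rng.normal(loc=0.5, scale=2.0, size=100_000)
print((delta * delta2).mean())    # ~ 0.25 again
```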
- Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
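The convex policy-evaluation loss in Logistic Q-Learning is, roughly, a smoothed log-sum-exp function of the TD errors rather than their square; the snippet below compares the two shapes on assumed TD errors and is only a stylized stand-in for the paper's exact objective:

```python
import numpy as np

def squared_bellman(deltas):
    return np.mean(deltas ** 2)

def logsumexp_bellman(deltas, eta=1.0):
    # Smooth convex surrogate: (1/eta) * log mean exp(eta * delta),
    # computed stably; penalizes large errors smoothly rather than
    # quadratically while remaining convex in the deltas.
    m = np.max(eta * deltas)
    return (m + np.log(np.mean(np.exp(eta * deltas - m)))) / eta

deltas = np.array([-1.0, 0.2, 0.5, 3.0])   # assumed TD errors
print(squared_bellman(deltas), logsumexp_bellman(deltas))
```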
- Efficient Optimistic Exploration in Linear-Quadratic Regulators via Lagrangian Relaxation [107.06364966905821]
We study the exploration-exploitation dilemma in the linear quadratic regulator (LQR) setting.
Inspired by the extended value iteration algorithm used in optimistic algorithms for finite MDPs, we propose to relax the optimistic optimization of OFU-LQ.
We show that an $\epsilon$-optimistic controller can be computed efficiently by solving at most $O\big(\log(1/\epsilon)\big)$ Riccati equations.
arXiv Detail & Related papers (2020-07-13T16:30:47Z)
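The computational primitive behind that $O\big(\log(1/\epsilon)\big)$ guarantee is solving a discrete-time Riccati equation, which SciPy exposes directly; the system matrices below are an assumed toy LQR instance:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Assumed toy LQR instance: x' = A x + B u, stage cost x'Qx + u'Ru.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

P = solve_discrete_are(A, B, Q, R)                  # Riccati solution
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # optimal gain
print(K.round(3))                                   # control u = -K x
```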
- Approximation Schemes for ReLU Regression [80.33702497406632]
We consider the fundamental problem of ReLU regression.
The goal is to output the best-fitting ReLU with respect to square loss, given draws from some unknown distribution.
arXiv Detail & Related papers (2020-05-26T16:26:17Z)
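ReLU regression seeks parameters $w, b$ of $x \mapsto \max(0, w \cdot x + b)$ minimizing square loss. The (sub)gradient-descent sketch below on synthetic data shows the basic objective, not one of the paper's approximation schemes; the data, initialization, and learning rate are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
w_true, b_true = np.array([1.0, -2.0, 0.5]), 0.3
y = np.maximum(0.0, X @ w_true + b_true) + 0.05 * rng.normal(size=1000)

# Small random init keeps some samples active; at w = 0, b = 0 the
# ReLU is inactive everywhere and the (sub)gradient vanishes.
w, b = 0.1 * rng.normal(size=3), 0.1
for _ in range(2000):
    z = X @ w + b
    residual = np.maximum(0.0, z) - y
    active = (z > 0).astype(float)        # ReLU (sub)gradient mask
    w -= 0.1 * X.T @ (residual * active) / len(y)
    b -= 0.1 * (residual * active).mean()

print(w.round(2), round(b, 2))            # approaches w_true, b_true
```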