Identifiability and Generalizability in Constrained Inverse
Reinforcement Learning
- URL: http://arxiv.org/abs/2306.00629v1
- Date: Thu, 1 Jun 2023 12:52:34 GMT
- Title: Identifiability and Generalizability in Constrained Inverse
Reinforcement Learning
- Authors: Andreas Schlaginhaufen, Maryam Kamgarpour
- Abstract summary: Two main challenges in Reinforcement Learning are designing appropriate reward functions and ensuring the safety of the learned policy.
We present a theoretical framework for Inverse Reinforcement Learning (IRL) in constrained Markov decision processes.
We derive a finite sample guarantee for the suboptimality of the learned rewards, and validate our results in a gridworld environment.
- Score: 12.107259467873094
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Two main challenges in Reinforcement Learning (RL) are designing appropriate
reward functions and ensuring the safety of the learned policy. To address
these challenges, we present a theoretical framework for Inverse Reinforcement
Learning (IRL) in constrained Markov decision processes. From a convex-analytic
perspective, we extend prior results on reward identifiability and
generalizability to both the constrained setting and a more general class of
regularizations. In particular, we show that identifiability up to potential
shaping (Cao et al., 2021) is a consequence of entropy regularization and may
generally no longer hold for other regularizations or in the presence of safety
constraints. We also show that to ensure generalizability to new transition
laws and constraints, the true reward must be identified up to a constant.
Additionally, we derive a finite sample guarantee for the suboptimality of the
learned rewards, and validate our results in a gridworld environment.
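To make the identifiability result concrete, here is a minimal numerical sketch of the unconstrained, entropy-regularized case cited from Cao et al. (2021): soft value iteration on a small tabular MDP, checking that shaping the reward by a potential, r'(s,a) = r(s,a) + gamma * E[Phi(s')] - Phi(s), leaves the soft-optimal policy unchanged, while a generic perturbation of similar magnitude does not. This is not code from the paper; the MDP sizes, random rewards, and function names are illustrative assumptions, and (per the abstract) the equivalence may no longer hold under other regularizations or safety constraints.

```python
# Illustrative sketch only (not the paper's code): potential shaping leaves the
# entropy-regularized (soft) optimal policy unchanged in a small random MDP.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 3, 0.9

# Random transition kernel P[s, a, s'] and reward r[s, a] (assumed, for illustration).
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)
r = rng.random((n_states, n_actions))

def soft_policy(reward, n_iters=2000):
    """Soft (entropy-regularized) value iteration; returns the soft-optimal policy."""
    V = np.zeros(n_states)
    for _ in range(n_iters):
        Q = reward + gamma * (P @ V)        # Q[s, a] = r(s, a) + gamma * E[V(s')]
        V = np.log(np.exp(Q).sum(axis=1))   # soft Bellman backup (log-sum-exp over actions)
    Q = reward + gamma * (P @ V)
    return np.exp(Q - V[:, None])           # pi(a|s) = exp(Q - V); rows sum to 1

# Potential shaping: r'(s, a) = r(s, a) + gamma * E_{s' ~ P(.|s,a)}[Phi(s')] - Phi(s).
phi = rng.random(n_states)
r_shaped = r + gamma * (P @ phi) - phi[:, None]

pi = soft_policy(r)
print(np.max(np.abs(pi - soft_policy(r_shaped))))                 # ~0: same soft-optimal policy
print(np.max(np.abs(pi - soft_policy(r + rng.random(r.shape)))))  # clearly > 0: policy changes
```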
Related papers
- Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z)
- Feasibility Consistent Representation Learning for Safe Reinforcement Learning [25.258227763316228]
We introduce a novel framework named Feasibility Consistent Safe Reinforcement Learning (FCSRL)
This framework combines representation learning with feasibility-oriented objectives to identify and extract safety-related information from the raw state for safe RL.
Our method is capable of learning a better safety-aware embedding and achieving superior performance than previous representation learning baselines.
arXiv Detail & Related papers (2024-05-20T01:37:21Z)
- Towards Interpretable Reinforcement Learning with Constrained Normalizing Flow Policies [5.6872893893453105]
Reinforcement learning policies are typically represented by black-box neural networks.
We propose constrained normalizing flow policies as interpretable and safe-by-construction policy models.
arXiv Detail & Related papers (2024-05-02T11:40:15Z)
- Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains the risks in expectation, which makes room for dangerous behaviors in long-tail states.
In safety-critical domains, such behaviors could lead to disastrous outcomes.
We propose Objective Suppression, a novel method that adaptively suppresses the task reward maximizing objectives according to a safety critic.
arXiv Detail & Related papers (2024-02-23T23:22:06Z)
- A Survey of Constraint Formulations in Safe Reinforcement Learning [15.593999581562203]
Safety is critical when applying reinforcement learning to real-world problems.
A prevalent safe RL approach is based on a constrained criterion, which seeks to maximize the expected cumulative reward subject to specified safety constraints.
Despite recent efforts to enhance safety in RL, a systematic understanding of the field remains difficult.
arXiv Detail & Related papers (2024-02-03T04:40:31Z)
- Provable Offline Preference-Based Reinforcement Learning [95.00042541409901]
We investigate the problem of offline Preference-based Reinforcement Learning (PbRL) with human feedback.
We consider the general reward setting where the reward can be defined over the whole trajectory.
We introduce a new single-policy concentrability coefficient, which can be upper bounded by the per-trajectory concentrability.
arXiv Detail & Related papers (2023-05-24T07:11:26Z)
- Safe Reinforcement Learning From Pixels Using a Stochastic Latent Representation [3.5884936187733394]
We address the problem of safe reinforcement learning from pixel observations.
We formalize the problem in a constrained, partially observable Markov decision process framework.
We employ a novel safety critic using the stochastic latent actor-critic (SLAC) approach.
arXiv Detail & Related papers (2022-10-02T19:55:42Z)
- Your Policy Regularizer is Secretly an Adversary [13.625408555732752]
We show how robustness arises from hedging against worst-case perturbations of the reward function.
We characterize this robust set of adversarial reward perturbations under KL and alpha-divergence regularization.
We provide detailed discussion of the worst-case reward perturbations, and present intuitive empirical examples to illustrate this robustness.
arXiv Detail & Related papers (2022-03-23T17:54:20Z)
- Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence [60.20076757208645]
This paper proposes a general policy mirror descent (GPMD) algorithm for solving regularized RL.
We demonstrate that our algorithm converges linearly over an entire range of learning rates, in a dimension-free fashion, to the global solution.
arXiv Detail & Related papers (2021-05-24T02:21:34Z)
- Regularized Inverse Reinforcement Learning [49.78352058771138]
Inverse Reinforcement Learning (IRL) aims to facilitate a learner's ability to imitate expert behavior.
Regularized IRL applies strongly convex regularizers to the learner's policy.
We propose tractable solutions, and practical methods to obtain them, for regularized IRL.
arXiv Detail & Related papers (2020-10-07T23:38:47Z)
- Corruption-robust exploration in episodic reinforcement learning [76.19192549843727]
We study multi-stage episodic reinforcement learning under adversarial corruptions in both the rewards and the transition probabilities of the underlying system.
Our framework yields efficient algorithms which attain near-optimal regret in the absence of corruptions.
Notably, our work provides the first sublinear regret guarantee that accommodates any deviation from purely i.i.d. transitions in the bandit-feedback model for episodic reinforcement learning.
arXiv Detail & Related papers (2019-11-20T03:49:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.