Inverse Reinforcement Learning with Unknown Reward Model based on
Structural Risk Minimization
- URL: http://arxiv.org/abs/2312.16566v1
- Date: Wed, 27 Dec 2023 13:23:17 GMT
- Title: Inverse Reinforcement Learning with Unknown Reward Model based on
Structural Risk Minimization
- Authors: Chendi Qu, Jianping He, Xiaoming Duan, Jiming Chen
- Abstract summary: Inverse reinforcement learning (IRL) usually assumes the model of the reward function is pre-specified and only estimates its parameters.
A simplistic model is less likely to contain the real reward function, while a model with high complexity leads to substantial cost and risks overfitting.
This paper addresses this trade-off by introducing the structural risk minimization (SRM) method from statistical learning.
- Score: 9.44879308639364
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inverse reinforcement learning (IRL) usually assumes that the model
of the reward function is pre-specified and only estimates its parameters.
However, determining a proper reward model is nontrivial: a simplistic model is
less likely to contain the real reward function, while a highly complex model
incurs substantial computation cost and risks overfitting. This paper addresses
this trade-off in IRL model selection by introducing the structural risk
minimization (SRM) method from statistical learning. SRM selects an optimal
reward function class from a hypothesis set by minimizing both the estimation
error and the model complexity. To formulate an SRM scheme for IRL, we estimate
the policy gradient from demonstrations to serve as the empirical risk, and we
establish an upper bound on the Rademacher complexity of the hypothesis classes
to serve as the model penalty. A learning guarantee is further presented. In
particular, we provide an explicit SRM scheme for the common linear weighted
sum setting in IRL. Simulations demonstrate the performance and efficiency of
our scheme.
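For orientation, the generic SRM criterion behind this scheme can be written as
follows; this is a standard formulation consistent with the abstract, not
necessarily the paper's exact objective. Given nested reward function classes
$\mathcal{F}_1 \subset \mathcal{F}_2 \subset \cdots$,

$$\hat{f}_k = \arg\min_{f \in \mathcal{F}_k} \widehat{R}_n(f), \qquad
\hat{k} = \arg\min_{k} \big[ \widehat{R}_n(\hat{f}_k) + \widehat{\mathfrak{R}}_n(\mathcal{F}_k) \big],$$

where $\widehat{R}_n$ is the empirical risk (here, the policy-gradient-based
estimate computed from demonstrations) and $\widehat{\mathfrak{R}}_n(\mathcal{F}_k)$
is the upper bound on the Rademacher complexity of $\mathcal{F}_k$ that serves
as the model penalty.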
Related papers
- Towards Reliable Alignment: Uncertainty-aware RLHF [14.20181662644689]
We show that fluctuations in the reward model can be detrimental to the alignment problem.
We show that such policies are more risk-averse in the sense that they are more cautious about uncertain rewards.
We use this ensemble of reward models to align a language model using our methodology and observe that our empirical findings match our theoretical predictions.
arXiv Detail & Related papers (2024-10-31T08:26:51Z) - Model Selection Through Model Sorting [1.534667887016089]
We propose a model order selection method called nested empirical risk (NER).
On the UCR data set, the NER method dramatically reduces the complexity of classification.
arXiv Detail & Related papers (2024-09-15T09:43:59Z) - Invariant Risk Minimization Is A Total Variation Model [3.000494957386027]
Invariant risk minimization (IRM) is an emerging approach in machine learning for generalizing invariant features across different environments.
We show that IRM is essentially a total variation based on the $L^2$ norm (TV-$\ell_2$) of the learning risk.
We propose a novel IRM framework based on the TV-$\ell$ model.
arXiv Detail & Related papers (2024-05-02T15:34:14Z) - Provable Risk-Sensitive Distributional Reinforcement Learning with
General Function Approximation [54.61816424792866]
We introduce a general framework on Risk-Sensitive Distributional Reinforcement Learning (RS-DisRL), with static Lipschitz Risk Measures (LRM) and general function approximation.
We design two innovative meta-algorithms: $\texttt{RS-DisRL-M}$, a model-based strategy for model-based function approximation, and $\texttt{RS-DisRL-V}$, a model-free approach for general value function approximation.
arXiv Detail & Related papers (2024-02-28T08:43:18Z) - COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically
for Model-Based RL [50.385005413810084]
Dyna-style model-based reinforcement learning contains two phases: model rollouts to generate samples for policy learning, and real-environment exploration.
$\texttt{COPlanner}$ is a planning-driven framework for model-based methods that addresses the problem of an inaccurately learned dynamics model.
arXiv Detail & Related papers (2023-10-11T06:10:07Z) - On the Variance, Admissibility, and Stability of Empirical Risk
Minimization [80.26309576810844]
Empirical Risk Minimization (ERM) with squared loss may attain minimax suboptimal error rates.
We show that under mild assumptions, the suboptimality of ERM must be due to large bias rather than variance.
We also show that our estimates imply stability of ERM, complementing the main result of Caponnetto and Rakhlin (2006) for non-Donsker classes.
arXiv Detail & Related papers (2023-05-29T15:25:48Z) - A Model-Based Method for Minimizing CVaR and Beyond [7.751691910877239]
We develop a variant of the prox-linear method for minimizing the Conditional Value-at-Risk (CVaR) objective.
CVaR is a risk measure focused on minimizing worst-case performance, defined as the average of the top quantile of the losses.
In machine learning, such a risk measure is useful to train more robust models.
arXiv Detail & Related papers (2023-05-27T15:38:53Z) - When Demonstrations Meet Generative World Models: A Maximum Likelihood
Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have uses in safety-sensitive domains such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z) - On the Minimal Error of Empirical Risk Minimization [90.09093901700754]
We study the minimal error of the Empirical Risk Minimization (ERM) procedure in the task of regression.
Our sharp lower bounds shed light on the possibility (or impossibility) of adapting to simplicity of the model generating the data.
arXiv Detail & Related papers (2021-02-24T04:47:55Z) - Model-Augmented Q-learning [112.86795579978802]
We propose a model-free RL (MFRL) framework that is augmented with components of model-based RL.
Specifically, we propose to estimate not only the $Q$-values but also both the transition and the reward with a shared network.
We show that the proposed scheme, called Model-augmented $Q$-learning (MQL), obtains a policy-invariant solution identical to the solution obtained by learning with the true reward; a minimal illustrative sketch of this shared-network idea follows the list below.
arXiv Detail & Related papers (2021-02-07T17:56:50Z)
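For the Model-augmented $Q$-learning entry above, the following is a minimal
illustrative sketch of the shared-network idea in PyTorch. It assumes a discrete
action space, and all names (e.g. SharedQModelNet) are hypothetical, not taken
from the paper.

import torch
import torch.nn as nn

class SharedQModelNet(nn.Module):
    # Illustrative sketch: one shared encoder with Q-value, transition, and reward heads.
    def __init__(self, state_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.q_head = nn.Linear(hidden, num_actions)                        # Q(s, .) over discrete actions
        self.transition_head = nn.Linear(hidden + num_actions, state_dim)   # predicted next state
        self.reward_head = nn.Linear(hidden + num_actions, 1)               # predicted immediate reward

    def forward(self, state: torch.Tensor, action_onehot: torch.Tensor):
        h = self.encoder(state)
        q_values = self.q_head(h)                   # model-free component
        ha = torch.cat([h, action_onehot], dim=-1)
        next_state_pred = self.transition_head(ha)  # model-based components share the encoder
        reward_pred = self.reward_head(ha)
        return q_values, next_state_pred, reward_pred

Training would then combine a temporal-difference loss on q_values with
regression losses on next_state_pred and reward_pred, so that the learned
transition and reward targets shape the shared representation.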
This list is automatically generated from the titles and abstracts of the papers on this site.