Robust Imitation via Mirror Descent Inverse Reinforcement Learning
- URL: http://arxiv.org/abs/2210.11201v1
- Date: Thu, 20 Oct 2022 12:25:21 GMT
- Title: Robust Imitation via Mirror Descent Inverse Reinforcement Learning
- Authors: Dong-Sig Han, Hyunseo Kim, Hyundo Lee, Je-Hwan Ryu, Byoung-Tak Zhang
- Abstract summary: This paper proposes to predict a sequence of reward functions, which are iterative solutions for a constrained convex problem.
We prove that the proposed mirror descent update rule ensures robust minimization of a Bregman divergence.
Our IRL method was applied on top of an adversarial framework, and it outperformed existing adversarial methods in an extensive suite of benchmarks.
- Score: 18.941048578572577
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, adversarial imitation learning has shown a scalable reward
acquisition method for inverse reinforcement learning (IRL) problems. However,
estimated reward signals often become uncertain and fail to train a reliable
statistical model since the existing methods tend to solve hard optimization
problems directly. Inspired by a first-order optimization method called mirror
descent, this paper proposes to predict a sequence of reward functions, which
are iterative solutions for a constrained convex problem. IRL solutions derived
by mirror descent are tolerant to the uncertainty incurred by target density
estimation since the amount of reward learning is regulated with respect to
local geometric constraints. We prove that the proposed mirror descent update
rule ensures robust minimization of a Bregman divergence in terms of a rigorous
regret bound of $\mathcal{O}(1/T)$ for step sizes $\{\eta_t\}_{t=1}^{T}$. Our
IRL method was applied on top of an adversarial framework, and it outperformed
existing adversarial methods in an extensive suite of benchmarks.
Related papers
- Improving Diffusion Models for Inverse Problems Using Optimal Posterior Covariance [52.093434664236014]
Recent diffusion models provide a promising zero-shot solution to noisy linear inverse problems without retraining for specific inverse problems.
Inspired by this finding, we propose to improve recent methods by using more principled covariance determined by maximum likelihood estimation.
arXiv Detail & Related papers (2024-02-03T13:35:39Z) - Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time
Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z) - Online Contextual Decision-Making with a Smart Predict-then-Optimize
Method [4.061135251278187]
We study an online contextual decision-making problem with resource constraints.
We propose an algorithm that mixes a prediction step based on the "Smart Predict-then- (SPO)" method with a dual update step based on mirror descent.
We prove regret bounds and demonstrate that the overall convergence rate of our method depends on the $mathcalO(T-1/2)$ convergence of online mirror descent.
arXiv Detail & Related papers (2022-06-15T06:16:13Z) - Imitating, Fast and Slow: Robust learning from demonstrations via
decision-time planning [96.72185761508668]
Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z) - A Generalised Inverse Reinforcement Learning Framework [24.316047317028147]
inverse Reinforcement Learning (IRL) is to estimate the unknown cost function of some MDP base on observed trajectories.
We introduce an alternative training loss that puts more weights on future states which yields a reformulation of the (maximum entropy) IRL problem.
The algorithms we devised exhibit enhanced performances (and similar tractability) than off-the-shelf ones in multiple OpenAI gym environments.
arXiv Detail & Related papers (2021-05-25T10:30:45Z) - Online and Distribution-Free Robustness: Regression and Contextual
Bandits with Huber Contamination [29.85468294601847]
We revisit two classic high-dimensional online learning problems, namely linear regression and contextual bandits.
We show that our algorithms succeed where conventional methods fail.
arXiv Detail & Related papers (2020-10-08T17:59:05Z) - Reparameterized Variational Divergence Minimization for Stable Imitation [57.06909373038396]
We study the extent to which variations in the choice of probabilistic divergence may yield more performant ILO algorithms.
We contribute a re parameterization trick for adversarial imitation learning to alleviate the challenges of the promising $f$-divergence minimization framework.
Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
arXiv Detail & Related papers (2020-06-18T19:04:09Z) - Sparse recovery by reduced variance stochastic approximation [5.672132510411465]
We discuss application of iterative quadratic optimization routines to the problem of sparse signal recovery from noisy observation.
We show how one can straightforwardly enhance reliability of the corresponding solution by using Median-of-Means like techniques.
arXiv Detail & Related papers (2020-06-11T12:31:20Z) - Upper Confidence Primal-Dual Reinforcement Learning for CMDP with
Adversarial Loss [145.54544979467872]
We consider online learning for episodically constrained Markov decision processes (CMDPs)
We propose a new emphupper confidence primal-dual algorithm, which only requires the trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z) - The Statistical Complexity of Early-Stopped Mirror Descent [27.393821783237186]
We study the statistical guarantees on the excess risk achieved by early-stopped unconstrained mirror descent algorithms.
By completing an inequality that characterizes convexity for the squared loss, we identify an intrinsic link between offset Rademacher complexities and potential-based convergence analysis of mirror descent methods.
arXiv Detail & Related papers (2020-02-01T11:05:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.