Bayesian Robust Optimization for Imitation Learning
- URL: http://arxiv.org/abs/2007.12315v4
- Date: Fri, 1 Mar 2024 04:31:22 GMT
- Title: Bayesian Robust Optimization for Imitation Learning
- Authors: Daniel S. Brown, Scott Niekum, Marek Petrik
- Abstract summary: Inverse reinforcement learning can enable generalization to new states by learning a parameterized reward function.
Existing safe imitation learning approaches based on IRL handle uncertainty over the true reward function using a maxmin framework.
BROIL provides a natural way to interpolate between return-maximizing and risk-minimizing behaviors.
- Score: 34.40385583372232
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the main challenges in imitation learning is determining what action
an agent should take when outside the state distribution of the demonstrations.
Inverse reinforcement learning (IRL) can enable generalization to new states by
learning a parameterized reward function, but these approaches still face
uncertainty over the true reward function and corresponding optimal policy.
Existing safe imitation learning approaches based on IRL deal with this
uncertainty using a maxmin framework that optimizes a policy under the
assumption of an adversarial reward function, whereas risk-neutral IRL
approaches optimize a policy for either the mean or the MAP reward function. While
completely ignoring risk can lead to overly aggressive and unsafe policies,
optimizing in a fully adversarial sense is also problematic as it can lead to
overly conservative policies that perform poorly in practice. To provide a
bridge between these two extremes, we propose Bayesian Robust Optimization for
Imitation Learning (BROIL). BROIL leverages Bayesian reward function inference
and a user-specific risk tolerance to efficiently optimize a robust policy that
balances expected return and conditional value at risk. Our empirical results
show that BROIL provides a natural way to interpolate between return-maximizing
and risk-minimizing behaviors and outperforms existing risk-sensitive and
risk-neutral inverse reinforcement learning algorithms. Code is available at
https://github.com/dsbrown1331/broil.
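For intuition, the sketch below illustrates the kind of objective the abstract describes: an interpolation between posterior expected return and conditional value at risk (CVaR) over sampled reward hypotheses. This is a minimal, sample-based sketch, not the linear-programming formulation in the authors' repository; the function name, the weighting parameter lam, and the empirical CVaR estimate are illustrative assumptions.

```python
import numpy as np

def broil_style_objective(feature_counts, w_samples, lam=0.5, alpha=0.95):
    """Illustrative trade-off between expected return and CVaR (not the official BROIL code).

    feature_counts: expected discounted feature counts of a candidate policy, shape (d,)
    w_samples:      posterior samples of linear reward weights, shape (n, d)
    lam:            risk tolerance; lam=1 is risk-neutral, lam=0 optimizes CVaR only
    alpha:          CVaR confidence level
    """
    # Return of the candidate policy under each sampled reward hypothesis.
    returns = w_samples @ feature_counts            # shape (n,)

    # Risk-neutral term: posterior mean return.
    expected_return = returns.mean()

    # Empirical CVaR via the Rockafellar-Uryasev form:
    # CVaR_alpha = max_sigma { sigma - E[(sigma - returns)_+] / (1 - alpha) };
    # for a finite sample, the (1 - alpha) quantile is a standard choice of sigma.
    sigma = np.quantile(returns, 1.0 - alpha)
    cvar = sigma - np.mean(np.maximum(sigma - returns, 0.0)) / (1.0 - alpha)

    # Interpolate between return maximization and risk minimization.
    return lam * expected_return + (1.0 - lam) * cvar
```

Sweeping lam from 0 to 1 traces out the spectrum from risk-minimizing to return-maximizing behavior that the abstract refers to.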
Related papers
- Proximal Ranking Policy Optimization for Practical Safety in Counterfactual Learning to Rank [64.44255178199846]
We propose a novel approach, proximal ranking policy optimization (PRPO), that provides safety in deployment without assumptions about user behavior.
PRPO removes incentives for learning ranking behavior that is too dissimilar to a safe ranking model.
Our experiments show that PRPO provides higher performance than the existing safe inverse propensity scoring approach.
arXiv Detail & Related papers (2024-09-15T22:22:27Z) - Efficient Action Robust Reinforcement Learning with Probabilistic Policy Execution Uncertainty [43.55450683502937]
In this paper, we focus on action robust RL with probabilistic policy execution uncertainty.
We establish the existence of an optimal policy on the action robust MDPs with probabilistic policy execution uncertainty.
We also develop the Action Robust Reinforcement Learning with Certificates (ARRLC) algorithm, which achieves minimax-optimal regret and sample complexity.
arXiv Detail & Related papers (2023-07-15T00:26:51Z) - Coherent Soft Imitation Learning [17.345411907902932]
Imitation learning methods seek to learn from an expert either through behavioral cloning (BC) of the policy or through inverse reinforcement learning (IRL) of the reward.
This work derives an imitation method that captures the strengths of both BC and IRL.
arXiv Detail & Related papers (2023-05-25T21:54:22Z) - Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z) - Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm [4.128216503196621]
We propose an On-policy Model-based Safe Deep RL algorithm in which we learn the transition dynamics of the environment in an online manner.
We show that our algorithm is more sample efficient and results in lower cumulative hazard violations as compared to constrained model-free approaches.
arXiv Detail & Related papers (2022-10-14T06:53:02Z) - A Risk-Sensitive Approach to Policy Optimization [21.684251937825234]
Standard deep reinforcement learning (DRL) aims to maximize expected reward, considering collected experiences equally in formulating a policy.
We propose a more direct approach whereby risk-sensitive objectives, specified in terms of the cumulative distribution function (CDF) of the distribution of full-episode rewards, are optimized.
We demonstrate that the use of moderately "pessimistic" risk profiles, which emphasize scenarios where the agent performs poorly, leads to enhanced exploration and a continual focus on addressing deficiencies.
arXiv Detail & Related papers (2022-08-19T00:55:05Z) - Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
arXiv Detail & Related papers (2022-05-10T19:40:52Z) - Understanding the Effect of Stochasticity in Policy Optimization [86.7574122154668]
We show that the preferability of optimization methods depends critically on whether exact gradients are used.
Second, to explain these findings we introduce the concept of committal rate for policy optimization.
Third, we show that in the absence of external oracle information, there is an inherent trade-off between exploiting geometry to accelerate convergence versus achieving optimality almost surely.
arXiv Detail & Related papers (2021-10-29T06:35:44Z) - Policy Gradient Bayesian Robust Optimization for Imitation Learning [49.881386773269746]
We derive a novel policy gradient-style robust optimization approach, PG-BROIL, to balance expected performance and risk.
Results suggest PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse.
arXiv Detail & Related papers (2021-06-11T16:49:15Z) - Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average-reward setting with a variance risk criterion.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.