Efficient Action Robust Reinforcement Learning with Probabilistic Policy
Execution Uncertainty
- URL: http://arxiv.org/abs/2307.07666v2
- Date: Thu, 20 Jul 2023 07:55:04 GMT
- Title: Efficient Action Robust Reinforcement Learning with Probabilistic Policy
Execution Uncertainty
- Authors: Guanlin Liu, Zhihan Zhou, Han Liu, Lifeng Lai
- Abstract summary: In this paper, we focus on action robust RL with the probabilistic policy execution uncertainty.
We establish the existence of an optimal policy on the action robust MDPs with probabilistic policy execution uncertainty.
We also develop the Action Robust Reinforcement Learning with Certificates (ARRLC) algorithm, which achieves minimax optimal regret and sample complexity.
- Score: 43.55450683502937
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Robust reinforcement learning (RL) aims to find a policy that optimizes the
worst-case performance in the face of uncertainties. In this paper, we focus on
action robust RL with the probabilistic policy execution uncertainty, in which,
instead of always carrying out the action specified by the policy, the agent
will take the action specified by the policy with probability $1-\rho$ and an
alternative adversarial action with probability $\rho$. We establish the
existence of an optimal policy on the action robust MDPs with probabilistic
policy execution uncertainty and provide the action robust Bellman optimality
equation for its solution. Furthermore, we develop the Action Robust Reinforcement
Learning with Certificates (ARRLC) algorithm, which achieves minimax optimal
regret and sample complexity. Finally, we conduct numerical experiments to
validate our approach's robustness, demonstrating that ARRLC outperforms
non-robust RL algorithms and converges faster than the robust TD algorithm in
the presence of action perturbations.
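To make the execution model in the abstract concrete, here is a minimal planning sketch in Python. It is not the ARRLC algorithm, which learns from samples and produces certificates. It assumes a known tabular, finite-horizon MDP and assumes the action robust backup takes the form $V_h(s) = (1-\rho)\max_a Q_h(s,a) + \rho\min_b Q_h(s,b)$: the agent's chosen action is executed with probability $1-\rho$ and a worst-case action with probability $\rho$. The function name, array layout, and toy numbers are hypothetical.

```python
import numpy as np


def action_robust_value_iteration(P, R, rho, horizon):
    """Backward induction under probabilistic policy execution uncertainty.

    Assumed (hypothetical) layout:
      P: array of shape (S, A, S), P[s, a, s'] = transition probability
      R: array of shape (S, A), reward for taking action a in state s
      rho: probability that an adversarial action replaces the agent's action
      horizon: number of steps H in an episode
    Returns V of shape (H + 1, S) and a greedy policy of shape (H, S).
    """
    n_states = P.shape[0]
    V = np.zeros((horizon + 1, n_states))  # V[horizon] = 0 is the terminal value
    policy = np.zeros((horizon, n_states), dtype=int)

    for h in range(horizon - 1, -1, -1):
        # Q[s, a] = R[s, a] + sum_s' P[s, a, s'] * V[h + 1, s']
        Q = R + P @ V[h + 1]
        # The agent picks the greedy action; the adversary picks the worst one.
        policy[h] = Q.argmax(axis=1)
        V[h] = (1.0 - rho) * Q.max(axis=1) + rho * Q.min(axis=1)

    return V, policy


# Toy example: two states, two actions, every transition leads to state 0.
P = np.zeros((2, 2, 2))
P[:, :, 0] = 1.0
R = np.array([[1.0, 0.0],
              [0.5, 0.5]])
V, policy = action_robust_value_iteration(P, R, rho=0.2, horizon=2)
print(V[0], policy[0])
```

Setting `rho=0` recovers ordinary value iteration, while larger `rho` pulls each state's value toward the outcome of its worst action.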
Related papers
- Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z)
- Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation [20.43657369407846]
We study robust reinforcement learning (RL) with the goal of determining a well-performing policy that is robust against model mismatch between the training simulator and the testing environment.
We propose two novel uncertainty set formulations, one based on double sampling and the other on an integral probability metric.
We demonstrate the robust performance of the policy learned by our proposed RNAC approach in multiple MuJoCo environments and a real-world TurtleBot navigation task.
arXiv Detail & Related papers (2023-07-17T22:10:20Z)
- Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation and Human Feedback [57.6775169085215]
Risk-sensitive reinforcement learning aims to optimize policies that balance the expected reward and risk.
We present a novel framework that employs an Iterated Conditional Value-at-Risk (CVaR) objective under both linear and general function approximations.
We propose provably sample-efficient algorithms for this Iterated CVaR RL and provide rigorous theoretical analysis.
arXiv Detail & Related papers (2023-07-06T08:14:54Z)
- Robust Risk-Aware Reinforcement Learning [0.0]
We present a reinforcement learning (RL) approach for robust optimisation of risk-aware performance criteria.
We assess the value of a policy using rank-dependent expected utility (RDEU).
To robustify optimal policies against model uncertainty, we assess a policy not by its distribution, but by the worst possible distribution that lies within a Wasserstein ball around it.
arXiv Detail & Related papers (2021-08-23T20:56:34Z)
- Policy Gradient Bayesian Robust Optimization for Imitation Learning [49.881386773269746]
We derive a novel policy gradient-style robust optimization approach, PG-BROIL, to balance expected performance and risk.
Results suggest PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse.
arXiv Detail & Related papers (2021-06-11T16:49:15Z)
- Provably Correct Optimization and Exploration with Non-linear Policies [65.60853260886516]
ENIAC is an actor-critic method that allows non-linear function approximation in the critic.
We show that under certain assumptions, the learner finds a near-optimal policy in $O(poly(d))$ exploration rounds.
We empirically evaluate this adaptation and show that it outperforms priors inspired by linear methods.
arXiv Detail & Related papers (2021-03-22T03:16:33Z)
- Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criteria.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z)
- Bayesian Robust Optimization for Imitation Learning [34.40385583372232]
Inverse reinforcement learning can enable generalization to new states by learning a parameterized reward function.
Existing safe imitation learning approaches based on IRL deal with this uncertainty using a maxmin framework.
BROIL provides a natural way to interpolate between return-maximizing and risk-minimizing behaviors.
arXiv Detail & Related papers (2020-07-24T01:52:11Z)
- Robust Reinforcement Learning using Least Squares Policy Iteration with Provable Performance Guarantees [3.8073142980733]
This paper addresses the problem of model-free reinforcement learning for Robust Markov Decision Process (RMDP) with large state spaces.
We first propose the Robust Least Squares Policy Evaluation algorithm, which is a multi-step online model-free learning algorithm for policy evaluation.
We then propose the Robust Least Squares Policy Iteration (RLSPI) algorithm for learning the optimal robust policy.
arXiv Detail & Related papers (2020-06-20T16:26:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.