Robust Risk-Aware Reinforcement Learning
- URL: http://arxiv.org/abs/2108.10403v1
- Date: Mon, 23 Aug 2021 20:56:34 GMT
- Title: Robust Risk-Aware Reinforcement Learning
- Authors: Sebastian Jaimungal, Silvana Pesenti, Ye Sheng Wang, and Hariom Tatsat
- Abstract summary: We present a reinforcement learning (RL) approach for robust optimisation of risk-aware performance criteria.
We assess the value of a policy using rank dependent expected utility (RDEU).
To robustify optimal policies against model uncertainty, we assess a policy not by its distribution, but by the worst possible distribution that lies within a Wasserstein ball around it.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present a reinforcement learning (RL) approach for robust optimisation of
risk-aware performance criteria. To allow agents to express a wide variety of
risk-reward profiles, we assess the value of a policy using rank dependent
expected utility (RDEU). RDEU allows agents to seek gains while
simultaneously protecting themselves against downside events. To robustify
optimal policies against model uncertainty, we assess a policy not by its
distribution, but rather by the worst possible distribution that lies within
a Wasserstein ball around it. Thus, our problem formulation may be viewed as
an actor choosing a policy (the outer problem) and an adversary then acting
to worsen the performance of that strategy (the inner problem). We develop
explicit policy gradient formulae for the inner and outer problems, and show
their efficacy on three prototypical financial problems: robust portfolio
allocation, optimising a benchmark, and statistical arbitrage.
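To make the criterion concrete, below is a minimal NumPy sketch of how the empirical RDEU of simulated policy rewards could be evaluated, and how the adversary's inner problem could be approximated by perturbing the sample within a Wasserstein-2 style budget. The exponential utility, inverse-S distortion, step sizes, and function names are illustrative assumptions, not the paper's formulation or code.

```python
import numpy as np

def rdeu_and_grad(samples, a=0.5, gamma=0.71):
    """Empirical rank-dependent expected utility of a sample of rewards.

    Uses an (assumed) exponential utility u(x) = 1 - exp(-a x) and an
    inverse-S probability distortion g. Returns the RDEU value and its
    (sub)gradient with respect to each sample point.
    """
    g = lambda p: p**gamma / (p**gamma + (1.0 - p)**gamma) ** (1.0 / gamma)
    order = np.argsort(samples)
    x = samples[order]                      # sorted rewards x_(1) <= ... <= x_(n)
    n = len(x)
    levels = np.arange(n, -1, -1) / n       # survival probabilities 1, (n-1)/n, ..., 0
    w = g(levels[:-1]) - g(levels[1:])      # distorted probability weights (sum to 1)
    u, du = 1.0 - np.exp(-a * x), a * np.exp(-a * x)
    grad = np.zeros(n)
    grad[order] = w * du                    # d RDEU / d sample_i, holding ranks fixed
    return float(np.sum(w * u)), grad

def worst_case_rdeu(samples, radius, steps=200, lr=0.05):
    """Inner (adversary) problem: shift the empirical sample to minimise RDEU,
    keeping the root-mean-square shift -- a Wasserstein-2 style distance to
    the original empirical measure -- within `radius`."""
    delta = np.zeros_like(samples)
    for _ in range(steps):
        _, grad = rdeu_and_grad(samples + delta)
        delta -= lr * grad                                  # descend the criterion
        rms = np.linalg.norm(delta) / np.sqrt(len(samples))
        if rms > radius:
            delta *= radius / rms                           # project onto the ball
    return rdeu_and_grad(samples + delta)[0]

rng = np.random.default_rng(0)
pnl = rng.normal(0.05, 0.20, size=1_000)    # simulated terminal P&L of some policy
print("nominal RDEU:", rdeu_and_grad(pnl)[0])
print("robust RDEU :", worst_case_rdeu(pnl, radius=0.05))
```

The outer (actor) problem would then update the policy parameters to increase this robust value, e.g. via the paper's policy gradient formulae; the sketch above only covers the evaluation side.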
Related papers
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
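For orientation, a generic tabular sketch of an uncertainty Bellman recursion is given below; it illustrates the general idea only and is not necessarily the exact UBE derived in that paper.

```python
import numpy as np

def uncertainty_backup(U, P_mean, local_var, policy, gamma=0.99):
    """One sweep of a generic tabular uncertainty Bellman recursion:
        U(s, a) <- local_var(s, a) + gamma^2 * E_{s' ~ P_mean(.|s,a)}[ U(s', policy(s')) ]
    P_mean:    (S, A, S) posterior-mean transition model
    local_var: (S, A) per-step epistemic uncertainty term
    policy:    (S,) integer action taken in each state
    Iterating to a fixed point yields a per state-action estimate of value
    uncertainty that can be added to (risk-seeking) or subtracted from
    (risk-averse) the usual Q-targets."""
    S = U.shape[0]
    U_next = U[np.arange(S), policy]          # U(s', policy(s')) for every s'
    return local_var + gamma**2 * (P_mean @ U_next)
```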
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- Efficient Action Robust Reinforcement Learning with Probabilistic Policy Execution Uncertainty [43.55450683502937]
In this paper, we focus on action robust RL with probabilistic policy execution uncertainty.
We establish the existence of an optimal policy on action robust MDPs with probabilistic policy execution uncertainty.
We also develop the Action Robust Reinforcement Learning with Certificates (ARRLC) algorithm, which achieves minimax-optimal regret and sample complexity.
arXiv Detail & Related papers (2023-07-15T00:26:51Z)
- Robust Risk-Aware Option Hedging [2.405471533561618]
We showcase the potential of robust risk-aware reinforcement learning (RL) in mitigating the risks associated with path-dependent financial derivatives.
We apply this methodology to the hedging of barrier options, and highlight how the optimal hedging strategy undergoes distortions as the agent moves from being risk-averse to risk-seeking.
arXiv Detail & Related papers (2023-03-27T13:57:13Z)
- A Risk-Sensitive Approach to Policy Optimization [21.684251937825234]
Standard deep reinforcement learning (DRL) aims to maximize expected reward, considering collected experiences equally in formulating a policy.
We propose a more direct approach whereby risk-sensitive objectives, specified in terms of the cumulative distribution function (CDF) of the distribution of full-episode rewards, are optimized.
We demonstrate that the use of moderately "pessimistic" risk profiles, which emphasize scenarios where the agent performs poorly, leads to enhanced exploration and a continual focus on addressing deficiencies.
arXiv Detail & Related papers (2022-08-19T00:55:05Z)
- Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
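As a toy illustration of that kind of objective, the sketch below computes the CVaR of a batch of episode returns together with one plausible "soft" schedule that relaxes the risk level from risk-neutral towards a target; the schedule and constants are assumptions, not the paper's mechanism.

```python
import numpy as np

def cvar(returns, alpha):
    """CVaR_alpha: mean of the worst alpha-fraction of episode returns."""
    k = max(1, int(np.ceil(alpha * len(returns))))
    return float(np.mean(np.sort(returns)[:k]))

def soft_risk_level(step, total_steps, target_alpha):
    """Anneal the risk level from 1.0 (risk neutral) down to target_alpha,
    so early training still receives gradient signal from all episodes."""
    frac = min(1.0, step / total_steps)
    return 1.0 - frac * (1.0 - target_alpha)

# e.g. at step 0 the objective is the plain mean; by total_steps it is CVaR_0.05
returns = np.random.default_rng(1).normal(1.0, 0.5, size=256)
alpha = soft_risk_level(step=2_000, total_steps=10_000, target_alpha=0.05)
print(alpha, cvar(returns, alpha))
```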
arXiv Detail & Related papers (2022-05-10T19:40:52Z)
- Policy Gradient Bayesian Robust Optimization for Imitation Learning [49.881386773269746]
We derive a novel policy gradient-style robust optimization approach, PG-BROIL, to balance expected performance and risk.
Results suggest PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse.
arXiv Detail & Related papers (2021-06-11T16:49:15Z)
- Risk-Averse Offline Reinforcement Learning [46.383648750385575]
Training Reinforcement Learning (RL) agents in high-stakes applications may be prohibitive due to the risk associated with exploration.
We present the Offline Risk-Averse Actor-Critic (O-RAAC), a model-free RL algorithm that is able to learn risk-averse policies in a fully offline setting.
arXiv Detail & Related papers (2021-02-10T10:27:49Z)
- Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average-reward setting with a variance risk criterion.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z)
- Reliable Off-policy Evaluation for Reinforcement Learning [53.486680020852724]
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy.
We propose a novel framework that provides robust and optimistic cumulative reward estimates using one or multiple logged datasets.
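For context, the classical per-decision importance-sampling estimator that such off-policy evaluation frameworks typically build on is sketched below; it is a standard baseline, not the estimator proposed in that paper.

```python
import numpy as np

def per_decision_is(trajectories, pi_target, pi_behaviour, gamma=0.99):
    """Classical per-decision importance-sampling estimate of a target
    policy's expected discounted return from logged trajectories.

    Each trajectory is a list of (state, action, reward) tuples; pi_target
    and pi_behaviour return the probability of an action given a state.
    """
    estimates = []
    for traj in trajectories:
        rho, value = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            rho *= pi_target(s, a) / pi_behaviour(s, a)   # cumulative likelihood ratio
            value += (gamma ** t) * rho * r               # reweighted discounted reward
        estimates.append(value)
    return float(np.mean(estimates))
```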
arXiv Detail & Related papers (2020-11-08T23:16:19Z)
- Improving Robustness via Risk Averse Distributional Reinforcement Learning [13.467017642143581]
Robustness is critical when policies are trained in simulation rather than in the real-world environment.
We propose a risk-aware algorithm to learn robust policies in order to bridge the gap between simulation training and real-world implementation.
arXiv Detail & Related papers (2020-05-01T20:03:10Z)