Reinforcement Learning with Dynamic Convex Risk Measures
- URL: http://arxiv.org/abs/2112.13414v1
- Date: Sun, 26 Dec 2021 16:41:05 GMT
- Title: Reinforcement Learning with Dynamic Convex Risk Measures
- Authors: Anthony Coache and Sebastian Jaimungal
- Abstract summary: We develop an approach for solving time-consistent risk-sensitive optimization problems using model-free reinforcement learning (RL).
We employ a time-consistent dynamic programming principle to determine the value of a particular policy, and develop policy gradient update rules.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We develop an approach for solving time-consistent risk-sensitive stochastic
optimization problems using model-free reinforcement learning (RL).
Specifically, we assume agents assess the risk of a sequence of random
variables using dynamic convex risk measures. We employ a time-consistent
dynamic programming principle to determine the value of a particular policy,
and develop policy gradient update rules. We further develop an actor-critic
style algorithm using neural networks to optimize over policies. Finally, we
demonstrate the performance and flexibility of our approach by applying it to
optimization problems in statistical arbitrage trading and obstacle avoidance
robot control.
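As a rough illustration of the time-consistent dynamic programming principle mentioned in the abstract, a generic dynamic-risk recursion built from one-step conditional convex risk measures rho_t can be written as follows; this is a standard textbook form, and the paper's exact objective, sign conventions, and cost/reward notation may differ:

```latex
V_T(s) = c_T(s), \qquad
V_t(s) = \inf_{a \in \mathcal{A}} \, \rho_t\!\left( c_t(s, a) + V_{t+1}(S_{t+1}) \;\middle|\; S_t = s \right),
\qquad t = T-1, \dots, 0.
```

A canonical convex risk measure that can play the role of each rho_t is the Conditional Value-at-Risk (CVaR). The snippet below is a minimal sketch, not the authors' implementation; the function name cvar, the cost convention (larger is worse), and the confidence level alpha are illustrative assumptions:

```python
# Minimal sketch (not the authors' code): empirical Conditional Value-at-Risk,
# a standard example of a convex risk measure, estimated from sampled costs.
# Applying a one-step risk measure like this recursively at every time step
# yields the kind of time-consistent dynamic risk objective the abstract describes.
import numpy as np

def cvar(costs: np.ndarray, alpha: float = 0.9) -> float:
    """Empirical CVaR_alpha: average of roughly the worst (1 - alpha) fraction of costs."""
    var = np.quantile(costs, alpha)    # value-at-risk threshold at level alpha
    tail = costs[costs >= var]         # samples in the worst-case tail
    return float(tail.mean())

# Example usage: risk of simulated one-step costs under some policy.
rng = np.random.default_rng(0)
sampled_costs = rng.normal(loc=1.0, scale=2.0, size=10_000)
print(cvar(sampled_costs, alpha=0.9))  # exceeds the mean cost, reflecting risk aversion
```

Per the abstract, the actor-critic style algorithm then uses neural networks to estimate such risk values and policy gradient update rules to improve the policy.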
Related papers
- Robust Reinforcement Learning with Dynamic Distortion Risk Measures [0.0]
We devise a framework to solve robust risk-aware reinforcement learning problems.
We simultaneously account for environmental uncertainty and risk with a class of dynamic robust distortion risk measures.
We construct an actor-critic algorithm to solve this class of robust risk-aware RL problems.
arXiv Detail & Related papers (2024-09-16T08:54:59Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC), that can be applied to either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z)
- Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation and Human Feedback [57.6775169085215]
Risk-sensitive reinforcement learning aims to optimize policies that balance the expected reward and risk.
We present a novel framework that employs an Iterated Conditional Value-at-Risk (CVaR) objective under both linear and general function approximations.
We propose provably sample-efficient algorithms for this Iterated CVaR RL and provide rigorous theoretical analysis.
arXiv Detail & Related papers (2023-07-06T08:14:54Z)
- Risk-Sensitive Reinforcement Learning with Exponential Criteria [0.0]
We provide a definition of robust reinforcement learning policies and formulate a risk-sensitive reinforcement learning problem to approximate them.
We introduce a novel online Actor-Critic algorithm based on solving a multiplicative Bellman equation using approximation updates.
The implementation, performance, and robustness properties of the proposed methods are evaluated in simulated experiments.
arXiv Detail & Related papers (2022-12-18T04:44:38Z)
- Conditionally Elicitable Dynamic Risk Measures for Deep Reinforcement Learning [0.0]
We develop an efficient approach to estimate a class of dynamic spectral risk measures with deep neural networks.
We also develop a risk-sensitive actor-critic algorithm that uses full episodes and does not require any additional nested transitions.
arXiv Detail & Related papers (2022-06-29T14:11:15Z)
- Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
arXiv Detail & Related papers (2022-05-10T19:40:52Z)
- Risk Conditioned Neural Motion Planning [14.018786843419862]
Risk-bounded motion planning is an important yet difficult problem for safety-critical tasks.
We propose an extension of the soft actor-critic model that estimates the execution risk of a plan through a risk critic.
We show the advantage of our model in terms of both computational time and plan quality, compared to a state-of-the-art mathematical programming baseline.
arXiv Detail & Related papers (2021-08-04T05:33:52Z)
- Policy Gradient Bayesian Robust Optimization for Imitation Learning [49.881386773269746]
We derive a novel policy gradient-style robust optimization approach, PG-BROIL, to balance expected performance and risk.
Results suggest PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse.
arXiv Detail & Related papers (2021-06-11T16:49:15Z)
- Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning in the average-reward setting with a variance risk criterion.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z)
- Entropic Risk Constrained Soft-Robust Policy Optimization [12.362670630646805]
It is important in high-stakes domains to quantify and manage risk induced by model uncertainties.
We propose entropic risk constrained policy gradient and actor-critic algorithms that are risk-averse to model uncertainty; the entropic risk measure itself is recalled after this list.
arXiv Detail & Related papers (2020-06-20T23:48:28Z)
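Two of the entries above, Risk-Sensitive Reinforcement Learning with Exponential Criteria and Entropic Risk Constrained Soft-Robust Policy Optimization, revolve around the entropic (exponential) risk measure. For reference only, and with the risk-aversion parameter theta and the cost convention as assumptions rather than details taken from either paper, it can be written as

```latex
\rho_\theta(X) \;=\; \frac{1}{\theta}\,\log \mathbb{E}\!\left[ e^{\theta X} \right], \qquad \theta > 0,
```

which is a convex risk measure, recovers the plain expectation as theta -> 0, and to first order penalizes variance: rho_theta(X) ≈ E[X] + (theta / 2) Var(X).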
This list is automatically generated from the titles and abstracts of the papers in this site.