Distributional Method for Risk Averse Reinforcement Learning
- URL: http://arxiv.org/abs/2302.14109v1
- Date: Mon, 27 Feb 2023 19:48:42 GMT
- Title: Distributional Method for Risk Averse Reinforcement Learning
- Authors: Ziteng Cheng, Sebastian Jaimungal and Nick Martin
- Abstract summary: We introduce a distributional method for learning the optimal policy in risk averse Markov decision process.
We assume sequential observations of states, actions, and costs and assess the performance of a policy using dynamic risk measures.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a distributional method for learning the optimal policy in risk
averse Markov decision process with finite state action spaces, latent costs,
and stationary dynamics. We assume sequential observations of states, actions,
and costs and assess the performance of a policy using dynamic risk measures
constructed from nested Kusuoka-type conditional risk mappings. For such
performance criteria, randomized policies may outperform deterministic
policies, therefore, the candidate policies lie in the d-dimensional simplex
where d is the cardinality of the action space. Existing risk averse
reinforcement learning methods seldom concern randomized policies, na\"ive
extensions to current setting suffer from the curse of dimensionality. By
exploiting certain structures embedded in the corresponding dynamic programming
principle, we propose a distributional learning method for seeking the optimal
policy. The conditional distribution of the value function is casted into a
specific type of function, which is chosen with in mind the ease of risk averse
optimization. We use a deep neural network to approximate said function,
illustrate that the proposed method avoids the curse of dimensionality in the
exploration phase, and explore the method's performance with a wide range of
model parameters that are picked randomly.
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, convergence (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z) - Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z) - Risk-Sensitive Reinforcement Learning with Exponential Criteria [0.0]
We provide a definition of robust reinforcement learning policies and formulate a risk-sensitive reinforcement learning problem to approximate them.
We introduce a novel online Actor-Critic algorithm based on solving a multiplicative Bellman equation using approximation updates.
The implementation, performance, and robustness properties of the proposed methods are evaluated in simulated experiments.
arXiv Detail & Related papers (2022-12-18T04:44:38Z) - A Risk-Sensitive Approach to Policy Optimization [21.684251937825234]
Standard deep reinforcement learning (DRL) aims to maximize expected reward, considering collected experiences equally in formulating a policy.
We propose a more direct approach whereby risk-sensitive objectives, specified in terms of the cumulative distribution function (CDF) of the distribution of full-episode rewards, are optimized.
We demonstrate that the use of moderately "pessimistic" risk profiles, which emphasize scenarios where the agent performs poorly, leads to enhanced exploration and a continual focus on addressing deficiencies.
arXiv Detail & Related papers (2022-08-19T00:55:05Z) - Randomized Policy Optimization for Optimal Stopping [0.0]
We propose a new methodology for optimal stopping based on randomized linear policies.
We show that our approach can substantially outperform state-of-the-art methods.
arXiv Detail & Related papers (2022-03-25T04:33:15Z) - Off-Policy Evaluation with Policy-Dependent Optimization Response [90.28758112893054]
We develop a new framework for off-policy evaluation with a textitpolicy-dependent linear optimization response.
We construct unbiased estimators for the policy-dependent estimand by a perturbation method.
We provide a general algorithm for optimizing causal interventions.
arXiv Detail & Related papers (2022-02-25T20:25:37Z) - Reinforcement Learning with Dynamic Convex Risk Measures [0.0]
We develop an approach for solving time-consistent risk-sensitive optimization problems using model-free reinforcement learning (RL)
We employ a time-consistent dynamic programming principle to determine the value of a particular policy, and develop policy gradient update rules.
arXiv Detail & Related papers (2021-12-26T16:41:05Z) - Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds
Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criteria.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z) - Cautious Reinforcement Learning via Distributional Risk in the Dual
Domain [45.17200683056563]
We study the estimation of risk-sensitive policies in reinforcement learning problems defined by a Markov Decision Process (MDPs) whose state and action spaces are countably finite.
We propose a new definition of risk, which we call caution, as a penalty function added to the dual objective of the linear programming (LP) formulation of reinforcement learning.
arXiv Detail & Related papers (2020-02-27T23:18:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.