Regret Bounds for Risk-sensitive Reinforcement Learning with Lipschitz
Dynamic Risk Measures
- URL: http://arxiv.org/abs/2306.02399v1
- Date: Sun, 4 Jun 2023 16:24:19 GMT
- Title: Regret Bounds for Risk-sensitive Reinforcement Learning with Lipschitz
Dynamic Risk Measures
- Authors: Hao Liang, Zhi-quan Luo
- Abstract summary: We present two model-based algorithms applied to \emph{Lipschitz} dynamic risk measures.
Notably, our upper bounds demonstrate optimal dependencies on the number of actions and episodes.
- Score: 23.46659319363579
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study finite episodic Markov decision processes incorporating dynamic risk
measures to capture risk sensitivity. To this end, we present two model-based
algorithms applied to \emph{Lipschitz} dynamic risk measures, a wide range of
risk measures that subsumes spectral risk measures, the optimized certainty
equivalent, and distortion risk measures, among others. We establish both regret
upper bounds and lower bounds. Notably, our upper bounds demonstrate optimal
dependencies on the number of actions and episodes, while reflecting the
inherent trade-off between risk sensitivity and sample complexity.
Additionally, we substantiate our theoretical results through numerical
experiments.
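As a concrete instance of the class, conditional value-at-risk (CVaR) is one of the distortion risk measures subsumed here. A minimal sketch of its empirical estimate under a reward convention (the function name and sample values are illustrative, not from the paper):

```python
import numpy as np

def cvar(samples, alpha=0.1):
    """Empirical CVaR at level alpha under a reward convention:
    the average of the worst alpha-fraction of outcomes."""
    worst_first = np.sort(samples)                      # ascending: worst rewards first
    k = max(1, int(np.ceil(alpha * len(worst_first))))  # size of the alpha-tail
    return worst_first[:k].mean()

rewards = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
print(cvar(rewards, alpha=0.2))  # mean of the two worst rewards: 1.5
```

As alpha approaches 1 the estimate approaches the plain mean, which illustrates the trade-off between risk sensitivity and how much tail data the estimate relies on.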
Related papers
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC), that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- Capsa: A Unified Framework for Quantifying Risk in Deep Neural Networks [142.67349734180445]
Existing algorithms that provide risk-awareness to deep neural networks are complex and ad-hoc.
Here we present capsa, a framework for extending models with risk-awareness.
arXiv Detail & Related papers (2023-08-01T02:07:47Z)
- Risk-Averse Reinforcement Learning via Dynamic Time-Consistent Risk Measures [10.221369785560785]
In this paper, we consider the problem of maximizing dynamic risk of a sequence of rewards in Markov Decision Processes (MDPs).
Using a convex combination of expectation and conditional value-at-risk (CVaR) as a special one-step conditional risk measure, we reformulate the risk-averse MDP as a risk-neutral counterpart with an augmented action space and a manipulation of the immediate rewards.
Our numerical studies show that the risk-averse setting can reduce the variance and enhance robustness of the results.
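The one-step risk measure described above can be sketched numerically as follows (the helper names and sample rewards are illustrative, not from the paper):

```python
import numpy as np

def cvar(samples, alpha):
    """Empirical CVaR: mean of the worst alpha-fraction of rewards."""
    s = np.sort(samples)
    k = max(1, int(np.ceil(alpha * len(s))))
    return s[:k].mean()

def mixed_risk(samples, alpha=0.1, lam=0.5):
    """Convex combination (1 - lam) * E[X] + lam * CVaR_alpha(X);
    lam = 0 is risk-neutral, lam = 1 is pure CVaR."""
    return (1.0 - lam) * np.mean(samples) + lam * cvar(samples, alpha)

rewards = np.array([0.0, 1.0, 2.0, 3.0])
print(mixed_risk(rewards, alpha=0.25, lam=0.5))  # 0.5 * 1.5 + 0.5 * 0.0 = 0.75
```

The weight lam interpolates between the risk-neutral mean and the pure tail risk, which is the knob the paper tunes when trading off average performance against robustness.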
arXiv Detail & Related papers (2023-01-14T21:43:18Z)
- RASR: Risk-Averse Soft-Robust MDPs with EVaR and Entropic Risk [28.811725782388688]
We propose and analyze a new framework to jointly model the risk associated with uncertainties in finite-horizon and discounted infinite-horizon MDPs.
We show that when the risk-aversion is defined using either EVaR or the entropic risk, the optimal policy in RASR can be computed efficiently using a new dynamic program formulation with a time-dependent risk level.
arXiv Detail & Related papers (2022-09-09T00:34:58Z)
- Deep Learning for Systemic Risk Measures [3.274367403737527]
The aim of this paper is to study a new methodological framework for systemic risk measures.
Under this new framework, systemic risk measures can be interpreted as the minimal amount of cash that secures the aggregated system.
Deep learning is increasingly receiving attention in financial modeling and risk management.
arXiv Detail & Related papers (2022-07-02T05:01:19Z)
- Conditionally Elicitable Dynamic Risk Measures for Deep Reinforcement Learning [0.0]
We develop an efficient approach to estimate a class of dynamic spectral risk measures with deep neural networks.
We also develop a risk-sensitive actor-critic algorithm that uses full episodes and does not require any additional nested transitions.
arXiv Detail & Related papers (2022-06-29T14:11:15Z)
- A Survey of Risk-Aware Multi-Armed Bandits [84.67376599822569]
We review various risk measures of interest, and comment on their properties.
We consider algorithms for the regret minimization setting, where the exploration-exploitation trade-off manifests.
We conclude by commenting on persisting challenges and fertile areas for future research.
arXiv Detail & Related papers (2022-05-12T02:20:34Z)
- Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
arXiv Detail & Related papers (2022-05-10T19:40:52Z)
- Risk-Constrained Thompson Sampling for CVaR Bandits [82.47796318548306]
We consider a popular risk measure in quantitative finance known as the Conditional Value at Risk (CVaR).
We explore the performance of a Thompson Sampling-based algorithm CVaR-TS under this risk measure.
arXiv Detail & Related papers (2020-11-16T15:53:22Z)
- Entropic Risk Constrained Soft-Robust Policy Optimization [12.362670630646805]
It is important in high-stakes domains to quantify and manage risk induced by model uncertainties.
We propose entropic risk constrained policy gradient and actor-critic algorithms that are risk-averse with respect to model uncertainty.
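The entropic risk measure used as the constraint can be sketched as follows (the reward convention and parameter values are illustrative; sign conventions for entropic risk vary across papers):

```python
import numpy as np

def entropic_risk(samples, theta=1.0):
    """Entropic risk of rewards X: -(1/theta) * log E[exp(-theta * X)].
    As theta -> 0 this recovers the mean; larger theta is more risk-averse."""
    return -np.log(np.mean(np.exp(-theta * samples))) / theta

rewards = np.array([0.0, 4.0])
print(entropic_risk(rewards, theta=1.0))  # strictly below the mean of 2.0
```

For a constant reward the entropic risk equals that constant; for spread-out rewards it penalizes variability, sitting below the mean by an amount that grows with theta.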
arXiv Detail & Related papers (2020-06-20T23:48:28Z)
- Learning Bounds for Risk-sensitive Learning [86.50262971918276]
In risk-sensitive learning, one aims to find a hypothesis that minimizes a risk-averse (or risk-seeking) measure of loss.
We study the generalization properties of risk-sensitive learning schemes whose optimand is described via optimized certainty equivalents.
arXiv Detail & Related papers (2020-06-15T05:25:02Z)
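The optimized certainty equivalent (OCE) family mentioned above admits a variational form, OCE_u(X) = sup over lam of { lam + E[u(X - lam)] } for a concave utility u; a grid-search sketch (the utility, grid, and sample rewards are illustrative, and the piecewise-linear u below recovers CVaR):

```python
import numpy as np

def oce(samples, u, lams):
    """Optimized certainty equivalent sup_lam { lam + E[u(X - lam)] },
    approximated by a grid search over candidate lam values."""
    return max(lam + np.mean(u(samples - lam)) for lam in lams)

alpha = 0.25
u = lambda t: np.minimum(t, 0.0) / alpha   # this utility makes the OCE equal CVaR_alpha
rewards = np.array([0.0, 1.0, 2.0, 3.0])
lams = np.linspace(-1.0, 4.0, 501)
print(oce(rewards, u, lams))  # CVaR_0.25 of the rewards: the worst reward, 0.0
```

Different utilities u yield different members of the family (a linear u gives the plain mean, an exponential u gives entropic risk), which is why a single analysis over the OCE class covers many risk-sensitive objectives at once.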
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.