Regret Bounds for Markov Decision Processes with Recursive Optimized
Certainty Equivalents
- URL: http://arxiv.org/abs/2301.12601v2
- Date: Thu, 8 Jun 2023 07:47:46 GMT
- Title: Regret Bounds for Markov Decision Processes with Recursive Optimized
Certainty Equivalents
- Authors: Wenhao Xu, Xuefeng Gao, Xuedong He
- Abstract summary: We propose a new episodic risk-sensitive reinforcement learning formulation.
We design an efficient learning algorithm for this problem based on value iteration and upper confidence bound.
Our bounds show that the regret rate achieved by our proposed algorithm has optimal dependence on the number of episodes and the number of actions.
- Score: 3.8980564330208662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The optimized certainty equivalent (OCE) is a family of risk measures that
cover important examples such as entropic risk, conditional value-at-risk and
mean-variance models. In this paper, we propose a new episodic risk-sensitive
reinforcement learning formulation based on tabular Markov decision processes
with recursive OCEs. We design an efficient learning algorithm for this problem
based on value iteration and upper confidence bound. We derive an upper bound
on the regret of the proposed algorithm, and also establish a minimax lower
bound. Our bounds show that the regret rate achieved by our proposed algorithm
has optimal dependence on the number of episodes and the number of actions.
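For concreteness, the OCE of a random reward X under a utility function u with u(0) = 0 is OCE_u(X) = sup_lambda { lambda + E[u(X - lambda)] }; the entropic risk, CVaR, and mean-variance criteria correspond to particular choices of u. The Python sketch below evaluates the empirical OCE of a sample by a simple grid search over lambda. It is purely illustrative: the utility functions, grid, and distribution are arbitrary choices, and this is not the learning algorithm proposed in the paper.

```python
import numpy as np

def oce(samples, utility, lam_grid):
    """Empirical OCE: max over lambda of lambda + mean(u(X - lambda))."""
    return max(lam + utility(samples - lam).mean() for lam in lam_grid)

rng = np.random.default_rng(0)
x = rng.normal(1.0, 2.0, size=50_000)       # samples of a random reward X
lam_grid = np.linspace(-8.0, 8.0, 1601)

# Entropic risk: u(t) = (1 - exp(-gamma * t)) / gamma
gamma = 0.5
entropic = oce(x, lambda t: (1 - np.exp(-gamma * t)) / gamma, lam_grid)
closed_form = -np.log(np.mean(np.exp(-gamma * x))) / gamma   # should roughly match
print(entropic, closed_form)

# CVaR at level alpha: u(t) = min(t, 0) / alpha  (mean of the worst alpha-tail of X)
alpha = 0.1
print(oce(x, lambda t: np.minimum(t, 0) / alpha, lam_grid))

# Mean-variance: u(t) = t - c * t**2 recovers E[X] - c * Var(X)
c = 0.25
print(oce(x, lambda t: t - c * t**2, lam_grid), x.mean() - c * x.var())
```

The entropic case has the closed form -(1/gamma) log E[exp(-gamma X)], which the sketch uses as a sanity check on the grid search.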
Related papers
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC), which can be applied to either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- Risk-aware linear bandits with convex loss [0.0]
We propose an optimistic UCB algorithm to learn optimal risk-aware actions, with regret guarantees similar to those of generalized linear bandits.
This approach requires solving a convex problem at each round of the algorithm, which we can relax by allowing only an approximate solution obtained via online gradient descent.
arXiv Detail & Related papers (2022-09-15T09:09:53Z)
- On the Complexity of Adversarial Decision Making [101.14158787665252]
We show that the Decision-Estimation Coefficient is necessary and sufficient to obtain low regret for adversarial decision making.
We provide new structural results that connect the Decision-Estimation Coefficient to variants of other well-known complexity measures.
arXiv Detail & Related papers (2022-06-27T06:20:37Z)
- Risk-aware Stochastic Shortest Path [0.0]
We treat the problem of risk-aware control for stochastic shortest path (SSP) problems on Markov decision processes (MDPs).
We present an alternative view, instead optimizing conditional value-at-risk (CVaR), an established risk measure.
arXiv Detail & Related papers (2022-03-03T10:59:54Z)
- A policy gradient approach for optimization of smooth risk measures [8.087699764574788]
We consider episodic Markov decision processes, and model the risk using the broad class of smooth risk measures of the cumulative discounted reward.
We propose two template policy gradient algorithms that optimize a smooth risk measure in on-policy and off-policy RL settings.
arXiv Detail & Related papers (2022-02-22T17:26:28Z)
- A Regret Minimization Approach to Iterative Learning Control [61.37088759497583]
We propose a new performance metric, planning regret, which replaces the standard uncertainty assumptions with worst case regret.
We provide theoretical and empirical evidence that the proposed algorithm outperforms existing methods on several benchmarks.
arXiv Detail & Related papers (2021-02-26T13:48:49Z)
- Constrained Risk-Averse Markov Decision Processes [18.467950783426947]
We consider the problem of designing policies for Markov decision processes with dynamic coherent risk objectives and constraints.
We propose an optimization-based method to synthesize Markovian policies that lower-bound the constrained risk-averse problem.
We show that these results generalize linear programs for constrained MDPs with total discounted expected costs and constraints.
arXiv Detail & Related papers (2020-12-04T06:12:11Z)
- Risk-Constrained Thompson Sampling for CVaR Bandits [82.47796318548306]
We consider a popular risk measure in quantitative finance known as the Conditional Value at Risk (CVaR).
We explore the performance of a Thompson Sampling-based algorithm, CVaR-TS, under this risk measure (a small empirical CVaR sketch appears after this list).
arXiv Detail & Related papers (2020-11-16T15:53:22Z)
- Learning Bounds for Risk-sensitive Learning [86.50262971918276]
In risk-sensitive learning, one aims to find a hypothesis that minimizes a risk-averse (or risk-seeking) measure of loss.
We study the generalization properties of risk-sensitive learning schemes whose optimand is described via optimized certainty equivalents.
arXiv Detail & Related papers (2020-06-15T05:25:02Z)
- Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance multi-armed bandits (MABs); a generic sketch of posterior sampling with a mean-variance index appears after this list.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
arXiv Detail & Related papers (2020-02-01T15:33:50Z)
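Several of the related papers above (the risk-aware SSP and CVaR bandit entries) optimize the conditional value-at-risk. As a point of reference, the sketch below computes the empirical lower-tail CVaR of a reward sample (the average of the worst alpha-fraction), which is also the CVaR special case of the OCE sketch above, and shows how a risk-neutral and a CVaR criterion can rank two reward distributions differently. The distributions and the level alpha are made up for illustration.

```python
import numpy as np

def empirical_cvar(samples, alpha):
    """Lower-tail CVaR of rewards: the average of the worst alpha-fraction."""
    samples = np.sort(np.asarray(samples))
    k = max(1, int(np.ceil(alpha * len(samples))))
    return samples[:k].mean()

rng = np.random.default_rng(1)
# Two hypothetical return distributions: B has the higher mean but a heavier left tail.
a = rng.normal(0.9, 0.5, size=50_000)
b = rng.normal(1.0, 3.0, size=50_000)

for name, r in [("A", a), ("B", b)]:
    print(name, "mean:", round(r.mean(), 3), "CVaR_0.1:", round(empirical_cvar(r, 0.1), 3))
# A risk-neutral criterion prefers B; the CVaR_0.1 criterion prefers A.
```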
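The mean-variance bandit entry above can be illustrated with a generic posterior-sampling loop over Gaussian arms: each round, sample a mean and a variance per arm and pull the arm with the best sampled mean-variance index. This is a rough sketch of the general idea, not the MVTS algorithms from that paper; the arm parameters, the sampling scheme, and the risk-tolerance weight rho are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Gaussian arms (all values are made up for illustration).
true_means = np.array([0.4, 0.5, 0.6])
true_stds  = np.array([0.1, 0.5, 1.0])
rho = 1.0                      # weight on the mean in the mean-variance index
n_arms, horizon = 3, 2000

counts = np.zeros(n_arms, dtype=int)
sums = np.zeros(n_arms)
sumsq = np.zeros(n_arms)

def pull(a):
    return rng.normal(true_means[a], true_stds[a])

def record(a, r):
    counts[a] += 1
    sums[a] += r
    sumsq[a] += r * r

for a in range(n_arms):            # warm start: two pulls per arm
    for _ in range(2):
        record(a, pull(a))

for t in range(horizon):
    index = np.empty(n_arms)
    for a in range(n_arms):
        n = counts[a]
        mean = sums[a] / n
        var = max(sumsq[a] / n - mean ** 2, 1e-8)
        sigma2 = var * n / rng.chisquare(n)            # sampled variance
        mu = rng.normal(mean, np.sqrt(sigma2 / n))     # sampled mean
        index[a] = rho * mu - sigma2                   # sampled mean-variance index
    a = int(np.argmax(index))
    record(a, pull(a))

print("pulls per arm:", counts)    # the low-variance arm should dominate for rho = 1
```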