Cascaded Gaps: Towards Gap-Dependent Regret for Risk-Sensitive
Reinforcement Learning
- URL: http://arxiv.org/abs/2203.03110v1
- Date: Mon, 7 Mar 2022 03:07:09 GMT
- Title: Cascaded Gaps: Towards Gap-Dependent Regret for Risk-Sensitive
Reinforcement Learning
- Authors: Yingjie Fei, Ruitu Xu
- Abstract summary: We study gap-dependent regret guarantees for risk-sensitive reinforcement learning based on the entropic risk measure.
We derive non-asymptotic and logarithmic regret bounds for two model-free algorithms under episodic Markov decision processes.
- Score: 14.036298712230233
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study gap-dependent regret guarantees for risk-sensitive
reinforcement learning based on the entropic risk measure. We propose a novel
definition of sub-optimality gaps, which we call cascaded gaps, and we discuss
their key components that adapt to the underlying structures of the problem.
Based on the cascaded gaps, we derive non-asymptotic and logarithmic regret
bounds for two model-free algorithms under episodic Markov decision processes.
We show that, in appropriate settings, these bounds feature exponential
improvement over existing ones that are independent of gaps. We also prove
gap-dependent lower bounds, which certify the near optimality of the upper
bounds.
Related papers
- Model-Based Epistemic Variance of Values for Risk-Aware Policy
Optimization [63.32053223422317]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
In particular, we focus on characterizing the variance over values induced by a distribution over MDPs.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - Regret Bounds for Markov Decision Processes with Recursive Optimized
Certainty Equivalents [3.8980564330208662]
We propose a new episodic risk-sensitive reinforcement learning formulation.
We design an efficient learning algorithm for this problem based on value iteration and upper confidence bound.
Our bounds show that the regret rate achieved by our proposed algorithm has optimal dependence on the number of episodes and the number of actions.
arXiv Detail & Related papers (2023-01-30T01:22:31Z) - Tight Guarantees for Interactive Decision Making with the
Decision-Estimation Coefficient [51.37720227675476]
We introduce a new variant of the Decision-Estimation Coefficient, and use it to derive new lower bounds that improve upon prior work on three fronts.
We provide upper bounds on regret that scale with the same quantity, thereby closing all but one of the gaps between upper and lower bounds in Foster et al.
Our results apply to both the regret framework and PAC framework, and make use of several new analysis and algorithm design techniques that we anticipate will find broader use.
arXiv Detail & Related papers (2023-01-19T18:24:08Z) - On the Complexity of Adversarial Decision Making [101.14158787665252]
We show that the Decision-Estimation Coefficient is necessary and sufficient to obtain low regret for adversarial decision making.
We provide new structural results that connect the Decision-Estimation Coefficient to variants of other well-known complexity measures.
arXiv Detail & Related papers (2022-06-27T06:20:37Z) - A Unifying Theory of Thompson Sampling for Continuous Risk-Averse
Bandits [91.3755431537592]
This paper unifies the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem.
Using the contraction principle in the theory of large deviations, we prove novel concentration bounds for continuous risk functionals.
We show that a wide class of risk functionals as well as "nice" functions of them satisfy the continuity condition.
arXiv Detail & Related papers (2021-08-25T17:09:01Z) - Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds
for Episodic Reinforcement Learning [50.44564503645015]
We provide improved gap-dependent regret bounds for reinforcement learning in finite episodic Markov decision processes.
We prove tighter upper regret bounds for optimistic algorithms and accompany them with new information-theoretic lower bounds for a large class of MDPs.
arXiv Detail & Related papers (2021-07-02T20:36:05Z) - Expanding boundaries of Gap Safe screening [0.0]
A powerful strategy to boost the performance of algorithms is known as safe screening.
We extend the existing Gap Safe screening framework by relaxing the global strong-concavity assumption on the dual cost function.
The proposed general framework is exemplified by some notable particular cases: logistic function, beta = 1.5 and Kullback-Leibler divergences.
arXiv Detail & Related papers (2021-02-22T09:23:31Z) - On Lower Bounds for Standard and Robust Gaussian Process Bandit
Optimization [55.937424268654645]
We consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm.
We provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity, versatility, and an improved dependence on the error probability.
arXiv Detail & Related papers (2020-08-20T03:48:14Z) - Cautious Reinforcement Learning via Distributional Risk in the Dual
Domain [45.17200683056563]
We study the estimation of risk-sensitive policies in reinforcement learning problems defined by a Markov Decision Process (MDPs) whose state and action spaces are countably finite.
We propose a new definition of risk, which we call caution, as a penalty function added to the dual objective of the linear programming (LP) formulation of reinforcement learning.
arXiv Detail & Related papers (2020-02-27T23:18:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.