Risk-Sensitive Reinforcement Learning: a Martingale Approach to Reward
Uncertainty
- URL: http://arxiv.org/abs/2006.12686v2
- Date: Tue, 15 Sep 2020 13:50:26 GMT
- Title: Risk-Sensitive Reinforcement Learning: a Martingale Approach to Reward
Uncertainty
- Authors: Nelson Vadori and Sumitra Ganesh and Prashant Reddy and Manuela Veloso
- Abstract summary: We introduce a novel framework to account for sensitivity to rewards uncertainty in sequential decision-making problems.
We present a new decomposition of the randomness contained in the cumulative reward based on the Doob decomposition of a stochastic process.
We innovate on the reinforcement learning side by incorporating this new risk-sensitive approach into model-free algorithms, both policy gradient and value function based, and illustrate its relevance on grid world and portfolio optimization problems.
- Score: 15.572157454411533
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a novel framework to account for sensitivity to rewards
uncertainty in sequential decision-making problems. While risk-sensitive
formulations for Markov decision processes studied so far focus on the
distribution of the cumulative reward as a whole, we aim at learning policies
sensitive to the uncertain/stochastic nature of the rewards, which has the
advantage of being conceptually more meaningful in some cases. To this end, we
present a new decomposition of the randomness contained in the cumulative
reward based on the Doob decomposition of a stochastic process, and introduce a
new conceptual tool - the \textit{chaotic variation} - which can rigorously be
interpreted as the risk measure of the martingale component associated to the
cumulative reward process. We innovate on the reinforcement learning side by
incorporating this new risk-sensitive approach into model-free algorithms, both
policy gradient and value function based, and illustrate its relevance on grid
world and portfolio optimization problems.
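As a minimal illustration of the decomposition the abstract refers to (a sketch of the standard Doob decomposition, not the paper's full construction), write the cumulative reward as \(C_t = \sum_{s=1}^{t} r_s\) for an adapted, integrable reward sequence \((r_s)\) with filtration \((\mathcal{F}_t)\). Then
\[
C_t = A_t + M_t, \qquad
A_t = \sum_{s=1}^{t} \mathbb{E}\big[\, r_s \mid \mathcal{F}_{s-1} \big], \qquad
M_t = \sum_{s=1}^{t} \big( r_s - \mathbb{E}\big[\, r_s \mid \mathcal{F}_{s-1} \big] \big),
\]
where \(A\) is predictable and \(M\) is a martingale; following the abstract, the chaotic variation is a risk measure applied to this martingale component \(M\).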
Related papers
- Risk-Sensitive Stochastic Optimal Control as Rao-Blackwellized Markovian
Score Climbing [3.9410617513331863]
Optimal control of dynamical systems is a crucial challenge in sequential decision-making.
Control-as-inference approaches have had considerable success, providing a viable risk-sensitive framework to address the exploration-exploitation dilemma.
This paper introduces a novel perspective by framing risk-sensitive control as Markovian score climbing under samples drawn from a conditional particle filter.
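For context on the control-as-inference view mentioned above (a generic sketch, not necessarily this paper's exact formulation), optimality is typically encoded as a binary variable whose likelihood is exponential in the reward, so a risk-sensitivity parameter \(\eta\) can enter as a temperature:
\[
p(\mathcal{O}_t = 1 \mid s_t, a_t) \propto \exp\big( \eta\, r(s_t, a_t) \big), \qquad
p(\mathcal{O}_{1:T} = 1 \mid \tau) \propto \exp\Big( \eta \sum_{t=1}^{T} r(s_t, a_t) \Big),
\]
so that maximizing the marginal likelihood of optimality over policies resembles an exponential-utility (risk-sensitive) control objective.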
arXiv Detail & Related papers (2023-12-21T16:34:03Z) - Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC), that can be applied for either risk-seeking or risk-averse policy optimization.
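For reference, a generic uncertainty Bellman equation (shown here only as a sketch of the idea, not the specific equation proposed in this paper) propagates a local uncertainty term \(\nu\) through the same recursion as values:
\[
u_\pi(s,a) = \nu(s,a) + \gamma^2 \sum_{s'} P(s' \mid s,a) \sum_{a'} \pi(a' \mid s')\, u_\pi(s',a'),
\]
where formulations of this type typically upper-bound the variance of values; the paper above constructs one whose solution converges to the true posterior variance.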
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - Regret Bounds for Markov Decision Processes with Recursive Optimized
Certainty Equivalents [3.8980564330208662]
We propose a new episodic risk-sensitive reinforcement learning formulation.
We design an efficient learning algorithm for this problem based on value iteration and upper confidence bound.
Our bounds show that the regret rate achieved by our proposed algorithm has optimal dependence on the number of episodes and the number of actions.
arXiv Detail & Related papers (2023-01-30T01:22:31Z) - Deep Learning for Systemic Risk Measures [3.274367403737527]
The aim of this paper is to study a new methodological framework for systemic risk measures.
Under this new framework, systemic risk measures can be interpreted as the minimal amount of cash that secures the aggregated system.
Deep learning is increasingly receiving attention in financial modeling and risk management.
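To make the cash interpretation above concrete (a sketch of the standard aggregation-based formulation, under the assumption of deterministic cash allocations), a systemic risk measure for positions \(X = (X^1, \dots, X^N)\) can be written as
\[
\rho(X) = \inf\Big\{ \sum_{i=1}^{N} m^i \;:\; m \in \mathbb{R}^N,\ \Lambda(X + m) \in \mathbb{A} \Big\},
\]
where \(\Lambda\) is an aggregation function and \(\mathbb{A}\) an acceptance set: the minimal total amount of cash, injected before aggregation, that makes the aggregated system acceptable.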
arXiv Detail & Related papers (2022-07-02T05:01:19Z) - On the Complexity of Adversarial Decision Making [101.14158787665252]
We show that the Decision-Estimation Coefficient is necessary and sufficient to obtain low regret for adversarial decision making.
We provide new structural results that connect the Decision-Estimation Coefficient to variants of other well-known complexity measures.
arXiv Detail & Related papers (2022-06-27T06:20:37Z) - Policy Gradient Bayesian Robust Optimization for Imitation Learning [49.881386773269746]
We derive a novel policy gradient-style robust optimization approach, PG-BROIL, to balance expected performance and risk.
Results suggest PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse.
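One generic way to express such a performance/risk trade-off (a sketch assuming a CVaR-style tail term, not necessarily PG-BROIL's exact objective) is a convex combination of expected return and a lower-tail risk measure of the return \(R(\pi)\):
\[
\max_{\pi}\ (1-\lambda)\,\mathbb{E}\big[ R(\pi) \big] + \lambda\, \mathrm{CVaR}_{\alpha}\big( R(\pi) \big), \qquad \lambda \in [0,1],
\]
where \(\lambda = 0\) recovers a risk-neutral objective and larger \(\lambda\) yields increasingly risk-averse behavior.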
arXiv Detail & Related papers (2021-06-11T16:49:15Z) - A Regret Minimization Approach to Iterative Learning Control [61.37088759497583]
We propose a new performance metric, planning regret, which replaces the standard uncertainty assumptions with worst case regret.
We provide theoretical and empirical evidence that the proposed algorithm outperforms existing methods on several benchmarks.
arXiv Detail & Related papers (2021-02-26T13:48:49Z) - Learning Bounds for Risk-sensitive Learning [86.50262971918276]
In risk-sensitive learning, one aims to find a hypothesis that minimizes a risk-averse (or risk-seeking) measure of loss.
We study the generalization properties of risk-sensitive learning schemes whose optimand is described via optimized certainty equivalents.
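For reference, the optimized certainty equivalent (OCE) of Ben-Tal and Teboulle is defined, for a utility function \(u\) and random outcome \(X\), by
\[
\mathrm{OCE}_u(X) = \sup_{\lambda \in \mathbb{R}} \big\{ \lambda + \mathbb{E}\big[ u(X - \lambda) \big] \big\},
\]
which recovers, for example, the entropic risk measure \(-\tfrac{1}{\beta}\log \mathbb{E}\big[e^{-\beta X}\big]\) for \(u(t) = \tfrac{1}{\beta}\big(1 - e^{-\beta t}\big)\).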
arXiv Detail & Related papers (2020-06-15T05:25:02Z) - Cautious Reinforcement Learning via Distributional Risk in the Dual
Domain [45.17200683056563]
We study the estimation of risk-sensitive policies in reinforcement learning problems defined by Markov decision processes (MDPs) whose state and action spaces are countably finite.
We propose a new definition of risk, which we call caution, as a penalty function added to the dual objective of the linear programming (LP) formulation of reinforcement learning.
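As a rough illustration of where such a penalty enters (a sketch assuming the standard discounted occupancy-measure LP; the penalty \(c\) and weight \(\lambda\) are generic placeholders rather than the paper's exact definition of caution), the penalized dual program over occupancy measures \(\mu(s,a)\) reads
\[
\max_{\mu \ge 0}\ \sum_{s,a} \mu(s,a)\, r(s,a) - \lambda\, c(\mu)
\quad \text{s.t.} \quad
\sum_{a} \mu(s,a) = (1-\gamma)\,\rho_0(s) + \gamma \sum_{s',a'} P(s \mid s',a')\, \mu(s',a') \quad \forall s,
\]
with \(\lambda = 0\) recovering the usual risk-neutral LP formulation of the MDP.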
arXiv Detail & Related papers (2020-02-27T23:18:04Z)