Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2407.07631v1
- Date: Wed, 10 Jul 2024 13:09:52 GMT
- Title: Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning
- Authors: Dake Zhang, Boxiang Lyu, Shuang Qiu, Mladen Kolar, Tong Zhang
- Abstract summary: We study risk-sensitive reinforcement learning (RL), a crucial field due to its ability to enhance decision-making in scenarios where it is essential to manage uncertainty and minimize potential adverse outcomes.
Our work focuses on applying the entropic risk measure to RL problems.
We center on the linear Markov Decision Process (MDP) setting, a well-regarded theoretical framework that has yet to be examined from a risk-sensitive standpoint.
- Score: 19.292214425524303
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study risk-sensitive reinforcement learning (RL), a crucial field due to its ability to enhance decision-making in scenarios where it is essential to manage uncertainty and minimize potential adverse outcomes. Particularly, our work focuses on applying the entropic risk measure to RL problems. While existing literature primarily investigates the online setting, there remains a large gap in understanding how to efficiently derive a near-optimal policy based on this risk measure using only a pre-collected dataset. We center on the linear Markov Decision Process (MDP) setting, a well-regarded theoretical framework that has yet to be examined from a risk-sensitive standpoint. In response, we introduce two provably sample-efficient algorithms. We begin by presenting a risk-sensitive pessimistic value iteration algorithm, offering a tight analysis by leveraging the structure of the risk-sensitive performance measure. To further improve the obtained bounds, we propose another pessimistic algorithm that utilizes variance information and reference-advantage decomposition, effectively improving both the dependence on the space dimension $d$ and the risk-sensitivity factor. To the best of our knowledge, we obtain the first provably efficient risk-sensitive offline RL algorithms.
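As a reading aid for the abstract, the sketch below combines the two ingredients it names: an entropic-risk Bellman backup, $\frac{1}{\beta}\log \mathbb{E}[e^{\beta V}]$, and a pessimism penalty subtracted from the estimated Q-values. It is a simplified, hypothetical rendering in a finite state-action space, not the paper's algorithm; the linear-MDP regression, the exact bonus, and the variance/reference-advantage refinements are omitted, and all names (`entropic_risk`, `bonus_scale`, etc.) are illustrative.

```python
import numpy as np

def entropic_risk(values, probs, beta):
    """Entropic risk of a discrete next-state value distribution:
    (1/beta) * log E[exp(beta * V)], with beta != 0 (beta < 0 is risk-averse)."""
    m = np.max(beta * values)  # log-sum-exp shift for numerical stability
    return (m + np.log(np.dot(probs, np.exp(beta * values - m)))) / beta

def pessimistic_risk_vi(P_hat, r_hat, counts, H, beta, bonus_scale=1.0):
    """Toy offline value iteration with an entropic-risk backup and a
    count-based pessimism penalty (a stand-in for the feature-covariance
    bonus used in linear MDPs).  Shapes per step h: P_hat[h] is (S, A, S),
    r_hat[h] and counts[h] are (S, A)."""
    S, A = r_hat[0].shape
    V = np.zeros((H + 1, S))
    pi = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = np.zeros((S, A))
        for s in range(S):
            for a in range(A):
                risk = entropic_risk(V[h + 1], P_hat[h][s, a], beta)
                bonus = bonus_scale / np.sqrt(max(counts[h][s, a], 1))
                Q[s, a] = r_hat[h][s, a] + risk - bonus  # pessimism: subtract the bonus
        V[h] = Q.max(axis=1)
        pi[h] = Q.argmax(axis=1)
    return V, pi
```

In the linear-MDP setting analyzed in the paper, the count-based penalty would instead be an elliptical bonus of the form $\sqrt{\phi(s,a)^\top \Lambda^{-1} \phi(s,a)}$ built from the dataset's feature covariance; the sketch only conveys the pessimistic, risk-sensitive backup itself.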
Related papers
- Spectral-Risk Safe Reinforcement Learning with Convergence Guarantees [13.470544618339506]
We propose a spectral-risk-measure-constrained RL algorithm, spectral-risk-constrained policy optimization (SRCPO).
In the bilevel optimization structure, the outer problem involves optimizing dual variables derived from the risk measures, while the inner problem involves finding an optimal policy.
The proposed method has been evaluated on continuous control tasks and achieved the best performance among RCRL algorithms that satisfy the constraints.
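As background on the risk measure itself (not on SRCPO), a spectral risk measure weights the quantiles of the return distribution by a risk spectrum, with CVaR as the special case that puts uniform weight on the worst $\alpha$-fraction. A minimal empirical estimator, with all names illustrative:

```python
import numpy as np

def spectral_risk(returns, spectrum):
    """Empirical spectral risk of a sample of returns.  `spectrum(p)` weights
    quantile levels p in (0, 1]; a non-increasing, normalized spectrum
    emphasizes low returns (risk aversion)."""
    x = np.sort(np.asarray(returns))         # empirical quantiles, ascending
    p = (np.arange(len(x)) + 0.5) / len(x)   # midpoint quantile levels
    w = spectrum(p)
    w = w / w.sum()                          # renormalize the discretization
    return float(np.dot(w, x))

def cvar_spectrum(alpha):
    """CVaR_alpha as a spectral risk: uniform weight on the worst alpha-tail."""
    return lambda p: (p <= alpha).astype(float) / alpha

returns = np.random.default_rng(0).normal(size=10_000)
print(spectral_risk(returns, cvar_spectrum(0.05)))  # approx. CVaR at level 0.05
```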
arXiv Detail & Related papers (2024-05-29T02:17:25Z)
- Robust Risk-Sensitive Reinforcement Learning with Conditional Value-at-Risk [23.63388546004777]
We analyze the robustness of CVaR-based risk-sensitive RL under Robust Markov Decision Processes.
Motivated by the existence of decision-dependent uncertainty in real-world problems, we study problems with state-action-dependent ambiguity sets.
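Schematically, and only as a reading aid for the summary above, the robust CVaR objective couples a worst case over state-action-dependent ambiguity sets $\mathcal{P}(s,a)$ with the CVaR of the return:

$$\max_{\pi}\ \min_{P:\ P(\cdot\mid s,a)\in\mathcal{P}(s,a)}\ \mathrm{CVaR}^{P,\pi}_{\alpha}\Big[\textstyle\sum_{h=1}^{H} r_h(s_h,a_h)\Big], \qquad \mathrm{CVaR}_{\alpha}(X)=\sup_{\lambda\in\mathbb{R}}\Big\{\lambda-\tfrac{1}{\alpha}\,\mathbb{E}\big[(\lambda-X)_{+}\big]\Big\},$$

where the second expression is the standard Rockafellar-Uryasev form of CVaR for a reward-like random variable $X$; the paper's precise objective may differ in details.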
arXiv Detail & Related papers (2024-05-02T20:28:49Z)
- Risk-Sensitive RL with Optimized Certainty Equivalents via Reduction to Standard RL [48.1726560631463]
We study Risk-Sensitive Reinforcement Learning with the Optimized Certainty Equivalent (OCE) risk.
We propose two general meta-algorithms via reductions to standard RL.
We show that it learns the optimal risk-sensitive policy while prior algorithms provably fail.
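For reference, the Optimized Certainty Equivalent of Ben-Tal and Teboulle, for a reward-like random variable $X$ and a suitable concave utility $u$, is

$$\mathrm{OCE}_u(X)=\sup_{\lambda\in\mathbb{R}}\Big\{\lambda+\mathbb{E}\big[u(X-\lambda)\big]\Big\},$$

which recovers CVaR and the entropic risk measure for particular choices of $u$; the reductions summarized above operate on risk objectives of this form.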
arXiv Detail & Related papers (2024-03-10T21:45:12Z)
- Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation [54.61816424792866]
We introduce a general framework on Risk-Sensitive Distributional Reinforcement Learning (RS-DisRL), with static Lipschitz Risk Measures (LRM) and general function approximation.
We design two innovative meta-algorithms: RS-DisRL-M, a model-based strategy for model-based function approximation, and RS-DisRL-V, a model-free approach for general value function approximation.
arXiv Detail & Related papers (2024-02-28T08:43:18Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
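As a rough reading aid (not necessarily the exact equation of the cited paper), an uncertainty Bellman equation propagates local epistemic uncertainty through the dynamics, giving a fixed point of the schematic form

$$U(s,a)\;\approx\;u_{\mathrm{local}}(s,a)\;+\;\gamma^{2}\,\mathbb{E}_{s',a'\sim\pi}\big[U(s',a')\big],$$

where $u_{\mathrm{local}}$ captures the model's posterior uncertainty at $(s,a)$; the solution is then available as a bonus (risk-seeking) or a penalty (risk-averse) during policy optimization.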
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation and Human Feedback [57.6775169085215]
Risk-sensitive reinforcement learning aims to optimize policies that balance the expected reward and risk.
We present a novel framework that employs an Iterated Conditional Value-at-Risk (CVaR) objective under both linear and general function approximations.
We propose provably sample-efficient algorithms for this Iterated CVaR RL and provide rigorous theoretical analysis.
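Schematically, the Iterated (nested) CVaR objective applies CVaR at every step of the Bellman backup rather than once to the whole return,

$$V_h(s)=\max_{a}\Big\{r_h(s,a)+\mathrm{CVaR}^{\,s'\sim P_h(\cdot\mid s,a)}_{\alpha}\big[V_{h+1}(s')\big]\Big\},$$

so the policy guards against the worst $\alpha$-tail of the next-state value at every horizon step; this is a schematic form, not a restatement of the paper's exact setup.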
arXiv Detail & Related papers (2023-07-06T08:14:54Z)
- One Risk to Rule Them All: A Risk-Sensitive Perspective on Model-Based Offline Reinforcement Learning [25.218430053391884]
We propose risk-sensitivity as a mechanism to jointly address both of these issues: risk-aversion to epistemic uncertainty addresses distributional shift, while risk-aversion to aleatoric uncertainty discourages actions that may result in poor outcomes due to environment stochasticity.
Our experiments show that our algorithm achieves competitive performance on deterministic benchmarks.
arXiv Detail & Related papers (2022-11-30T21:24:11Z)
- Risk-Aware Linear Bandits: Theory and Applications in Smart Order Routing [10.69955834942979]
We consider risk-aware bandit optimization with applications in smart order routing (SOR).
Driven by the variance-minimizing globally-optimal (G-optimal) design, we propose the novel instance-independent Risk-Aware Explore-then-Commit (RISE) algorithm and the instance-dependent Risk-Aware Successive Elimination (RISE++) algorithm.
arXiv Detail & Related papers (2022-08-04T00:21:10Z)
- Deep Learning for Systemic Risk Measures [3.274367403737527]
The aim of this paper is to study a new methodological framework for systemic risk measures.
Under this new framework, systemic risk measures can be interpreted as the minimal amount of cash that secures the aggregated system.
Deep learning is increasingly receiving attention in financial modeling and risk management.
arXiv Detail & Related papers (2022-07-02T05:01:19Z)
- Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criteria.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
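The Fenchel dual variable mentioned above typically arises from linearizing the squared mean inside the variance, using the identity

$$\big(\mathbb{E}[X]\big)^{2}=\max_{y\in\mathbb{R}}\big\{2\,y\,\mathbb{E}[X]-y^{2}\big\},\qquad \mathrm{Var}(X)=\mathbb{E}[X^{2}]-\big(\mathbb{E}[X]\big)^{2},$$

which turns the variance-constrained objective into a saddle-point problem solved jointly over the policy, the Lagrange multiplier, and the dual variable $y$; this is the standard construction, and the paper's exact formulation may differ.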
arXiv Detail & Related papers (2020-12-28T05:02:26Z)
- Learning Bounds for Risk-sensitive Learning [86.50262971918276]
In risk-sensitive learning, one aims to find a hypothesis that minimizes a risk-averse (or risk-seeking) measure of loss.
We study the generalization properties of risk-sensitive learning schemes whose optimand is described via optimized certainty equivalents.
arXiv Detail & Related papers (2020-06-15T05:25:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.