Risk-Sensitive RL with Optimized Certainty Equivalents via Reduction to Standard RL
- URL: http://arxiv.org/abs/2403.06323v1
- Date: Sun, 10 Mar 2024 21:45:12 GMT
- Title: Risk-Sensitive RL with Optimized Certainty Equivalents via Reduction to Standard RL
- Authors: Kaiwen Wang, Dawen Liang, Nathan Kallus, Wen Sun
- Abstract summary: We study Risk-Sensitive Reinforcement Learning with the Optimized Certainty Equivalent (OCE) risk.
We propose two general meta-algorithms via reductions to standard RL.
Instantiating the framework with PPO on a constructed MDP, we show that it learns the optimal risk-sensitive policy while prior algorithms provably fail.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study Risk-Sensitive Reinforcement Learning (RSRL) with the Optimized
Certainty Equivalent (OCE) risk, which generalizes Conditional Value-at-Risk
(CVaR), entropic risk and Markowitz's mean-variance. Using an augmented Markov
Decision Process (MDP), we propose two general meta-algorithms via reductions
to standard RL: one based on optimistic algorithms and another based on policy
optimization. Our optimistic meta-algorithm generalizes almost all prior RSRL
theory with entropic risk or CVaR. Under discrete rewards, our optimistic
theory also certifies the first RSRL regret bounds for MDPs with bounded
coverability, e.g., exogenous block MDPs. Under discrete rewards, our policy
optimization meta-algorithm enjoys both global convergence and local
improvement guarantees in a novel metric that lower bounds the true OCE risk.
Finally, we instantiate our framework with PPO, construct an MDP, and show that
it learns the optimal risk-sensitive policy while prior algorithms provably
fail.
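To make the abstract's two ingredients concrete, here is a minimal Python sketch on a toy discrete return distribution. It assumes the usual definition OCE_u(Z) = sup_b { b + E[u(Z - b)] } together with standard utility choices that recover CVaR, entropic risk and mean-variance. It also assumes one natural form of the augmented MDP, in which the state carries a running budget and only the terminal reward u(Z - b) is nonzero; this may differ in detail from the paper's construction. All function names are hypothetical, and the sup over b is approximated by a finite candidate set.

```python
import math
from typing import Callable, Iterable, Sequence, Tuple

Utility = Callable[[float], float]
# A discrete distribution given as (value, probability) pairs.
Dist = Sequence[Tuple[float, float]]


def expect(dist: Dist, f: Callable[[float], float]) -> float:
    """E[f(Z)] for a discrete random variable Z."""
    return sum(p * f(z) for z, p in dist)


def oce(dist: Dist, u: Utility, candidates: Iterable[float]) -> float:
    """OCE_u(Z) = sup_b { b + E[u(Z - b)] }, with the sup taken over a finite set of b values."""
    return max(b + expect(dist, lambda z: u(z - b)) for b in candidates)


# Assumed utility choices that recover three familiar risk measures.
def cvar_utility(tau: float) -> Utility:
    # u(t) = min(t, 0) / tau  ->  CVaR at level tau (mean of the worst tau-fraction of returns)
    return lambda t: min(t, 0.0) / tau


def entropic_utility(beta: float) -> Utility:
    # u(t) = (1 - exp(-beta * t)) / beta  ->  entropic risk -(1/beta) * log E[exp(-beta * Z)]
    return lambda t: (1.0 - math.exp(-beta * t)) / beta


def mean_variance_utility(c: float) -> Utility:
    # u(t) = t - c * t^2  ->  E[Z] - c * Var(Z), attained at b = E[Z]
    return lambda t: t - c * t * t


# Hypothetical budget-augmented MDP: the augmented state carries c_h = b - (r_1 + ... + r_{h-1}),
# intermediate rewards are 0, and the terminal reward is u(-c_{H+1}) = u(Z - b).
def augmented_terminal_reward(rewards: Sequence[float], b: float, u: Utility) -> float:
    c = b
    for r in rewards:
        c -= r  # budget update after observing reward r
    return u(-c)


def oce_via_reduction(trajectories: Sequence[Tuple[Sequence[float], float]],
                      u: Utility, candidates: Iterable[float]) -> float:
    """Outer search over b; for each fixed b the inner term is an ordinary expected return
    in the augmented MDP (here only evaluated on fixed trajectories, with no learning)."""
    best = -math.inf
    for b in candidates:
        inner = sum(p * augmented_terminal_reward(rs, b, u) for rs, p in trajectories)
        best = max(best, b + inner)
    return best


if __name__ == "__main__":
    returns = [(0.0, 0.2), (1.0, 0.3), (2.0, 0.5)]  # toy return distribution
    support = [z for z, _ in returns]
    print("CVaR at level 0.5:", oce(returns, cvar_utility(0.5), support))
```

Restricting b to the support is exact for CVaR on a discrete distribution, since the objective is piecewise linear in b with kinks at the support points; for entropic risk the maximizer falls between support points, so a fine grid such as `[i / 1000 for i in range(2001)]` is needed. Because the supremum over policies and the supremum over b can be swapped, the inner problem for each fixed b is a standard expected-return problem in the augmented MDP, which is what allows it to be handed to a standard RL algorithm.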
Related papers
- Burning RED: Unlocking Subtask-Driven Reinforcement Learning and Risk-Awareness in Average-Reward Markov Decision Processes [7.028778922533688]
Average-reward Markov decision processes (MDPs) provide a foundational framework for sequential decision-making under uncertainty.
We study a unique structural property of average-reward MDPs and utilize it to introduce Reward-Extended Differential (or RED) reinforcement learning.
Date: 2024-10-14T14:52:23Z
- Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning [19.292214425524303]
We study risk-sensitive reinforcement learning (RL), a crucial field due to its ability to enhance decision-making in scenarios where it is essential to manage uncertainty and minimize potential adverse outcomes.
Our work focuses on applying the entropic risk measure to RL problems.
We center on the linear Markov Decision Process (MDP) setting, a well-regarded theoretical framework that has yet to be examined from a risk-sensitive standpoint.
Date: 2024-07-10T13:09:52Z
- Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation [54.61816424792866]
We introduce a general framework on Risk-Sensitive Distributional Reinforcement Learning (RS-DisRL), with static Lipschitz Risk Measures (LRM) and general function approximation.
We design two innovative meta-algorithms: RS-DisRL-M, a model-based strategy for model-based function approximation, and RS-DisRL-V, a model-free approach for general value function approximation.
Date: 2024-02-28T08:43:18Z
- Risk-sensitive Markov Decision Process and Learning under General Utility Functions [3.6260136172126667]
Reinforcement Learning (RL) has gained substantial attention across diverse application domains and theoretical investigations.
We propose a modified value iteration algorithm that employs an epsilon-covering over the space of cumulative rewards.
In the absence of a simulator, our algorithm, designed with an upper-confidence-bound exploration approach, identifies a near-optimal policy.
Date: 2023-11-22T18:50:06Z
- Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation and Human Feedback [57.6775169085215]
Risk-sensitive reinforcement learning aims to optimize policies that balance the expected reward and risk.
We present a novel framework that employs an Iterated Conditional Value-at-Risk (CVaR) objective under both linear and general function approximations.
We propose provably sample-efficient algorithms for this Iterated CVaR RL and provide rigorous theoretical analysis.
Date: 2023-07-06T08:14:54Z
- Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that, under certain conditions, standard methods for optimizing such a risk measure inevitably run into a local-optimum barrier, and we propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
Date: 2022-05-10T19:40:52Z
- Robustness and risk management via distributional dynamic programming [13.173307471333619]
We introduce a new class of distributional operators, together with a practical DP algorithm for policy evaluation.
Our approach reformulates the problem through an augmented state space in which each state is split into a worst-case substate and a best-case substate.
We derive distributional operators and DP algorithms solving a new control task.
Date: 2021-12-28T12:12:57Z
- Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning in the average-reward setting with the variance risk criterion.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
Date: 2020-12-28T05:02:26Z
- CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee [61.176159046544946]
In safe reinforcement learning (SRL) problems, an agent explores the environment to maximize the expected total reward while avoiding violations of certain constraints.
This is the first analysis of SRL algorithms that guarantees convergence to globally optimal policies.
Date: 2020-11-11T16:05:14Z