Risk-sensitive Markov Decision Process and Learning under General
Utility Functions
- URL: http://arxiv.org/abs/2311.13589v1
- Date: Wed, 22 Nov 2023 18:50:06 GMT
- Title: Risk-sensitive Markov Decision Process and Learning under General
Utility Functions
- Authors: Zhengqi Wu and Renyuan Xu
- Abstract summary: Reinforcement Learning (RL) has gained substantial attention across diverse application domains and theoretical investigations.
We propose a modified value iteration algorithm that employs an epsilon-covering over the space of cumulative reward.
In the absence of a simulator, our algorithm, designed with an upper-confidence-bound exploration approach, identifies a near-optimal policy.
- Score: 3.6260136172126667
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Reinforcement Learning (RL) has gained substantial attention across diverse
application domains and theoretical investigations. Existing literature on RL
theory largely focuses on risk-neutral settings where the decision-maker learns
to maximize the expected cumulative reward. However, in practical scenarios
such as portfolio management and e-commerce recommendations, decision-makers
often hold heterogeneous risk preferences over uncertain outcomes, which cannot
be well captured by the risk-neutral framework.
Incorporating these preferences can be approached through utility theory, yet
the development of risk-sensitive RL under general utility functions remains an
open question for theoretical exploration.
In this paper, we consider a scenario where the decision-maker seeks to
optimize a general utility function of the cumulative reward in the framework
of a Markov decision process (MDP). To facilitate the Dynamic Programming
Principle and Bellman equation, we enlarge the state space with an additional
dimension that accounts for the cumulative reward. We propose a discretized
approximation scheme for the MDP with the enlarged state space, which is
tractable and key to the algorithmic design. We then propose a modified value iteration
algorithm that employs an epsilon-covering over the space of cumulative reward.
When a simulator is accessible, our algorithm efficiently learns a near-optimal
policy with guaranteed sample complexity. In the absence of a simulator, our
algorithm, designed with an upper-confidence-bound exploration approach,
identifies a near-optimal policy with a guaranteed regret bound. For
both algorithms, we match the theoretical lower bounds for the risk-neutral
setting.
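As a concrete illustration of the construction described above, the following is a minimal Python sketch of value iteration on the reward-enlarged state space with an epsilon-covering of the cumulative reward. It assumes a finite-horizon tabular MDP with known transitions (the simulator setting), per-step rewards in [0, 1], and a user-supplied utility function; the array layout, the grid over [0, H], and the nearest-grid-point projection are illustrative choices rather than the paper's exact scheme.

```python
import numpy as np

def risk_sensitive_value_iteration(P, R, utility, eps):
    """Value iteration on the enlarged state (s, y), where y tracks the
    cumulative reward accrued so far, discretized on an eps-grid.

    P       : array (H, S, A, S); P[h, s, a, s'] is a transition probability.
    R       : array (H, S, A); per-step rewards assumed to lie in [0, 1].
    utility : vectorized callable U(y) applied to the terminal cumulative reward.
    eps     : spacing of the epsilon-covering of the cumulative-reward range [0, H].
    """
    H, S, A = R.shape
    grid = np.arange(0.0, H + eps, eps)                 # eps-covering of [0, H]
    G = len(grid)

    def snap(y):
        # Project a cumulative reward onto the index of the nearest grid point.
        return int(np.clip(np.round(y / eps), 0, G - 1))

    V_next = np.tile(utility(grid), (S, 1))             # terminal value U(y), shape (S, G)
    policy = np.zeros((H, S, G), dtype=int)

    for h in reversed(range(H)):
        V = np.zeros((S, G))
        for s in range(S):
            for gi, y in enumerate(grid):
                q = np.empty(A)
                for a in range(A):
                    yi = snap(y + R[h, s, a])           # updated (discretized) cumulative reward
                    q[a] = P[h, s, a] @ V_next[:, yi]   # expected continuation value
                policy[h, s, gi] = int(np.argmax(q))
                V[s, gi] = q.max()
        V_next = V

    # V_next[s, 0] approximates the optimal expected utility starting from (s, y = 0).
    return V_next, policy
```

With estimated transitions and rewards in place of the true P and R, the same backup can be combined with confidence bonuses, which is roughly where the upper-confidence-bound variant mentioned in the abstract departs from this sketch.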
Related papers
- Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning [19.292214425524303]
We study risk-sensitive reinforcement learning (RL), a crucial field due to its ability to enhance decision-making in scenarios where it is essential to manage uncertainty and minimize potential adverse outcomes.
Our work focuses on applying the entropic risk measure to RL problems; a standard form of this measure is recalled below this entry.
We center on the linear Markov Decision Process (MDP) setting, a well-regarded theoretical framework that has yet to be examined from a risk-sensitive standpoint.
arXiv Detail & Related papers (2024-07-10T13:09:52Z)
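For context, the entropic risk measure referenced in the entry above is commonly written, for a risk parameter beta not equal to 0, as

```latex
U_\beta(X) \;=\; \frac{1}{\beta}\,\log \mathbb{E}\!\left[e^{\beta X}\right],
\qquad \lim_{\beta \to 0} U_\beta(X) \;=\; \mathbb{E}[X],
```

where beta < 0 encodes risk aversion and beta > 0 risk seeking; sign and normalization conventions vary, so the cited paper's definition may differ in these details.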
- Risk-Sensitive RL with Optimized Certainty Equivalents via Reduction to Standard RL [48.1726560631463]
We study Risk-Sensitive Reinforcement Learning with the Optimized Certainty Equivalent (OCE) risk; a standard definition is sketched after this entry.
We propose two general meta-algorithms via reductions to standard RL.
We show that the resulting approach learns the optimal risk-sensitive policy while prior algorithms provably fail.
arXiv Detail & Related papers (2024-03-10T21:45:12Z)
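For reference, the optimized certainty equivalent of a random return X under a concave utility u is standardly defined (in the Ben-Tal and Teboulle form) as

```latex
\mathrm{OCE}_u(X) \;=\; \sup_{\lambda \in \mathbb{R}}
\Big\{ \lambda + \mathbb{E}\big[u(X - \lambda)\big] \Big\},
```

which recovers the plain expectation, entropic risk, and CVaR for particular choices of u; the cited paper's exact conventions may differ in details.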
- Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation [54.61816424792866]
We introduce a general framework on Risk-Sensitive Distributional Reinforcement Learning (RS-DisRL), with static Lipschitz Risk Measures (LRM) and general function approximation.
We design two innovative meta-algorithms: RS-DisRL-M, a model-based strategy for model-based function approximation, and RS-DisRL-V, a model-free approach for general value function approximation.
arXiv Detail & Related papers (2024-02-28T08:43:18Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values; a schematic UBE is recalled below this entry.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
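As rough context for the entry above, uncertainty Bellman equations in earlier work (e.g., O'Donoghue et al., 2018) take the schematic form

```latex
u^{\pi}(s,a) \;=\; \nu(s,a) \;+\;
\gamma^{2} \sum_{s',a'} P(s' \mid s,a)\,\pi(a' \mid s')\,u^{\pi}(s',a'),
```

where nu(s, a) is a local epistemic-uncertainty term and the fixed point bounds the variance of values; the cited paper proposes a refined local term so that the solution matches the posterior variance rather than merely upper-bounding it. This is a paraphrase, not the paper's exact equation.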
- Multivariate Systemic Risk Measures and Computation by Deep Learning Algorithms [63.03966552670014]
We discuss the key related theoretical aspects, with a particular focus on the fairness properties of primal optima and associated risk allocations.
The algorithms we provide allow for learning primal optima, optima for the dual representation, and the corresponding fair risk allocations.
arXiv Detail & Related papers (2023-02-02T22:16:49Z)
- Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm [4.128216503196621]
We propose an On-policy Model-based Safe Deep RL algorithm in which we learn the transition dynamics of the environment in an online manner.
We show that our algorithm is more sample efficient and results in lower cumulative hazard violations as compared to constrained model-free approaches.
arXiv Detail & Related papers (2022-10-14T06:53:02Z)
- Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence [60.20076757208645]
This paper proposes a general policy mirror descent (GPMD) algorithm for solving regularized RL; a schematic mirror-descent update is sketched below this entry.
We demonstrate that our algorithm converges linearly over an entire range of learning rates, in a dimension-free fashion, to the global solution.
arXiv Detail & Related papers (2021-05-24T02:21:34Z)
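As a rough sketch of the family of updates involved, policy mirror descent for regularized RL updates each state's action distribution by a proximal step of the form below; this is a schematic paraphrase, and the GPMD algorithm of the cited paper uses a generalized Bregman divergence adapted to the regularizer, so its exact update differs in details.

```latex
\pi_{t+1}(\cdot \mid s) \;=\; \arg\max_{p \in \Delta(\mathcal{A})}
\Big\{ \big\langle Q^{\pi_t}_{\tau}(s,\cdot),\, p \big\rangle
\;-\; \tau\, h(p) \;-\; \tfrac{1}{\eta}\, D\big(p,\ \pi_t(\cdot \mid s)\big) \Big\},
```

where h is the convex regularizer, tau its weight, eta the learning rate, and D a Bregman-type divergence.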
- Reinforcement Learning with Algorithms from Probabilistic Structure Estimation [9.37335587960084]
Reinforcement learning algorithms aim to learn optimal decisions in unknown environments.
It is unknown from the outset whether or not the agent's actions will impact the environment, so it is often not possible to determine which RL algorithm is most fitting.
arXiv Detail & Related papers (2021-03-15T09:51:34Z)
- Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average-reward setting with the variance risk criterion.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable; the role of the dual variable is sketched below this entry.
arXiv Detail & Related papers (2020-12-28T05:02:26Z)
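For context on the role of the Fenchel dual variable mentioned above: squaring the mean admits the Fenchel-dual representation below, which turns the variance of the return into a saddle-point expression amenable to alternating stochastic updates of the policy, the Lagrange multiplier, and the dual variable. The cited paper's exact formulation may differ in details.

```latex
\big(\mathbb{E}[X]\big)^{2} \;=\; \max_{y \in \mathbb{R}} \big\{ 2y\,\mathbb{E}[X] - y^{2} \big\},
\qquad
\operatorname{Var}(X) \;=\; \min_{y \in \mathbb{R}} \mathbb{E}\big[(X - y)^{2}\big],
```

with the optimizer y* = E[X] in both expressions playing the role of the dual variable.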