Risk-Sensitive Policy with Distributional Reinforcement Learning
- URL: http://arxiv.org/abs/2212.14743v1
- Date: Fri, 30 Dec 2022 14:37:28 GMT
- Title: Risk-Sensitive Policy with Distributional Reinforcement Learning
- Authors: Thibaut Théate and Damien Ernst
- Abstract summary: This research work introduces a novel methodology based on distributional RL to derive sequential decision-making policies that are sensitive to risk.
Named the risk-based utility function $U$, it can be extracted from the random return distribution $Z$ naturally learnt by any distributional RL algorithm.
This makes it possible to span the full trade-off between risk minimisation and expected return maximisation.
- Score: 4.523089386111081
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Classical reinforcement learning (RL) techniques are generally concerned with
the design of decision-making policies driven by the maximisation of the expected
outcome. Nevertheless, this approach does not take into consideration the potential
risk associated with the actions taken, which may be critical in certain applications.
To address that issue, the present research work introduces a novel methodology based
on distributional RL to derive sequential decision-making policies that are sensitive
to risk, the latter being modelled by the tail of the return probability distribution.
The core idea is to replace the $Q$ function, which generally stands at the core of
learning schemes in RL, with another function that takes into account both the expected
return and the risk. Named the risk-based utility function $U$, it can be extracted
from the random return distribution $Z$ naturally learnt by any distributional RL
algorithm. This makes it possible to span the full trade-off between risk minimisation
and expected return maximisation, in contrast to fully risk-averse methodologies.
Fundamentally, this research yields a practical and accessible solution for learning
risk-sensitive policies with minimal modification to the distributional RL algorithm,
and with an emphasis on the interpretability of the resulting decision-making process.
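To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of how a risk-based utility $U$ could be computed from quantile estimates of the random return distribution $Z(s, a)$ learnt by a quantile-based distributional RL agent. The function names, the CVaR risk measure, and the parameters alpha and rho are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def risk_based_utility(quantiles, alpha=0.25, rho=0.5):
    """Illustrative risk-based utility computed from quantile estimates of Z(s, a).

    quantiles : array of quantile estimates approximating the random return Z(s, a)
    alpha     : tail fraction defining the risk measure (CVaR level) -- an assumption here
    rho       : trade-off weight (0 = risk-neutral on Q, 1 = fully risk-averse on the tail)
    """
    quantiles = np.sort(np.asarray(quantiles, dtype=float))
    expected_return = quantiles.mean()                 # E[Z(s, a)], i.e. the usual Q(s, a)
    k = max(1, int(np.ceil(alpha * len(quantiles))))
    tail_risk = quantiles[:k].mean()                   # mean of the worst alpha-fraction of returns
    return (1.0 - rho) * expected_return + rho * tail_risk

def greedy_action(quantiles_per_action, alpha=0.25, rho=0.5):
    """Greedy action selection with the utility U replacing the Q function."""
    utilities = [risk_based_utility(q, alpha, rho) for q in quantiles_per_action]
    return int(np.argmax(utilities))
```

With rho = 0 this reduces to the usual risk-neutral greedy policy on $Q$, while increasing rho progressively weights the lower tail of $Z$, which is how such a utility can span the trade-off between expected return maximisation and risk minimisation; the exact parametrisation of $U$ used in the paper should be taken from the original text.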
Related papers
- Policy Gradient Methods for Risk-Sensitive Distributional Reinforcement Learning with Provable Convergence [15.720824593964027]
Risk-sensitive reinforcement learning (RL) is crucial for maintaining reliable performance in high-stakes applications.
This paper introduces a policy gradient method for risk-sensitive distributional RL (DRL) with general coherent risk measures.
We also design a categorical distributional policy gradient algorithm (CDPG) based on categorical distributional policy evaluation and trajectory gradient estimation.
arXiv Detail & Related papers (2024-05-23T16:16:58Z)
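As a hedged illustration related to the categorical distributional policy evaluation mentioned in the entry above, the sketch below computes the Conditional Value-at-Risk (one common coherent risk measure) of a categorical return distribution supported on fixed atoms; the helper name and the choice of CVaR are assumptions made for illustration, not details of CDPG itself.

```python
import numpy as np

def cvar_from_categorical(atoms, probs, alpha=0.1):
    """CVaR_alpha of a categorical return distribution: the expected return
    restricted to the worst alpha-fraction of probability mass.

    atoms : support values of the categorical distribution (as in C51-style critics)
    probs : probabilities attached to each atom (must sum to 1)
    alpha : tail level of the risk measure -- an illustrative choice
    """
    order = np.argsort(atoms)
    atoms = np.asarray(atoms, dtype=float)[order]
    probs = np.asarray(probs, dtype=float)[order]
    cum_before = np.cumsum(probs) - probs                  # mass strictly below each atom
    tail_mass = np.clip(alpha - cum_before, 0.0, probs)    # mass of each atom kept in the tail
    return float((tail_mass * atoms).sum() / alpha)
```

A policy-gradient method could then use such a statistic of the evaluated return distribution as its objective; how the gradient is actually estimated in CDPG is described in the paper itself.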
- Uncertainty-aware Distributional Offline Reinforcement Learning [26.34178581703107]
Offline reinforcement learning (RL) presents distinct challenges as it relies solely on observational data.
We propose an uncertainty-aware distributional offline RL method to simultaneously address both uncertainty and environmental stochasticity.
Our method is rigorously evaluated through comprehensive experiments in both risk-sensitive and risk-neutral benchmarks, demonstrating its superior performance.
arXiv Detail & Related papers (2024-03-26T12:28:04Z)
- Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation [54.61816424792866]
We introduce a general framework for Risk-Sensitive Distributional Reinforcement Learning (RS-DisRL), with static Lipschitz Risk Measures (LRM) and general function approximation.
We design two innovative meta-algorithms: RS-DisRL-M, a model-based strategy for model-based function approximation, and RS-DisRL-V, a model-free approach for general value function approximation.
arXiv Detail & Related papers (2024-02-28T08:43:18Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
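For intuition about the uncertainty Bellman equation (UBE) idea mentioned in the entry above, here is a minimal tabular sketch of a recursion that propagates local value uncertainty through the transition dynamics; the variable names, the gamma-squared discounting of propagated uncertainty, and the form of the local term are assumptions for illustration, not the exact equation or conditions from the paper.

```python
import numpy as np

def uncertainty_bellman_fixed_point(P, pi, local_u, gamma=0.99, iters=2000):
    """Iterate u(s, a) = local_u(s, a) + gamma^2 * sum_s' P(s'|s, a) * sum_a' pi(a'|s') u(s', a').

    P       : transition tensor of shape (S, A, S)
    pi      : policy matrix of shape (S, A), rows summing to 1
    local_u : per state-action local uncertainty (e.g. posterior variance of the model),
              shape (S, A) -- assumed to be given here
    """
    local_u = np.asarray(local_u, dtype=float)
    u = np.zeros_like(local_u)
    for _ in range(iters):
        v = (pi * u).sum(axis=1)            # state-level uncertainty under the policy
        u = local_u + gamma ** 2 * (P @ v)  # propagate uncertainty one step through the dynamics
    return u
```

A risk-aware optimiser such as the QU-SAC algorithm named above could then add or subtract a function of this uncertainty to its value estimates to obtain risk-seeking or risk-averse behaviour; the precise way QU-SAC uses the posterior variance is specified in the paper.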
- Risk-sensitive Markov Decision Process and Learning under General Utility Functions [3.6260136172126667]
Reinforcement Learning (RL) has gained substantial attention across diverse application domains and theoretical investigations.
We propose a modified value iteration algorithm that employs an epsilon-covering over the space of cumulative reward.
In the absence of a simulator, our algorithm, designed with an upper-confidence-bound exploration approach, identifies a near-optimal policy.
arXiv Detail & Related papers (2023-11-22T18:50:06Z)
- Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation and Human Feedback [57.6775169085215]
Risk-sensitive reinforcement learning aims to optimize policies that balance the expected reward and risk.
We present a novel framework that employs an Iterated Conditional Value-at-Risk (CVaR) objective under both linear and general function approximations.
We propose provably sample-efficient algorithms for this Iterated CVaR RL and provide rigorous theoretical analysis.
arXiv Detail & Related papers (2023-07-06T08:14:54Z)
- Is Risk-Sensitive Reinforcement Learning Properly Resolved? [32.42976780682353]
We propose a novel algorithm, namely Trajectory Q-Learning (TQL), for risk-sensitive RL (RSRL) problems with provable convergence to the optimal policy.
Based on our new learning architecture, we are free to introduce a general and practical implementation for different risk measures to learn disparate risk-sensitive policies.
arXiv Detail & Related papers (2023-07-02T11:47:21Z)
- Policy Evaluation in Distributional LQR [70.63903506291383]
We provide a closed-form expression of the distribution of the random return.
We show that this distribution can be approximated by a finite number of random variables.
Using the approximate return distribution, we propose a zeroth-order policy gradient algorithm for risk-averse LQR.
arXiv Detail & Related papers (2023-03-23T20:27:40Z)
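As a generic illustration of a zeroth-order policy gradient for risk-averse LQR, the sketch below perturbs a linear feedback gain, estimates a CVaR-style cost from noisy rollouts, and takes a two-point finite-difference step; the cost model, the CVaR objective, and all names are illustrative assumptions rather than the algorithm from the paper.

```python
import numpy as np

def rollout_cost(K, A, B, x0, noise_std=0.1, horizon=50, rng=None):
    """Quadratic cost of one noisy rollout of x' = A x + B u + w under the linear policy u = -K x."""
    rng = np.random.default_rng() if rng is None else rng
    x, cost = np.asarray(x0, dtype=float).copy(), 0.0
    for _ in range(horizon):
        u = -K @ x
        cost += float(x @ x + u @ u)            # state and control costs with Q = R = I
        x = A @ x + B @ u + noise_std * rng.standard_normal(x.shape)
    return cost

def cvar_cost(K, A, B, x0, alpha=0.2, n_rollouts=64, rng=None):
    """Average of the worst alpha-fraction of rollout costs (a risk-averse objective)."""
    costs = np.sort([rollout_cost(K, A, B, x0, rng=rng) for _ in range(n_rollouts)])
    k = max(1, int(np.ceil(alpha * n_rollouts)))
    return costs[-k:].mean()

def zeroth_order_step(K, A, B, x0, lr=1e-3, smoothing=0.05, rng=None):
    """One two-point finite-difference (zeroth-order) update of the gain matrix K."""
    rng = np.random.default_rng() if rng is None else rng
    direction = rng.standard_normal(K.shape)
    delta = (cvar_cost(K + smoothing * direction, A, B, x0, rng=rng)
             - cvar_cost(K - smoothing * direction, A, B, x0, rng=rng))
    grad_estimate = delta / (2.0 * smoothing) * direction
    return K - lr * grad_estimate
```

Because the objective is only accessed through sampled costs, no analytic gradient is required, which is the appeal of a zeroth-order scheme; the paper instead evaluates the risk objective from its approximate return distribution rather than from raw rollouts as done in this sketch.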
- Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
arXiv Detail & Related papers (2022-05-10T19:40:52Z)
- Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average-reward setting with a variance-based risk criterion.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z)
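For context on how a policy, a Lagrange multiplier, and a Fenchel dual variable can all appear in a variance-constrained objective, a standard (and here purely illustrative, not necessarily the paper's exact) formulation of the problem $\max_\pi \mathbb{E}_\pi[R]$ subject to $\mathrm{Var}_\pi(R) \le \xi$ uses the Fenchel dual identity $(\mathbb{E}[R])^2 = \max_y \big(2y\,\mathbb{E}[R] - y^2\big)$ to obtain the saddle-point problem

$$\max_{\pi}\;\max_{y}\;\min_{\lambda \ge 0}\;\; \mathbb{E}_\pi[R] \;-\; \lambda\Big(\mathbb{E}_\pi[R^2] - \big(2y\,\mathbb{E}_\pi[R] - y^2\big) - \xi\Big),$$

in which the objective depends on the policy only through expectations that can be estimated from samples, so the policy, the multiplier $\lambda$, and the dual variable $y$ can each be updated by stochastic gradient steps, consistent with the iterative updates described in the summary above.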
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.