Measures of Variability for Risk-averse Policy Gradient
- URL: http://arxiv.org/abs/2504.11412v1
- Date: Tue, 15 Apr 2025 17:28:15 GMT
- Title: Measures of Variability for Risk-averse Policy Gradient
- Authors: Yudong Luo, Yangchen Pan, Jiaqi Tan, Pascal Poupart,
- Abstract summary: We study nine common measures of variability in risk-averse reinforcement learning (RARL). Among them, four metrics have not been previously studied in RARL. Our empirical study reveals that variance-based metrics lead to unstable policy updates.
- Score: 26.293988407069293
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Risk-averse reinforcement learning (RARL) is critical for decision-making under uncertainty, which is especially valuable in high-stakes applications. However, most existing works focus on risk measures, e.g., conditional value-at-risk (CVaR), while measures of variability remain underexplored. In this paper, we comprehensively study nine common measures of variability, namely Variance, Gini Deviation, Mean Deviation, Mean-Median Deviation, Standard Deviation, Inter-Quantile Range, CVaR Deviation, Semi_Variance, and Semi_Standard Deviation. Among them, four metrics have not been previously studied in RARL. We derive policy gradient formulas for these unstudied metrics, improve gradient estimation for Gini Deviation, analyze their gradient properties, and incorporate them into the REINFORCE and PPO frameworks to penalize the dispersion of returns. Our empirical study reveals that variance-based metrics lead to unstable policy updates. In contrast, CVaR Deviation and Gini Deviation show consistent performance across different randomness and evaluation domains, achieving high returns while effectively learning risk-averse policies. Mean Deviation and Semi_Standard Deviation are also competitive across different scenarios. This work provides a comprehensive overview of variability measures in RARL, offering practical insights for risk-aware decision-making and guiding future research on risk metrics and RARL algorithms.
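To make the "penalize the dispersion of returns" idea concrete, here is a minimal sketch (not the authors' code) of a variance-penalized REINFORCE loss, using the standard score-function identity for the gradient of E[G] - lam * Var[G]. The tensor names `traj_log_probs` and `returns` are illustrative assumptions, and the paper's own gradient formulas for the other eight measures differ from this one.

```python
import torch

def variance_penalized_reinforce_loss(traj_log_probs: torch.Tensor,
                                       returns: torch.Tensor,
                                       lam: float = 0.1) -> torch.Tensor:
    """Surrogate loss whose gradient estimates grad(E[G] - lam * Var[G]).

    Uses the score-function identity
      grad Var[G] = E[G^2 * grad log pi] - 2 * E[G] * E[G * grad log pi],
    so trajectory i receives the weight G_i - lam*G_i^2 + 2*lam*mean(G)*G_i.
    """
    g = returns.detach()
    g_bar = g.mean()                                   # Monte Carlo estimate of E[G]
    weights = g - lam * g ** 2 + 2.0 * lam * g_bar * g
    # Negative sign: minimizing this loss ascends the risk-averse objective.
    return -(weights * traj_log_probs).mean()
```

Here `traj_log_probs[i]` is assumed to be the summed log pi(a_t|s_t) along episode i, so the weighted mean reduces to the usual REINFORCE surrogate when lam = 0.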
Related papers
- Beyond CVaR: Leveraging Static Spectral Risk Measures for Enhanced Decision-Making in Distributional Reinforcement Learning [4.8342038441006805]
In domains such as finance, healthcare, and robotics, managing worst-case scenarios is critical. Distributional Reinforcement Learning (DRL) provides a natural framework to incorporate risk sensitivity into decision-making processes. We present a novel DRL algorithm with convergence guarantees that optimizes for a broader class of static Spectral Risk Measures (SRM); a minimal empirical sketch of an SRM follows this entry.
arXiv Detail & Related papers (2025-01-03T20:25:41Z)
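As a rough illustration of the static spectral risk measures mentioned above (an assumption-laden sketch, not the paper's algorithm), the snippet below computes an empirical SRM of sampled returns by weighting sorted samples with a non-negative, non-increasing spectrum phi that integrates to one; CVaR at level alpha is recovered with phi(u) = 1{u <= alpha}/alpha.

```python
import numpy as np

def empirical_spectral_risk(returns, phi):
    """Approximate SRM_phi(Z) = integral_0^1 phi(u) * F_Z^{-1}(u) du from samples.

    `phi` is assumed non-negative, non-increasing, and integrating to 1 on [0, 1],
    so that low returns receive more weight (risk aversion).
    """
    z = np.sort(np.asarray(returns, dtype=float))   # empirical quantile function
    n = z.size
    u = (np.arange(n) + 0.5) / n                     # midpoints of the quantile levels
    return float(np.sum(phi(u) * z) / n)

# CVaR at level alpha as a special case of the spectrum.
alpha = 0.1
cvar_phi = lambda u: (u <= alpha).astype(float) / alpha
```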
- Policy Gradient Methods for Risk-Sensitive Distributional Reinforcement Learning with Provable Convergence [15.720824593964027]
This paper introduces a new policy gradient method for risk-sensitive DRL with general coherent risk measures.
For practical use, we design a categorical distributional policy gradient algorithm (GCDP) that approximates any distribution by a categorical family supported on some fixed points.
arXiv Detail & Related papers (2024-05-23T16:16:58Z)
- Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation [54.61816424792866]
We introduce a general framework for Risk-Sensitive Distributional Reinforcement Learning (RS-DisRL), with static Lipschitz Risk Measures (LRM) and general function approximation.
We design two innovative meta-algorithms: RS-DisRL-M, a model-based strategy for model-based function approximation, and RS-DisRL-V, a model-free approach for general value function approximation.
arXiv Detail & Related papers (2024-02-28T08:43:18Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC), which can be applied to either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- An Alternative to Variance: Gini Deviation for Risk-averse Policy Gradient [35.01235012813407]
Restricting the variance of a policy's return is a popular choice in risk-averse Reinforcement Learning.
Recent methods restrict the per-step reward variance as a proxy.
We propose Gini deviation as an alternative risk measure; a minimal empirical computation of it follows this entry.
arXiv Detail & Related papers (2023-07-17T22:08:27Z)
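For reference, a minimal empirical computation of the Gini deviation of sampled returns, using one common definition GD(X) = (1/2) * E|X1 - X2| for i.i.d. copies X1, X2. This is only an illustrative estimator, not the policy-gradient machinery developed in the papers above.

```python
import numpy as np

def gini_deviation(returns):
    """Empirical Gini deviation: half the mean absolute difference between pairs."""
    x = np.asarray(returns, dtype=float)
    # Pairwise |x_i - x_j| via broadcasting (O(n^2) memory; fine for small batches).
    pairwise = np.abs(x[:, None] - x[None, :])
    return float(pairwise.mean() / 2.0)
```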
- Safe Deployment for Counterfactual Learning to Rank with Exposure-Based Risk Minimization [63.93275508300137]
We introduce a novel risk-aware Counterfactual Learning To Rank method with theoretical guarantees for safe deployment.
Our experimental results demonstrate the efficacy of our proposed method, which is effective at avoiding initial periods of bad performance when little data is available.
arXiv Detail & Related papers (2023-04-26T15:54:23Z)
- Off-Policy Risk Assessment in Markov Decision Processes [15.225153671736201]
We develop the first doubly robust (DR) estimator for the CDF of returns in Markov decision processes (MDPs).
This estimator enjoys significantly less variance and, when the model is well specified, achieves the Cramér-Rao variance lower bound.
We derive the first minimax lower bounds for off-policy CDF and risk estimation, which match our error bounds up to a constant factor.
arXiv Detail & Related papers (2022-09-21T15:40:59Z)
- Off-Policy Evaluation of Slate Policies under Bayes Risk [70.10677881866047]
We study the problem of off-policy evaluation for slate bandits, for the typical case in which the logging policy factorizes over the slots of the slate.
We show that the risk improvement over PI grows linearly with the number of slots, and linearly with the gap between the arithmetic and the harmonic mean of a set of slot-level divergences.
arXiv Detail & Related papers (2021-01-05T20:07:56Z)
- Reliable Off-policy Evaluation for Reinforcement Learning [53.486680020852724]
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy.
We propose a novel framework that provides robust and optimistic cumulative reward estimates using one or multiple logged datasets.
arXiv Detail & Related papers (2020-11-08T23:16:19Z)
- Learning Bounds for Risk-sensitive Learning [86.50262971918276]
In risk-sensitive learning, one aims to find a hypothesis that minimizes a risk-averse (or risk-seeking) measure of loss.
We study the generalization properties of risk-sensitive learning schemes whose optimand is described via optimized certainty equivalents (an empirical sketch of the OCE follows this entry).
arXiv Detail & Related papers (2020-06-15T05:25:02Z)
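To ground the last entry, a small sketch of the optimized certainty equivalent OCE_u(X) = sup_lambda { lambda + E[u(X - lambda)] }, evaluated empirically by searching over a grid of lambda values. The utility below, which recovers CVaR at level alpha via the Rockafellar-Uryasev variational formula, is one standard example; the function and grid names are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def empirical_oce(returns, utility, lam_grid):
    """Empirical optimized certainty equivalent: max over lambda of lam + mean(u(X - lam))."""
    x = np.asarray(returns, dtype=float)
    return max(lam + utility(x - lam).mean() for lam in lam_grid)

# Utility u(t) = -(1/alpha) * max(-t, 0) turns the OCE into CVaR_alpha of the returns.
alpha = 0.1
cvar_utility = lambda t: -np.maximum(-t, 0.0) / alpha
# Example usage: empirical_oce(samples, cvar_utility, np.linspace(samples.min(), samples.max(), 200))
```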
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.