An Alternative to Variance: Gini Deviation for Risk-averse Policy
Gradient
- URL: http://arxiv.org/abs/2307.08873v3
- Date: Thu, 2 Nov 2023 21:01:04 GMT
- Title: An Alternative to Variance: Gini Deviation for Risk-averse Policy
Gradient
- Authors: Yudong Luo, Guiliang Liu, Pascal Poupart, Yangchen Pan
- Abstract summary: Restricting the variance of a policy's return is a popular choice in risk-averse Reinforcement Learning.
Recent methods restrict the per-step reward variance as a proxy.
We propose to use an alternative risk measure, Gini deviation, as a substitute.
- Score: 35.01235012813407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Restricting the variance of a policy's return is a popular choice in
risk-averse Reinforcement Learning (RL) due to its clear mathematical
definition and easy interpretability. Traditional methods directly restrict the
total return variance. Recent methods restrict the per-step reward variance as
a proxy. We thoroughly examine the limitations of these variance-based methods,
such as sensitivity to numerical scale and hindering of policy learning, and
propose to use an alternative risk measure, Gini deviation, as a substitute. We
study various properties of this new risk measure and derive a policy gradient
algorithm to minimize it. Empirical evaluation in domains where risk-aversion
can be clearly defined shows that our algorithm can mitigate the limitations
of variance-based risk measures and achieves high return with low risk in terms
of variance and Gini deviation when others fail to learn a reasonable policy.
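As a concrete illustration of the two quantities being compared, the sketch below contrasts the empirical variance and the empirical Gini deviation of a batch of sampled returns. It assumes the usual convention that Gini deviation is half the expected absolute difference between two independent copies of the return; the function names and the rescaling example are illustrative only, not the paper's implementation.

```python
import numpy as np

def empirical_variance(returns: np.ndarray) -> float:
    """Sample variance of a batch of episode returns."""
    return float(np.var(returns))

def empirical_gini_deviation(returns: np.ndarray) -> float:
    """Gini deviation estimated as half the mean absolute pairwise
    difference: (1 / (2 n (n - 1))) * sum_{i != j} |G_i - G_j|
    (assumed definition)."""
    n = len(returns)
    diffs = np.abs(returns[:, None] - returns[None, :])
    return float(diffs.sum() / (2.0 * n * (n - 1)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(loc=10.0, scale=2.0, size=2000)
    # Rescaling the returns by c multiplies the variance by c^2 but the
    # Gini deviation only by c, one way to see the abstract's point about
    # variance-based penalties being sensitive to numerical scale.
    for c in (1.0, 10.0, 100.0):
        g = c * base
        print(f"scale={c:6.1f}  var={empirical_variance(g):12.2f}  "
              f"gini_dev={empirical_gini_deviation(g):8.2f}")
```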
Related papers
- Data-Adaptive Tradeoffs among Multiple Risks in Distribution-Free Prediction [55.77015419028725]
We develop methods that permit valid control of risk when threshold and tradeoff parameters are chosen adaptively.
Our methodology supports monotone and nearly-monotone risks, but otherwise makes no distributional assumptions.
arXiv Detail & Related papers (2024-03-28T17:28:06Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- A Risk-Sensitive Approach to Policy Optimization [21.684251937825234]
Standard deep reinforcement learning (DRL) aims to maximize expected reward, considering collected experiences equally in formulating a policy.
We propose a more direct approach whereby risk-sensitive objectives, specified in terms of the cumulative distribution function (CDF) of the distribution of full-episode rewards, are optimized.
We demonstrate that the use of moderately "pessimistic" risk profiles, which emphasize scenarios where the agent performs poorly, leads to enhanced exploration and a continual focus on addressing deficiencies.
arXiv Detail & Related papers (2022-08-19T00:55:05Z)
- Mean-Semivariance Policy Optimization via Risk-Averse Reinforcement Learning [12.022303947412917]
This paper aims at optimizing the mean-semivariance (MSV) criterion in reinforcement learning with respect to steady rewards.
We reveal that the MSV problem can be solved by iteratively solving a sequence of RL problems with a policy-dependent reward function.
We propose two on-policy algorithms based on the policy gradient theory and the trust region method.
arXiv Detail & Related papers (2022-06-15T08:32:53Z)
- Variance Penalized On-Policy and Off-Policy Actor-Critic [60.06593931848165]
We propose on-policy and off-policy actor-critic algorithms that optimize a performance criterion involving both the mean and the variance of the return.
Our approach not only performs on par with actor-critic and prior variance-penalization baselines in terms of expected return, but also generates trajectories with lower return variance (a minimal sketch of this family of mean-variance objectives is given after this list).
arXiv Detail & Related papers (2021-02-03T10:06:16Z)
- Off-Policy Evaluation of Slate Policies under Bayes Risk [70.10677881866047]
We study the problem of off-policy evaluation for slate bandits, for the typical case in which the logging policy factorizes over the slots of the slate.
We show that the risk improvement over the pseudoinverse (PI) estimator grows linearly with the number of slots, and linearly with the gap between the arithmetic and the harmonic mean of a set of slot-level divergences.
arXiv Detail & Related papers (2021-01-05T20:07:56Z)
- Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average-reward setting with a variance risk criterion.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z)
- A Natural Actor-Critic Algorithm with Downside Risk Constraints [5.482532589225552]
We introduce a new Bellman equation that upper bounds the lower partial moment, circumventing its non-linearity.
We prove that this proxy for the lower partial moment is a contraction, and provide intuition into the stability of the algorithm by variance decomposition.
We extend the method to use natural policy gradients and demonstrate the effectiveness of our approach on three benchmark problems for risk-sensitive reinforcement learning.
arXiv Detail & Related papers (2020-07-08T15:44:33Z)
- Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning [75.17074235764757]
We present a framework for risk-averse control in a discounted infinite horizon MDP.
Mean-Variance Policy Iteration (MVPI) enjoys great flexibility in that any policy evaluation method and risk-neutral control method can be dropped in for risk-averse control off the shelf.
This flexibility reduces the gap between risk-neutral control and risk-averse control and is achieved by working on a novel augmented MDP.
arXiv Detail & Related papers (2020-04-22T22:23:44Z)
- Cautious Reinforcement Learning via Distributional Risk in the Dual Domain [45.17200683056563]
We study the estimation of risk-sensitive policies in reinforcement learning problems defined by a Markov Decision Process (MDP) whose state and action spaces are countably finite.
We propose a new definition of risk, which we call caution, as a penalty function added to the dual objective of the linear programming (LP) formulation of reinforcement learning.
arXiv Detail & Related papers (2020-02-27T23:18:04Z)
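Several of the entries above (the variance-penalized actor-critic, the variance-constrained actor-critic, and Mean-Variance Policy Iteration) optimize a mean-variance criterion of the form J(theta) = E[G] - lam * Var[G]. The sketch below shows a generic score-function (REINFORCE-style) estimator for that gradient, not the specific algorithm of any listed paper; the function name and the use of the batch mean as a plug-in estimate of E[G] are assumptions of this sketch.

```python
import numpy as np

def mean_variance_pg_weights(returns: np.ndarray, lam: float) -> np.ndarray:
    """Per-episode weights w_i such that averaging
    w_i * grad log pi(tau_i) over the batch approximates the gradient of
    E[G] - lam * Var[G], using
    grad Var[G] = E[G^2 grad log pi] - 2 E[G] E[G grad log pi].
    The batch mean of the returns stands in for E[G] (a plug-in
    assumption of this sketch)."""
    g_bar = returns.mean()
    return returns - lam * (returns ** 2 - 2.0 * g_bar * returns)

# Usage sketch: for each sampled episode, multiply the summed
# grad-log-probabilities of its actions by the corresponding weight,
# average over the batch, and take a gradient ascent step.
```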
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.