Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning
- URL: http://arxiv.org/abs/2004.10888v6
- Date: Thu, 7 Apr 2022 03:58:15 GMT
- Title: Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning
- Authors: Shangtong Zhang, Bo Liu, Shimon Whiteson
- Abstract summary: We present a mean-variance policy iteration (MVPI) framework for risk-averse control in a discounted infinite horizon MDP.
MVPI enjoys great flexibility in that any policy evaluation method and risk-neutral control method can be dropped in for risk-averse control off the shelf.
This flexibility reduces the gap between risk-neutral control and risk-averse control and is achieved by working on a novel augmented MDP.
- Score: 75.17074235764757
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a mean-variance policy iteration (MVPI) framework for risk-averse
control in a discounted infinite horizon MDP optimizing the variance of a
per-step reward random variable. MVPI enjoys great flexibility in that any
policy evaluation method and risk-neutral control method can be dropped in for
risk-averse control off the shelf, in both on- and off-policy settings. This
flexibility reduces the gap between risk-neutral control and risk-averse
control and is achieved by working on a novel augmented MDP directly. We
propose risk-averse TD3 as an example instantiating MVPI, which outperforms
vanilla TD3 and many previous risk-averse control methods in challenging MuJoCo
robot simulation tasks under a risk-aware performance metric. This risk-averse
TD3 is the first to introduce deterministic policies and off-policy learning
into risk-averse reinforcement learning, both of which are key to the
performance boost we show in MuJoCo domains.
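The following is a minimal sketch of the MVPI loop on a toy tabular MDP, assuming the Fenchel-dual form of the mean-variance objective: the per-step reward r is replaced by the policy-dependent augmented reward r - lam*r^2 + 2*lam*y*r, where y tracks the current policy's mean per-step reward, and any off-the-shelf risk-neutral planner (here, plain value iteration) is dropped in unchanged. The toy MDP, the use of the long-run average reward as y, and all hyperparameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MDP (not from the paper): 5 states, 3 actions.
n_states, n_actions, gamma, lam = 5, 3, 0.9, 0.5
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.normal(size=(n_states, n_actions))                        # per-step reward r(s, a)

def risk_neutral_planner(reward, n_iters=200):
    """Drop-in risk-neutral control: plain value iteration on the given reward table."""
    q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        q = reward + gamma * (P @ q.max(axis=1))
    return q.argmax(axis=1)  # deterministic greedy policy

def mean_per_step_reward(policy):
    """Long-run average per-step reward of a deterministic policy (used as y below)."""
    P_pi = P[np.arange(n_states), policy]  # (n_states, n_states) transition matrix
    r_pi = R[np.arange(n_states), policy]  # (n_states,) reward vector
    d = np.full(n_states, 1.0 / n_states)
    for _ in range(1000):                  # power iteration for the stationary distribution
        d = d @ P_pi
    return float(d @ r_pi)

# MVPI-style loop: alternate between (i) updating y to the current policy's mean
# per-step reward and (ii) running any risk-neutral planner on the augmented MDP
# whose reward is r - lam * r^2 + 2 * lam * y * r.
policy = rng.integers(n_actions, size=n_states)
for it in range(20):
    y = mean_per_step_reward(policy)
    augmented_R = R - lam * R ** 2 + 2.0 * lam * y * R
    policy = risk_neutral_planner(augmented_R)
    print(f"iteration {it}: y = {y:.3f}, policy = {policy}")
```

Swapping the value-iteration planner for an off-policy actor-critic such as TD3, run on the same augmented reward, recovers the spirit of the risk-averse TD3 instantiation described above.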
Related papers
- Stationary Policies are Optimal in Risk-averse Total-reward MDPs with EVaR [12.719528972742394]
We show that the risk-averse total reward criterion can be optimized by a stationary policy.
Our results indicate that the total reward criterion may be preferable to the discounted criterion in a broad range of risk-averse reinforcement learning domains.
arXiv Detail & Related papers (2024-08-30T13:33:18Z)
- Data-Adaptive Tradeoffs among Multiple Risks in Distribution-Free Prediction [55.77015419028725]
We develop methods that permit valid control of risk when threshold and tradeoff parameters are chosen adaptively.
Our methodology supports monotone and nearly-monotone risks, but otherwise makes no distributional assumptions.
arXiv Detail & Related papers (2024-03-28T17:28:06Z)
- Risk-Averse Reinforcement Learning via Dynamic Time-Consistent Risk Measures [10.221369785560785]
In this paper, we consider the problem of maximizing the dynamic risk of a sequence of rewards in Markov Decision Processes (MDPs).
Using a convex combination of expectation and conditional value-at-risk (CVaR) as a special one-step conditional risk measure, we reformulate the risk-averse MDP as a risk-neutral counterpart with an augmented action space and a reshaping of the immediate rewards (a small illustration of this one-step risk measure appears after this list).
Our numerical studies show that the risk-averse setting can reduce the variance and enhance robustness of the results.
arXiv Detail & Related papers (2023-01-14T21:43:18Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- Mean-Semivariance Policy Optimization via Risk-Averse Reinforcement Learning [12.022303947412917]
This paper aims to optimize the mean-semivariance (MSV) criterion in reinforcement learning w.r.t. steady rewards.
We reveal that the MSV problem can be solved by iteratively solving a sequence of RL problems with a policy-dependent reward function.
We propose two on-policy algorithms based on the policy gradient theory and the trust region method.
arXiv Detail & Related papers (2022-06-15T08:32:53Z)
- Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that under certain conditions optimizing such a risk measure inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
arXiv Detail & Related papers (2022-05-10T19:40:52Z)
- Risk-Aware Transfer in Reinforcement Learning using Successor Features [16.328601804662657]
We show that risk-aware successor features (RaSFs) integrate seamlessly within the practical reinforcement learning framework.
RaSFs outperform alternative methods, including SFs, when the risk of the learned policies is taken into account.
arXiv Detail & Related papers (2021-05-28T22:22:03Z)
- Ultra-Reliable Indoor Millimeter Wave Communications using Multiple Artificial Intelligence-Powered Intelligent Surfaces [115.85072043481414]
We propose a novel framework for guaranteeing ultra-reliable millimeter wave (mmW) communications using multiple artificial intelligence (AI)-enabled reconfigurable intelligent surfaces (RISs).
The use of multiple AI-powered RISs allows changing the propagation direction of the signals transmitted from a mmW access point (AP).
Two controllers, one centralized and one distributed, are proposed to control the policies of the mmW AP and the RISs.
arXiv Detail & Related papers (2021-03-31T19:15:49Z)
- Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with a variance risk criterion.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z)
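As a side illustration of the one-step conditional risk measure referenced in the dynamic time-consistent risk entry above, the sketch below combines expectation and CVaR with a convex weight; the sample distributions, the mixing weight c, and the tail level alpha are illustrative assumptions rather than values from that paper.

```python
import numpy as np

def cvar(rewards, alpha):
    """Empirical CVaR of a reward sample: mean of the worst alpha-fraction of outcomes."""
    rewards = np.sort(np.asarray(rewards, dtype=float))   # ascending: worst rewards first
    k = max(1, int(np.ceil(alpha * len(rewards))))
    return rewards[:k].mean()

def one_step_risk(rewards, alpha=0.1, c=0.5):
    """Convex combination of expectation and CVaR, a one-step conditional risk measure."""
    rewards = np.asarray(rewards, dtype=float)
    return c * rewards.mean() + (1.0 - c) * cvar(rewards, alpha)

# Example: a higher-mean but heavier-tailed reward distribution scores worse
# under the risk measure than a slightly lower-mean, low-variance one.
rng = np.random.default_rng(1)
safe = rng.normal(1.0, 0.1, size=10_000)
risky = rng.normal(1.2, 2.0, size=10_000)
print(one_step_risk(safe), one_step_risk(risky))
```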