Risk-Aware Transfer in Reinforcement Learning using Successor Features
- URL: http://arxiv.org/abs/2105.14127v1
- Date: Fri, 28 May 2021 22:22:03 GMT
- Title: Risk-Aware Transfer in Reinforcement Learning using Successor Features
- Authors: Michael Gimelfarb, André Barreto, Scott Sanner, Chi-Guhn Lee
- Abstract summary: We show that risk-aware successor features (RaSF) integrate seamlessly within the practical reinforcement learning framework.
RaSFs outperform alternative methods, including SFs, when the risk of the learned policies is taken into account.
- Score: 16.328601804662657
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sample efficiency and risk-awareness are central to the development of
practical reinforcement learning (RL) for complex decision-making. The former
can be addressed by transfer learning and the latter by optimizing some utility
function of the return. However, the problem of transferring skills in a
risk-aware manner is not well-understood. In this paper, we address the problem
of risk-aware policy transfer between tasks in a common domain that differ only
in their reward functions, in which risk is measured by the variance of reward
streams. Our approach begins by extending the idea of generalized policy
improvement to maximize entropic utilities, thus extending policy improvement
via dynamic programming to sets of policies and levels of risk-aversion. Next,
we extend the idea of successor features (SF), a value function representation
that decouples the environment dynamics from the rewards, to capture the
variance of returns. Our resulting risk-aware successor features (RaSF)
integrate seamlessly within the RL framework, inherit the superior task
generalization ability of SFs, and incorporate risk-awareness into the
decision-making. Experiments on a discrete navigation domain and control of a
simulated robotic arm demonstrate the ability of RaSFs to outperform
alternative methods, including SFs, when the risk of the learned policies is
taken into account.
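The abstract describes two ingredients: an entropic utility of the return, and a generalized policy improvement (GPI) step taken over a library of policies represented by successor features. A minimal NumPy sketch of this idea is below; it is not the authors' implementation. It uses the standard second-order approximation of entropic utility, U_beta[X] ≈ E[X] - (beta/2) Var[X], and the array names and shapes are illustrative assumptions.

```python
import numpy as np

def entropic_utility(mean, var, beta):
    """Mean-variance approximation of the entropic utility:
    U_beta[X] = -(1/beta) * log E[exp(-beta * X)] ~ E[X] - (beta/2) * Var[X].
    beta > 0 gives risk-averse behavior; beta = 0 is risk-neutral."""
    return mean - 0.5 * beta * var

def risk_aware_gpi_action(psi_mean, psi_var, w, beta):
    """Risk-aware GPI over a library of policies (illustrative sketch).

    psi_mean: (n_policies, n_actions, d) mean successor features per policy.
    psi_var:  (n_policies, n_actions) variance estimates of the return.
    w:        (d,) reward weights of the target task, so Q = psi . w.
    Returns the greedy action after maximizing utility over the library."""
    q_mean = psi_mean @ w                      # (n_policies, n_actions)
    utility = entropic_utility(q_mean, psi_var, beta)
    best_over_policies = utility.max(axis=0)   # GPI: max over policies
    return int(best_over_policies.argmax())    # greedy action for this state
```

With beta = 0 the rule reduces to ordinary SF-based GPI; increasing beta penalizes high-variance return estimates, so a policy with a slightly lower expected return but lower variance can win the max.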
Related papers
- CAT: Caution Aware Transfer in Reinforcement Learning via Distributional Risk [19.698719925388513]
We introduce a novel Caution-Aware Transfer Learning (CAT) framework.
Unlike traditional approaches, we define "caution" as a more generalized and comprehensive notion of risk.
Our core innovation lies in optimizing a weighted sum of reward return and caution (based on state-action occupancy measures) during the transfer process.
arXiv Detail & Related papers (2024-08-16T15:47:08Z)
- RiskQ: Risk-sensitive Multi-Agent Reinforcement Learning Value Factorization [49.26510528455664]
We introduce the Risk-sensitive Individual-Global-Max (RIGM) principle as a generalization of the Individual-Global-Max (IGM) and Distributional IGM (DIGM) principles.
Extensive experiments show that RiskQ obtains promising performance.
arXiv Detail & Related papers (2023-11-03T07:18:36Z)
- Is Risk-Sensitive Reinforcement Learning Properly Resolved? [32.42976780682353]
We propose a novel algorithm, namely Trajectory Q-Learning (TQL), for RSRL problems with provable convergence to the optimal policy.
Based on our new learning architecture, we are free to introduce a general and practical implementation for different risk measures to learn disparate risk-sensitive policies.
arXiv Detail & Related papers (2023-07-02T11:47:21Z)
- Risk-Sensitive Policy with Distributional Reinforcement Learning [4.523089386111081]
This research work introduces a novel methodology based on distributional RL to derive risk-sensitive sequential decision-making policies.
Named the risk-based utility function $U$, it can be extracted from the random return distribution $Z$ naturally learnt by any distributional RL algorithm.
This makes it possible to span the complete trade-off between risk minimisation and expected return maximisation.
arXiv Detail & Related papers (2022-12-30T14:37:28Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
arXiv Detail & Related papers (2022-05-10T19:40:52Z)
- Policy Gradient Bayesian Robust Optimization for Imitation Learning [49.881386773269746]
We derive a novel policy gradient-style robust optimization approach, PG-BROIL, to balance expected performance and risk.
Results suggest PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse.
arXiv Detail & Related papers (2021-06-11T16:49:15Z)
- Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criteria.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z)
- Bayesian Robust Optimization for Imitation Learning [34.40385583372232]
Inverse reinforcement learning can enable generalization to new states by learning a parameterized reward function.
Existing safe imitation learning approaches based on IRL deal with this uncertainty using a maximin framework.
BROIL provides a natural way to interpolate between return-maximizing and risk-minimizing behaviors.
arXiv Detail & Related papers (2020-07-24T01:52:11Z)
- Improving Robustness via Risk Averse Distributional Reinforcement Learning [13.467017642143581]
Robustness is critical when policies are trained in simulation instead of the real-world environment.
We propose a risk-aware algorithm to learn robust policies in order to bridge the gap between simulation training and real-world implementation.
arXiv Detail & Related papers (2020-05-01T20:03:10Z)
- Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning [75.17074235764757]
We present a framework for risk-averse control in a discounted infinite horizon MDP.
MVPI enjoys great flexibility in that any policy evaluation method and risk-neutral control method can be dropped in for risk-averse control off the shelf.
This flexibility reduces the gap between risk-neutral control and risk-averse control and is achieved by working on a novel augmented MDP.
arXiv Detail & Related papers (2020-04-22T22:23:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.