Risk-Averse Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2102.05371v1
- Date: Wed, 10 Feb 2021 10:27:49 GMT
- Title: Risk-Averse Offline Reinforcement Learning
- Authors: Núria Armengol Urpí, Sebastian Curi, Andreas Krause
- Abstract summary: Training Reinforcement Learning (RL) agents in high-stakes applications can be prohibitive due to the risk associated with exploration.
We present the Offline Risk-Averse Actor-Critic (O-RAAC), a model-free RL algorithm that learns risk-averse policies in a fully offline setting.
- Score: 46.383648750385575
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training Reinforcement Learning (RL) agents in high-stakes applications can
be prohibitive due to the risk associated with exploration. Thus, the agent
can only use data previously collected by safe policies. While previous work
considers optimizing the average performance using offline data, we focus on
optimizing a risk-averse criterion, namely the Conditional Value-at-Risk (CVaR).
In particular, we present the Offline Risk-Averse Actor-Critic (O-RAAC), a
model-free RL algorithm that learns risk-averse policies in a fully offline
setting. We show that O-RAAC learns policies with higher CVaR than risk-neutral
approaches in different robot control tasks. Furthermore, optimizing a
risk-averse criterion guarantees distributional robustness of the average
performance with respect to particular distribution shifts. We demonstrate
empirically that in the presence of natural distribution shifts, O-RAAC learns
policies with good average performance.
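For intuition, the CVaR at level alpha is the expected return over the worst alpha-fraction of outcomes. The snippet below is a minimal, hypothetical sketch of an empirical CVaR estimator over sampled episode returns (the helper name empirical_cvar and the toy return distribution are illustrative); it shows the criterion O-RAAC optimizes, not the paper's actual offline actor-critic training procedure.

```python
import numpy as np

def empirical_cvar(returns, alpha=0.1):
    """Empirical CVaR_alpha: mean of the worst alpha-fraction of returns."""
    returns = np.sort(np.asarray(returns, dtype=float))  # ascending: worst returns first
    k = max(1, int(np.ceil(alpha * returns.size)))       # size of the alpha-tail
    return returns[:k].mean()

# Toy comparison of a risk-neutral summary (mean) with the risk-averse one (CVaR)
rng = np.random.default_rng(0)
episode_returns = rng.normal(loc=100.0, scale=30.0, size=1000)
print("mean return:", episode_returns.mean())
print("CVaR_0.1   :", empirical_cvar(episode_returns, alpha=0.1))
```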
Related papers
- Stationary Policies are Optimal in Risk-averse Total-reward MDPs with EVaR [12.719528972742394]
We show that the risk-averse total reward criterion can be optimized by a stationary policy.
Our results indicate that the total reward criterion may be preferable to the discounted criterion in a broad range of risk-averse reinforcement learning domains.
arXiv Detail & Related papers (2024-08-30T13:33:18Z) - Distributional Reinforcement Learning with Online Risk-awareness Adaption [5.363478475460403]
We introduce a novel framework, Distributional RL with Online Risk Adaption (DRL-ORA).
DRL-ORA dynamically selects the epistemic risk levels via solving a total variation minimization problem online.
We show multiple classes of tasks where DRL-ORA outperforms existing methods that rely on either a fixed or a manually predetermined risk level.
arXiv Detail & Related papers (2023-10-08T14:32:23Z) - Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation and Human Feedback [57.6775169085215]
Risk-sensitive reinforcement learning aims to optimize policies that balance the expected reward and risk.
We present a novel framework that employs an Iterated Conditional Value-at-Risk (CVaR) objective under both linear and general function approximations.
We propose provably sample-efficient algorithms for this Iterated CVaR RL and provide rigorous theoretical analysis.
arXiv Detail & Related papers (2023-07-06T08:14:54Z) - Safe Deployment for Counterfactual Learning to Rank with Exposure-Based Risk Minimization [63.93275508300137]
We introduce a novel risk-aware Counterfactual Learning To Rank method with theoretical guarantees for safe deployment.
Our experimental results demonstrate the efficacy of our proposed method, which is effective at avoiding initial periods of bad performance when little data is available.
arXiv Detail & Related papers (2023-04-26T15:54:23Z) - Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z) - Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
arXiv Detail & Related papers (2022-05-10T19:40:52Z) - Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR).
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z) - Improving Robustness via Risk Averse Distributional Reinforcement Learning [13.467017642143581]
Robustness is critical when policies are trained in simulation instead of the real-world environment.
We propose a risk-aware algorithm to learn robust policies in order to bridge the gap between simulation training and real-world implementation.
arXiv Detail & Related papers (2020-05-01T20:03:10Z)
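A recurring idea across the papers above is computing a risk measure from a learned return distribution. As a hedged illustration, and not any specific paper's implementation, the sketch below approximates CVaR_alpha from quantile estimates of the return distribution by averaging the quantile values whose fractions fall in the lower alpha-tail; the helper cvar_from_quantiles and the toy quantile source are hypothetical.

```python
import numpy as np

def cvar_from_quantiles(quantile_values, taus, alpha=0.1):
    """Approximate CVaR_alpha from quantile estimates of the return distribution.

    quantile_values[i] estimates the taus[i]-quantile of the return (e.g. read
    off a distributional critic); CVaR_alpha is approximated by averaging the
    quantile values whose fractions lie in the lower alpha-tail.
    """
    quantile_values = np.asarray(quantile_values, dtype=float)
    taus = np.asarray(taus, dtype=float)
    mask = taus <= alpha
    if not mask.any():                  # alpha below the smallest fraction
        mask = taus == taus.min()
    return quantile_values[mask].mean()

# Toy usage: 32 midpoint fractions and quantiles estimated from sampled returns
taus = (np.arange(32) + 0.5) / 32
rng = np.random.default_rng(0)
samples = rng.normal(loc=100.0, scale=30.0, size=100_000)
quantiles = np.quantile(samples, taus)
print("CVaR_0.1 estimate:", cvar_from_quantiles(quantiles, taus, alpha=0.1))
```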