Robust Policy Learning over Multiple Uncertainty Sets
- URL: http://arxiv.org/abs/2202.07013v1
- Date: Mon, 14 Feb 2022 20:06:28 GMT
- Title: Robust Policy Learning over Multiple Uncertainty Sets
- Authors: Annie Xie, Shagun Sodhani, Chelsea Finn, Joelle Pineau, Amy Zhang
- Abstract summary: Reinforcement learning (RL) agents need to be robust to variations in safety-critical environments.
We develop an algorithm that enjoys the benefits of both system identification and robust RL.
- Score: 91.67120465453179
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) agents need to be robust to variations in
safety-critical environments. While system identification methods provide a way
to infer the variation from online experience, they can fail in settings where
fast identification is not possible. Another dominant approach is robust RL
which produces a policy that can handle worst-case scenarios, but these methods
are generally designed to achieve robustness to a single uncertainty set that
must be specified at train time. Towards a more general solution, we formulate
the multi-set robustness problem to learn a policy robust to different
perturbation sets. We then design an algorithm that enjoys the benefits of both
system identification and robust RL: it reduces uncertainty where possible
given a few interactions, but can still act robustly with respect to the
remaining uncertainty. On a diverse set of control tasks, our approach
demonstrates improved worst-case performance on new environments compared to
prior methods based on system identification and on robust RL alone.
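One way to read the multi-set robustness objective is sketched below in LaTeX. The notation is an assumption for illustration (p a distribution over uncertainty sets S, e an environment within a set, J_e the expected discounted return in e), not the paper's exact formulation:
% Illustrative sketch only; assumed notation, not taken verbatim from the paper.
\max_{\pi}\; \mathbb{E}_{S \sim p(\mathcal{S})}\Big[\, \min_{e \in S} J_e(\pi) \Big],
\qquad
J_e(\pi) \;=\; \mathbb{E}_{\pi,\, e}\Big[\, \sum_{t=0}^{\infty} \gamma^{t} r_t \Big].
Under this reading, system identification corresponds to shrinking S from a few interactions, while standard robust RL corresponds to committing to a single fixed S at train time.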
Related papers
- Learning Predictive Safety Filter via Decomposition of Robust Invariant Set [6.94348936509225]
This paper combines the advantages of both RMPC and RL to synthesize safety filters for nonlinear systems.
We propose a policy approach for robust reach problems and establish its complexity.
arXiv Detail & Related papers (2023-11-12T08:11:28Z)
- Safe Reinforcement Learning with Dual Robustness [10.455148541147796]
Reinforcement learning (RL) agents are vulnerable to adversarial disturbances.
We propose a systematic framework to unify safe RL and robust RL.
We also design a deep RL algorithm for practical implementation, called dually robust actor-critic (DRAC)
arXiv Detail & Related papers (2023-09-13T09:34:21Z)
- Approximate Model-Based Shielding for Safe Reinforcement Learning [83.55437924143615]
We propose a principled look-ahead shielding algorithm for verifying the performance of learned RL policies.
Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system.
We demonstrate superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety-labels.
arXiv Detail & Related papers (2023-07-27T15:19:45Z)
- Constrained Decision Transformer for Offline Safe Reinforcement Learning [16.485325576173427]
We study the offline safe RL problem from a novel multi-objective optimization perspective.
We propose the constrained decision transformer (CDT) approach, which can dynamically adjust the trade-offs during deployment.
arXiv Detail & Related papers (2023-02-14T21:27:10Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- Closing the Closed-Loop Distribution Shift in Safe Imitation Learning [80.05727171757454]
We treat safe optimization-based control strategies as experts in an imitation learning problem.
We train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert.
arXiv Detail & Related papers (2021-02-18T05:11:41Z)
- Enforcing robust control guarantees within neural network policies [76.00287474159973]
We propose a generic nonlinear control policy class, parameterized by neural networks, that enforces the same provable robustness criteria as robust control.
We demonstrate the power of this approach on several domains, improving in average-case performance over existing robust control methods and in worst-case stability over (non-robust) deep RL methods.
arXiv Detail & Related papers (2020-11-16T17:14:59Z)
- Online Safety Assurance for Deep Reinforcement Learning [24.23670300606769]
We argue that safely deploying learning-driven systems requires being able to determine, in real time, whether system behavior is coherent.
We present three approaches to quantifying decision uncertainty that differ in terms of the signal used to infer uncertainty.
Our preliminary findings suggest that transitioning to a default policy when decision uncertainty is detected is key to enjoying the performance benefits afforded by leveraging ML without compromising safety.
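(A minimal sketch of this fallback pattern, under an assumed uncertainty signal, appears after the list of related papers below.)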
arXiv Detail & Related papers (2020-10-07T19:54:01Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
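(A minimal sketch of these two ingredients on a toy Q-ensemble also appears after the list below.)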
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
- Falsification-Based Robust Adversarial Reinforcement Learning [13.467693018395863]
Falsification-based RARL (FRARL) is the first generic framework for integrating temporal logic falsification into adversarial learning to improve policy robustness.
Our experimental results demonstrate that policies trained with a falsification-based adversary generalize better and show less violation of the safety specification in test scenarios.
arXiv Detail & Related papers (2020-07-01T18:32:05Z)
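Two illustrative Python sketches follow. Both are rough, self-contained toy examples with assumed names, shapes, and hyperparameters; neither is taken from the cited papers' code.

First, the default-policy fallback described in the Online Safety Assurance entry: act with the learned policy only while decision uncertainty is low. Disagreement across an ensemble of learned policies is used here as the uncertainty signal purely as an assumption; the paper itself compares several signals.

import numpy as np

def act_with_fallback(obs, ensemble_policies, default_policy, threshold=0.5):
    """Return the ensemble's action when its members agree, otherwise fall back
    to a default (e.g. rule-based) policy. Ensemble disagreement is an assumed
    stand-in for the decision-uncertainty signal."""
    actions = np.array([pi(obs) for pi in ensemble_policies])  # (K, action_dim)
    if actions.std(axis=0).max() > threshold:   # high disagreement -> uncertain
        return default_policy(obs)
    return actions.mean(axis=0)                 # confident -> keep the learned policy

Second, the two SUNRISE ingredients, shown on a toy tabular Q-ensemble: (a) Bellman targets down-weighted when the ensemble disagrees, and (b) UCB-style action selection. The weighting function and constants are assumptions, not SUNRISE's exact design.

import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, K = 5, 3, 4             # toy MDP sizes and ensemble size
gamma, lr, lam, temp = 0.99, 0.1, 1.0, 10.0  # assumed hyperparameters
q_ensemble = rng.normal(scale=0.1, size=(K, n_states, n_actions))

def ucb_action(state):
    """(b) Choose the action with the highest ensemble mean + lam * ensemble std."""
    q_sa = q_ensemble[:, state, :]                           # (K, n_actions)
    return int(np.argmax(q_sa.mean(axis=0) + lam * q_sa.std(axis=0)))

def weighted_backup(state, action, reward, next_state):
    """(a) Update every member toward a target scaled by a confidence weight
    that shrinks as ensemble disagreement on the next action grows."""
    next_q = q_ensemble[:, next_state, :]                    # (K, n_actions)
    next_a = int(next_q.mean(axis=0).argmax())               # greedy w.r.t. ensemble mean
    weight = 1.0 / (1.0 + temp * next_q[:, next_a].std())    # assumed weighting form
    for k in range(K):
        target = reward + gamma * q_ensemble[k, next_state, next_a]
        q_ensemble[k, state, action] += lr * weight * (target - q_ensemble[k, state, action])

# Usage on random transitions of the toy MDP.
for _ in range(100):
    s = int(rng.integers(n_states))
    a = ucb_action(s)
    r, s2 = float(rng.normal()), int(rng.integers(n_states))
    weighted_backup(s, a, r, s2)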