Combining Pessimism with Optimism for Robust and Efficient Model-Based
Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2103.10369v1
- Date: Thu, 18 Mar 2021 16:50:17 GMT
- Title: Combining Pessimism with Optimism for Robust and Efficient Model-Based
Deep Reinforcement Learning
- Authors: Sebastian Curi, Ilija Bogunovic, Andreas Krause
- Abstract summary: In real-world tasks, reinforcement learning agents encounter situations that are not present during training time.
To ensure reliable performance, the RL agents need to exhibit robustness against worst-case situations.
We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm to provably solve this problem.
- Score: 56.17667147101263
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In real-world tasks, reinforcement learning (RL) agents frequently encounter
situations that are not present during training time. To ensure reliable
performance, the RL agents need to exhibit robustness against worst-case
situations. The robust RL framework addresses this challenge via a worst-case
optimization between an agent and an adversary. Previous robust RL algorithms
are either sample inefficient, lack robustness guarantees, or do not scale to
large problems. We propose the Robust Hallucinated Upper-Confidence RL
(RH-UCRL) algorithm to provably solve this problem while attaining near-optimal
sample complexity guarantees. RH-UCRL is a model-based reinforcement learning
(MBRL) algorithm that effectively distinguishes between epistemic and aleatoric
uncertainty and efficiently explores both the agent and adversary decision
spaces during policy learning. We scale RH-UCRL to complex tasks via neural
network ensemble models as well as neural network policies. Experimentally, we
demonstrate that RH-UCRL outperforms other robust deep RL algorithms in a
variety of adversarial environments.
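To make the optimism/pessimism split above concrete, here is a minimal one-step sketch. It is not the authors' implementation: it collapses RH-UCRL's multi-step MDP, hallucinated control inputs, and neural-network ensembles into a single matrix game with a toy ensemble of return estimates, and all names and values (n_agent, n_adv, n_models, beta) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: 5 agent actions, 4 adversary actions, 10 ensemble members (all assumed).
n_agent, n_adv, n_models = 5, 4, 10

# Stand-in for a learned model ensemble: each member predicts the return of every
# (agent action, adversary action) pair. Disagreement across members plays the role
# of epistemic uncertainty and would shrink as more data is collected.
true_return = rng.normal(size=(n_agent, n_adv))
ensemble = true_return + 0.3 * rng.normal(size=(n_models, n_agent, n_adv))

mu = ensemble.mean(axis=0)     # mean return estimate
sigma = ensemble.std(axis=0)   # epistemic uncertainty (ensemble disagreement)
beta = 1.0                     # confidence-scaling parameter (illustrative value)

ucb = mu + beta * sigma        # optimistic (upper-confidence) value
lcb = mu - beta * sigma        # pessimistic (lower-confidence) value

# Exploration: be optimistic about what the model might still allow the agent to
# achieve, but always against the worst-case adversary response (min over columns).
explore_action = ucb.min(axis=1).argmax()

# Robust recommendation: be pessimistic about both the adversary and the remaining
# model error, i.e. maximise the worst-case lower confidence bound.
robust_action = lcb.min(axis=1).argmax()

print("exploration action:", explore_action)
print("robust action:     ", robust_action)
print("worst-case LCB of robust action:", round(float(lcb[robust_action].min()), 3))
```

The two selected actions illustrate the idea from the abstract: the optimistic score steers data collection toward behaviour that is uncertain but potentially valuable, while the pessimistic score is the quantity a worst-case robustness guarantee has to hold for.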
Related papers
- Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach [2.3020018305241337]
This paper is the first to propose considering robust RL (RRL) problems within positional differential game theory.
Namely, we prove that under Isaacs's condition, the same Q-function can be utilized as an approximate solution of both minimax and maximin Bellman equations.
We present the Isaacs Deep Q-Network algorithms and demonstrate their superiority compared to other baseline RRL and Multi-Agent RL algorithms in various environments.
arXiv Detail & Related papers (2024-05-03T12:21:43Z)
- Hybrid Inverse Reinforcement Learning [34.793570631021005]
The inverse reinforcement learning approach to imitation learning is a double-edged sword.
We propose using hybrid RL -- training on a mixture of online and expert data -- to curtail unnecessary exploration.
We derive both model-free and model-based hybrid inverse RL algorithms with strong policy performance guarantees.
arXiv Detail & Related papers (2024-02-13T23:29:09Z)
- Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation and Human Feedback [57.6775169085215]
Risk-sensitive reinforcement learning aims to optimize policies that balance the expected reward and risk.
We present a novel framework that employs an Iterated Conditional Value-at-Risk (CVaR) objective under both linear and general function approximation (a minimal empirical-CVaR sketch appears after this list).
We propose provably sample-efficient algorithms for this Iterated CVaR RL and provide rigorous theoretical analysis.
arXiv Detail & Related papers (2023-07-06T08:14:54Z)
- Train Hard, Fight Easy: Robust Meta Reinforcement Learning [78.16589993684698]
A major challenge of reinforcement learning (RL) in real-world applications is the variation between environments, tasks or clients.
Standard meta reinforcement learning (MRL) methods optimize the average return over tasks, but often suffer from poor results in tasks of high risk or difficulty.
In this work, we define a robust MRL objective with a controlled robustness level.
The resulting data inefficiency is addressed via the novel Robust Meta RL algorithm (RoML).
arXiv Detail & Related papers (2023-01-26T14:54:39Z)
- Efficient Adversarial Training without Attacking: Worst-Case-Aware Robust Reinforcement Learning [14.702446153750497]
Worst-case-aware Robust RL (WocaR-RL) is a robust training framework for deep reinforcement learning.
We show that WocaR-RL achieves state-of-the-art performance under various strong attacks.
arXiv Detail & Related papers (2022-10-12T05:24:46Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed, but they require large amounts of interaction between the agent and the environment.
We propose a new method to solve this benchmark, using unsupervised model-based RL for pre-training the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Robust Reinforcement Learning using Offline Data [23.260211453437055]
We propose a robust reinforcement learning algorithm called Robust Fitted Q-Iteration (RFQI).
RFQI uses only an offline dataset to learn the optimal robust policy.
We prove that RFQI learns a near-optimal robust policy under standard assumptions.
arXiv Detail & Related papers (2022-08-10T03:47:45Z)
- Robust Reinforcement Learning as a Stackelberg Game via Adaptively-Regularized Adversarial Training [43.97565851415018]
Robust Reinforcement Learning (RL) focuses on improving performance under model errors or adversarial attacks.
Most of the existing literature models robust adversarial RL (RARL) as a zero-sum simultaneous game with Nash equilibrium as the solution concept.
We introduce a novel hierarchical formulation of robust RL - a general-sum Stackelberg game model called RRL-Stack.
arXiv Detail & Related papers (2022-02-19T03:44:05Z)
- Semantic-Aware Collaborative Deep Reinforcement Learning Over Wireless Cellular Networks [82.02891936174221]
Collaborative deep reinforcement learning (CDRL) algorithms, in which multiple agents coordinate over a wireless network, are a promising approach.
In this paper, a novel semantic-aware CDRL method is proposed to enable a group of untrained agents with semantically-linked DRL tasks to collaborate efficiently across a resource-constrained wireless cellular network.
arXiv Detail & Related papers (2021-11-23T18:24:47Z)
- Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning [93.1435980666675]
We show how optimistic exploration can be easily combined with state-of-the-art reinforcement learning algorithms.
Our experiments demonstrate that optimistic exploration significantly speeds up learning when there are penalties on actions.
arXiv Detail & Related papers (2020-06-15T18:37:38Z)
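As referenced in the Iterated CVaR entry above, the sketch below shows the plain empirical Conditional Value-at-Risk of a batch of returns: the mean of the worst alpha-fraction of outcomes. It covers only the static risk measure; the paper's Iterated CVaR applies a CVaR operator inside every step of the Bellman recursion, which this sketch does not attempt. The function name, alpha level, and simulated returns are illustrative assumptions.

```python
import numpy as np

def empirical_cvar(returns, alpha=0.1):
    """Mean of the worst alpha-fraction of returns (lower return = worse outcome).
    alpha and the sample below are illustrative choices, not values from the paper."""
    returns = np.sort(np.asarray(returns, dtype=float))  # ascending: worst outcomes first
    k = max(1, int(np.ceil(alpha * len(returns))))       # size of the left tail
    return returns[:k].mean()

rng = np.random.default_rng(1)
episode_returns = rng.normal(loc=100.0, scale=20.0, size=1000)  # simulated episode returns

print("mean return:", round(float(episode_returns.mean()), 2))
print("CVaR(0.1)  :", round(float(empirical_cvar(episode_returns, alpha=0.1)), 2))
```

Optimizing the CVaR instead of the mean pushes the policy to care about the left tail of the return distribution, which is the same worst-case concern that motivates the robust RL formulation of the main paper.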
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.