Maximum Entropy RL (Provably) Solves Some Robust RL Problems
- URL: http://arxiv.org/abs/2103.06257v1
- Date: Wed, 10 Mar 2021 18:45:48 GMT
- Title: Maximum Entropy RL (Provably) Solves Some Robust RL Problems
- Authors: Benjamin Eysenbach and Sergey Levine
- Abstract summary: We prove theoretically that standard maximum entropy RL is robust to some disturbances in the dynamics and the reward function.
Our results suggest that MaxEnt RL by itself is robust to certain disturbances, without requiring any additional modifications.
- Score: 94.80212602202518
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many potential applications of reinforcement learning (RL) require guarantees
that the agent will perform well in the face of disturbances to the dynamics or
reward function. In this paper, we prove theoretically that standard maximum
entropy RL is robust to some disturbances in the dynamics and the reward
function. While this capability of MaxEnt RL has been observed empirically in
prior work, to the best of our knowledge our work provides the first rigorous
proof and theoretical characterization of the MaxEnt RL robust set. While a
number of prior robust RL algorithms have been designed to handle similar
disturbances to the reward function or dynamics, these methods typically
require adding additional moving parts and hyperparameters on top of a base RL
algorithm. In contrast, our theoretical results suggest that MaxEnt RL by
itself is robust to certain disturbances, without requiring any additional
modifications. While this does not imply that MaxEnt RL is the best available
robust RL method, MaxEnt RL does possess a striking simplicity and appealing
formal guarantees.
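To make the objective under discussion concrete, here is a minimal sketch of MaxEnt (soft) value iteration on a toy tabular MDP; the MDP, the temperature alpha, and all variable names are illustrative assumptions and are not taken from the paper. The only change relative to standard value iteration is that the hard max over actions is replaced by a temperature-scaled log-sum-exp, which is equivalent to adding a policy-entropy bonus to the reward.

```python
# Minimal sketch of MaxEnt (soft) value iteration on a toy tabular MDP.
# The MDP, temperature alpha, and variable names are illustrative only.
import numpy as np

n_states, n_actions, gamma, alpha = 3, 2, 0.9, 0.5  # alpha = entropy temperature

rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(0, 1, size=(n_states, n_actions))                 # R[s, a]

V = np.zeros(n_states)
for _ in range(500):
    Q = R + gamma * P @ V                       # Q[s, a] = r(s, a) + gamma * E[V(s')]
    # Soft backup: V(s) = alpha * log sum_a exp(Q(s, a) / alpha)
    # (standard value iteration would use V = Q.max(axis=1) instead)
    V = alpha * np.log(np.exp(Q / alpha).sum(axis=1))

# The MaxEnt-optimal policy is a Boltzmann distribution over Q:
# pi[s, a] proportional to exp(Q(s, a) / alpha), so each row sums to 1.
pi = np.exp((Q - V[:, None]) / alpha)
print(pi.round(3))
```

As alpha approaches zero, the soft backup approaches the standard Bellman backup and the policy becomes greedy; larger alpha yields the more stochastic policies whose robustness properties the paper analyzes.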
Related papers
- To the Max: Reinventing Reward in Reinforcement Learning [1.5498250598583487]
In reinforcement learning (RL), different reward functions can define the same optimal policy but result in drastically different learning performance.
We introduce max-reward RL, where an agent optimizes the maximum rather than the cumulative reward.
In experiments, we study the performance of max-reward RL algorithms in two goal-reaching environments from Gymnasium-Robotics.
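For contrast, a tiny illustrative snippet (toy reward values, not taken from the paper, and ignoring discounting of the max for simplicity) showing how the two objectives score the same trajectory:

```python
# Illustrative contrast between cumulative-return RL and max-reward RL.
rewards = [0.0, 0.2, 0.0, 0.9, 0.1]  # toy per-step rewards along one trajectory
gamma = 0.99

cumulative_return = sum(gamma ** t * r for t, r in enumerate(rewards))
max_return = max(rewards)  # max-reward RL scores the trajectory by its single best reward

print(f"cumulative: {cumulative_return:.3f}, max: {max_return:.3f}")
```

In goal-reaching tasks, scoring a trajectory by its best achieved reward matches the intuition that reaching the goal once is what matters.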
arXiv Detail & Related papers (2024-02-02T12:29:18Z)
- Understanding the Synergies between Quality-Diversity and Deep Reinforcement Learning [4.788163807490196]
Generalized Actor-Critic QD-RL is a unified modular framework for actor-critic deep RL methods in the QD-RL setting.
We introduce two new algorithms, PGA-ME (SAC) and PGA-ME (DroQ) which apply recent advancements in Deep RL to the QD-RL setting.
arXiv Detail & Related papers (2023-03-10T19:02:42Z)
- Train Hard, Fight Easy: Robust Meta Reinforcement Learning [78.16589993684698]
A major challenge of reinforcement learning (RL) in real-world applications is the variation between environments, tasks or clients.
Standard meta reinforcement learning (MRL) methods optimize the average return over tasks, but often perform poorly on tasks of high risk or difficulty.
In this work, we define a robust MRL objective with a controlled robustness level.
The data inefficiency of this robust objective is addressed via the novel Robust Meta RL algorithm (RoML).
arXiv Detail & Related papers (2023-01-26T14:54:39Z)
- Extreme Q-Learning: MaxEnt RL without Entropy [88.97516083146371]
Modern Deep Reinforcement Learning (RL) algorithms require estimates of the maximal Q-value, which are difficult to compute in continuous domains.
We introduce a new update rule for online and offline RL which directly models the maximal value using Extreme Value Theory (EVT).
Using EVT, we derive our Extreme Q-Learning framework and consequently online and, for the first time, offline MaxEnt Q-learning algorithms.
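The EVT connection rests on a standard property of the Gumbel distribution: the expected maximum of values perturbed by independent zero-mean Gumbel noise equals a temperature-scaled log-sum-exp, i.e. the soft maximum used in MaxEnt RL. The snippet below is an illustrative numerical check of that identity with toy Q-values; it is not the paper's algorithm.

```python
# Numerical check of the Gumbel / log-sum-exp identity behind the EVT view:
#   E[ max_a (Q(a) + beta * g_a) ] = beta * log sum_a exp(Q(a) / beta),
# where g_a are i.i.d. zero-mean Gumbel variables. Values here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
Q = np.array([1.0, 0.5, -0.2, 2.0])   # toy Q-values for four actions
beta = 0.7                            # temperature
euler_gamma = 0.5772156649            # mean of the standard Gumbel distribution

# Zero-mean Gumbel noise: subtract the Euler-Mascheroni constant from standard Gumbel.
noise = rng.gumbel(size=(500_000, Q.size)) - euler_gamma
empirical = (Q + beta * noise).max(axis=1).mean()
analytic = beta * np.log(np.exp(Q / beta).sum())

print(f"empirical expected max: {empirical:.3f}")   # approximately equal to analytic
print(f"log-sum-exp soft max:   {analytic:.3f}")
```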
arXiv Detail & Related papers (2023-01-05T23:14:38Z)
- LCRL: Certified Policy Synthesis via Logically-Constrained Reinforcement Learning [78.2286146954051]
LCRL implements model-free Reinforcement Learning (RL) algorithms over unknown Markov Decision Processes (MDPs).
We present case studies to demonstrate the applicability, ease of use, scalability, and performance of LCRL.
arXiv Detail & Related papers (2022-09-21T13:21:00Z)
- Recurrent Model-Free RL is a Strong Baseline for Many POMDPs [73.39666827525782]
Many problems in RL, such as meta RL, robust RL, and generalization in RL, can be cast as POMDPs.
In theory, simply augmenting model-free RL with memory, such as recurrent neural networks, provides a general approach to solving all types of POMDPs.
Prior work has found that such recurrent model-free RL methods tend to perform worse than more specialized algorithms that are designed for specific types of POMDPs.
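As a rough illustration of the "memory as a general POMDP tool" idea, the sketch below shows a recurrent policy that conditions its action distribution on the observation history through a GRU. The sizes, names, and the GRU choice are illustrative assumptions; the paper's benchmarked architectures and training setups differ.

```python
# Minimal sketch of a recurrent (memory-based) policy for POMDPs.
# Sizes, names, and the GRU choice are illustrative assumptions.
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)  # summarizes the history
        self.head = nn.Linear(hidden, n_actions)               # maps memory to action logits

    def forward(self, obs_seq, h=None):
        # obs_seq: (batch, time, obs_dim); h: previous hidden state or None
        out, h = self.rnn(obs_seq, h)
        return self.head(out), h  # logits for every time step, plus the new hidden state

policy = RecurrentPolicy(obs_dim=8, n_actions=4)
obs = torch.randn(1, 1, 8)                # one observation at a time
logits, h = policy(obs)                   # h carries memory across environment steps
action = torch.distributions.Categorical(logits=logits[:, -1]).sample()
print(action.item())
```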
arXiv Detail & Related papers (2021-10-11T07:09:14Z)
- A Simple Reward-free Approach to Constrained Reinforcement Learning [33.813302183231556]
This paper bridges reward-free RL and constrained RL. Particularly, we propose a simple meta-algorithm such that given any reward-free RL oracle, the approachability and constrained RL problems can be directly solved with negligible overheads in sample complexity.
arXiv Detail & Related papers (2021-07-12T06:27:30Z)
- Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning [56.17667147101263]
In real-world tasks, reinforcement learning agents encounter situations that are not present during training time.
To ensure reliable performance, the RL agents need to exhibit robustness against worst-case situations.
We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm to provably solve this problem.
arXiv Detail & Related papers (2021-03-18T16:50:17Z)
- Auto-Agent-Distiller: Towards Efficient Deep Reinforcement Learning Agents via Neural Architecture Search [14.292072505007974]
We propose an Auto-Agent-Distiller (A2D) framework to automatically search for the optimal DRL agents for various tasks.
We demonstrate that vanilla NAS can easily fail in searching for the optimal agents, due to its resulting high variance in DRL training stability.
We then develop a novel distillation mechanism to distill the knowledge from both the teacher agent's actor and critic to stabilize the searching process and improve the searched agents' optimality.
arXiv Detail & Related papers (2020-12-24T04:07:36Z)
- HTMRL: Biologically Plausible Reinforcement Learning with Hierarchical Temporal Memory [1.138723572165938]
We present HTMRL, the first strictly HTM-based Reinforcement Learning algorithm.
We empirically and statistically show that HTMRL scales to many states and actions, and demonstrate that HTM's ability for adapting to changing patterns extends to RL.
HTMRL is the first iteration of a novel RL approach, with the potential of extending to a capable algorithm for Meta-RL.
arXiv Detail & Related papers (2020-09-18T15:05:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.