Maximum Entropy RL (Provably) Solves Some Robust RL Problems
- URL: http://arxiv.org/abs/2103.06257v1
- Date: Wed, 10 Mar 2021 18:45:48 GMT
- Title: Maximum Entropy RL (Provably) Solves Some Robust RL Problems
- Authors: Benjamin Eysenbach and Sergey Levine
- Abstract summary: We prove theoretically that standard maximum entropy RL is robust to some disturbances in the dynamics and the reward function.
Our results suggest that MaxEnt RL by itself is robust to certain disturbances, without requiring any additional modifications.
- Score: 94.80212602202518
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many potential applications of reinforcement learning (RL) require guarantees
that the agent will perform well in the face of disturbances to the dynamics or
reward function. In this paper, we prove theoretically that standard maximum
entropy RL is robust to some disturbances in the dynamics and the reward
function. While this capability of MaxEnt RL has been observed empirically in
prior work, to the best of our knowledge our work provides the first rigorous
proof and theoretical characterization of the MaxEnt RL robust set. While a
number of prior robust RL algorithms have been designed to handle similar
disturbances to the reward function or dynamics, these methods typically
require adding additional moving parts and hyperparameters on top of a base RL
algorithm. In contrast, our theoretical results suggest that MaxEnt RL by
itself is robust to certain disturbances, without requiring any additional
modifications. While this does not imply that MaxEnt RL is the best available
robust RL method, MaxEnt RL does possess a striking simplicity and appealing
formal guarantees.
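To make the objective under discussion concrete, here is a minimal sketch of MaxEnt (soft) value iteration on a toy tabular MDP; the MDP, the temperature alpha, and all variable names are illustrative assumptions and are not taken from the paper. The only change relative to standard value iteration is that the hard max over actions is replaced by a temperature-scaled log-sum-exp, which is equivalent to adding a policy-entropy bonus to the reward.

```python
# Minimal sketch of MaxEnt (soft) value iteration on a toy tabular MDP.
# The MDP, temperature alpha, and variable names are illustrative only.
import numpy as np

n_states, n_actions, gamma, alpha = 3, 2, 0.9, 0.5  # alpha = entropy temperature

rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(0, 1, size=(n_states, n_actions))                 # R[s, a]

V = np.zeros(n_states)
for _ in range(500):
    Q = R + gamma * P @ V                       # Q[s, a] = r(s, a) + gamma * E[V(s')]
    # Soft backup: V(s) = alpha * log sum_a exp(Q(s, a) / alpha)
    # (standard value iteration would use V = Q.max(axis=1) instead)
    V = alpha * np.log(np.exp(Q / alpha).sum(axis=1))

# The MaxEnt-optimal policy is a Boltzmann distribution over Q:
# pi[s, a] proportional to exp(Q(s, a) / alpha), so each row sums to 1.
pi = np.exp((Q - V[:, None]) / alpha)
print(pi.round(3))
```

As alpha approaches zero, the soft backup approaches the standard Bellman backup and the policy becomes greedy; larger alpha yields the more stochastic policies whose robustness properties the paper analyzes.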
Related papers
- To the Max: Reinventing Reward in Reinforcement Learning [1.5498250598583487]
In reinforcement learning (RL), different reward functions can define the same optimal policy but result in drastically different learning performance.
We introduce max-reward RL, where an agent optimizes the maximum rather than the cumulative reward.
In experiments, we study the performance of max-reward RL algorithms in two goal-reaching environments from Gymnasium-Robotics.
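For contrast, a tiny illustrative snippet (toy reward values, not taken from the paper, and ignoring discounting of the max for simplicity) showing how the two objectives score the same trajectory:

```python
# Illustrative contrast between cumulative-return RL and max-reward RL.
rewards = [0.0, 0.2, 0.0, 0.9, 0.1]  # toy per-step rewards along one trajectory
gamma = 0.99

cumulative_return = sum(gamma ** t * r for t, r in enumerate(rewards))
max_return = max(rewards)  # max-reward RL scores the trajectory by its single best reward

print(f"cumulative: {cumulative_return:.3f}, max: {max_return:.3f}")
```

In goal-reaching tasks, scoring a trajectory by its best achieved reward matches the intuition that reaching the goal once is what matters.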
arXiv Detail & Related papers (2024-02-02T12:29:18Z)
- Understanding the Synergies between Quality-Diversity and Deep Reinforcement Learning [4.788163807490196]
Generalized Actor-Critic QD-RL is a unified modular framework for actor-critic deep RL methods in the QD-RL setting.
We introduce two new algorithms, PGA-ME (SAC) and PGA-ME (DroQ) which apply recent advancements in Deep RL to the QD-RL setting.
arXiv Detail & Related papers (2023-03-10T19:02:42Z)
- Train Hard, Fight Easy: Robust Meta Reinforcement Learning [78.16589993684698]
A major challenge of reinforcement learning (RL) in real-world applications is the variation between environments, tasks or clients.
Standard meta reinforcement learning (MRL) methods optimize the average return over tasks, but often perform poorly on tasks of high risk or difficulty.
In this work, we define a robust MRL objective with a controlled robustness level.
The data inefficiency of this robust objective is addressed via the novel Robust Meta RL algorithm (RoML).
arXiv Detail & Related papers (2023-01-26T14:54:39Z)
- Extreme Q-Learning: MaxEnt RL without Entropy [88.97516083146371]
Modern Deep Reinforcement Learning (RL) algorithms require estimates of the maximal Q-value, which are difficult to compute in continuous domains.
We introduce a new update rule for online and offline RL which directly models the maximal value using Extreme Value Theory (EVT).
Using EVT, we derive our Extreme Q-Learning framework and consequently online and, for the first time, offline MaxEnt Q-learning algorithms.
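The EVT connection rests on a standard property of the Gumbel distribution: the expected maximum of values perturbed by independent zero-mean Gumbel noise equals a temperature-scaled log-sum-exp, i.e. the soft maximum used in MaxEnt RL. The snippet below is an illustrative numerical check of that identity with toy Q-values; it is not the paper's algorithm.

```python
# Numerical check of the Gumbel / log-sum-exp identity behind the EVT view:
#   E[ max_a (Q(a) + beta * g_a) ] = beta * log sum_a exp(Q(a) / beta),
# where g_a are i.i.d. zero-mean Gumbel variables. Values here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
Q = np.array([1.0, 0.5, -0.2, 2.0])   # toy Q-values for four actions
beta = 0.7                            # temperature
euler_gamma = 0.5772156649            # mean of the standard Gumbel distribution

# Zero-mean Gumbel noise: subtract the Euler-Mascheroni constant from standard Gumbel.
noise = rng.gumbel(size=(500_000, Q.size)) - euler_gamma
empirical = (Q + beta * noise).max(axis=1).mean()
analytic = beta * np.log(np.exp(Q / beta).sum())

print(f"empirical expected max: {empirical:.3f}")   # approximately equal to analytic
print(f"log-sum-exp soft max:   {analytic:.3f}")
```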
arXiv Detail & Related papers (2023-01-05T23:14:38Z)
- LCRL: Certified Policy Synthesis via Logically-Constrained Reinforcement Learning [78.2286146954051]
LCRL implements model-free Reinforcement Learning (RL) algorithms over unknown Markov Decision Processes (MDPs).
We present case studies to demonstrate the applicability, ease of use, scalability, and performance of LCRL.
arXiv Detail & Related papers (2022-09-21T13:21:00Z)
- Recurrent Model-Free RL is a Strong Baseline for Many POMDPs [73.39666827525782]
Many problems in RL, such as meta RL, robust RL, and generalization in RL, can be cast as POMDPs.
In theory, simply augmenting model-free RL with memory, such as recurrent neural networks, provides a general approach to solving all types of POMDPs.
Prior work has found that such recurrent model-free RL methods tend to perform worse than more specialized algorithms that are designed for specific types of POMDPs.
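As a rough illustration of the "memory as a general POMDP tool" idea, the sketch below shows a recurrent policy that conditions its action distribution on the observation history through a GRU. The sizes, names, and the GRU choice are illustrative assumptions; the paper's benchmarked architectures and training setups differ.

```python
# Minimal sketch of a recurrent (memory-based) policy for POMDPs.
# Sizes, names, and the GRU choice are illustrative assumptions.
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)  # summarizes the history
        self.head = nn.Linear(hidden, n_actions)               # maps memory to action logits

    def forward(self, obs_seq, h=None):
        # obs_seq: (batch, time, obs_dim); h: previous hidden state or None
        out, h = self.rnn(obs_seq, h)
        return self.head(out), h  # logits for every time step, plus the new hidden state

policy = RecurrentPolicy(obs_dim=8, n_actions=4)
obs = torch.randn(1, 1, 8)                # one observation at a time
logits, h = policy(obs)                   # h carries memory across environment steps
action = torch.distributions.Categorical(logits=logits[:, -1]).sample()
print(action.item())
```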
arXiv Detail & Related papers (2021-10-11T07:09:14Z)
- A Simple Reward-free Approach to Constrained Reinforcement Learning [33.813302183231556]
This paper bridges reward-free RL and constrained RL. Particularly, we propose a simple meta-algorithm such that given any reward-free RL oracle, the approachability and constrained RL problems can be directly solved with negligible overheads in sample complexity.
arXiv Detail & Related papers (2021-07-12T06:27:30Z)
- Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning [56.17667147101263]
In real-world tasks, reinforcement learning agents encounter situations that are not present during training time.
To ensure reliable performance, the RL agents need to exhibit robustness against worst-case situations.
We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm to provably solve this problem.
arXiv Detail & Related papers (2021-03-18T16:50:17Z)
- Auto-Agent-Distiller: Towards Efficient Deep Reinforcement Learning Agents via Neural Architecture Search [14.292072505007974]
We propose an Auto-Agent-Distiller (A2D) framework to automatically search for the optimal DRL agents for various tasks.
We demonstrate that vanilla NAS can easily fail in searching for the optimal agents, due to its resulting high variance in DRL training stability.
We then develop a novel distillation mechanism to distill the knowledge from both the teacher agent's actor and critic to stabilize the searching process and improve the searched agents' optimality.
arXiv Detail & Related papers (2020-12-24T04:07:36Z)
- HTMRL: Biologically Plausible Reinforcement Learning with Hierarchical Temporal Memory [1.138723572165938]
We present HTMRL, the first strictly HTM-based Reinforcement Learning algorithm.
We empirically and statistically show that HTMRL scales to many states and actions, and demonstrate that HTM's ability for adapting to changing patterns extends to RL.
HTMRL is the first iteration of a novel RL approach, with the potential of extending to a capable algorithm for Meta-RL.
arXiv Detail & Related papers (2020-09-18T15:05:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.