Recurrent Model-Free RL is a Strong Baseline for Many POMDPs
- URL: http://arxiv.org/abs/2110.05038v1
- Date: Mon, 11 Oct 2021 07:09:14 GMT
- Title: Recurrent Model-Free RL is a Strong Baseline for Many POMDPs
- Authors: Tianwei Ni, Benjamin Eysenbach, Ruslan Salakhutdinov
- Abstract summary: Many problems in RL, such as meta RL, robust RL, and generalization in RL, can be cast as POMDPs.
In theory, simply augmenting model-free RL with memory, such as recurrent neural networks, provides a general approach to solving all types of POMDPs.
Prior work has found that such recurrent model-free RL methods tend to perform worse than more specialized algorithms that are designed for specific types of POMDPs.
- Score: 73.39666827525782
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many problems in RL, such as meta RL, robust RL, and generalization in RL,
can be cast as POMDPs. In theory, simply augmenting model-free RL with memory,
such as recurrent neural networks, provides a general approach to solving all
types of POMDPs. However, prior work has found that such recurrent model-free
RL methods tend to perform worse than more specialized algorithms that are
designed for specific types of POMDPs. This paper revisits this claim. We find
that careful architecture and hyperparameter decisions yield a recurrent
model-free implementation that performs on par with (and occasionally
substantially better than) more sophisticated recent techniques in their
respective domains. We also release a simple and efficient implementation of
recurrent model-free RL for future work to use as a baseline for POMDPs. Code
is available at https://github.com/twni2016/pomdp-baselines
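
As a rough illustration of what "augmenting model-free RL with memory" means, the sketch below shows a tiny recurrent policy whose hidden state summarizes the observation history. It is not the authors' implementation (see the repository above for that). The class `RecurrentPolicy`, its parameter names, the Elman-style cell, and the choice to feed the previous action and reward as inputs are all assumptions made for illustration, and the weights are random rather than trained.

```python
import math
import random


class RecurrentPolicy:
    """Tiny Elman-style recurrent policy (hypothetical, untrained, illustration only)."""

    def __init__(self, obs_dim, act_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            # Small random weights; a real agent would learn these with an RL algorithm.
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        # Assumed input layout: [observation, previous action, previous reward].
        self.w_in = mat(hidden_dim, obs_dim + act_dim + 1)
        self.w_rec = mat(hidden_dim, hidden_dim)
        self.w_out = mat(act_dim, hidden_dim)

    def initial_state(self):
        # The memory starts empty at the beginning of each episode.
        return [0.0] * self.hidden_dim

    def step(self, hidden, obs, prev_action, prev_reward):
        x = list(obs) + list(prev_action) + [prev_reward]
        # The new hidden state mixes the current input with the previous hidden state.
        new_hidden = [
            math.tanh(
                sum(w * v for w, v in zip(self.w_in[i], x))
                + sum(w * h for w, h in zip(self.w_rec[i], hidden))
            )
            for i in range(self.hidden_dim)
        ]
        # The action depends on the history only through the hidden state.
        action = [
            math.tanh(sum(w * h for w, h in zip(row, new_hidden)))
            for row in self.w_out
        ]
        return action, new_hidden


policy = RecurrentPolicy(obs_dim=2, act_dim=1, hidden_dim=4)
hidden = policy.initial_state()
prev_action, prev_reward = [0.0], 0.0  # no environment here, so the reward stays 0
observations = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
actions = []
for obs in observations:
    action, hidden = policy.step(hidden, obs, prev_action, prev_reward)
    actions.append(action)
    prev_action = action
```

The second and third observations are identical, yet the policy can output different actions for them because the hidden state carries information from earlier steps. A memoryless policy cannot do this, and this is the property that matters under partial observability.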
Related papers
- REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models.
In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL.
We find that REBEL provides a unified approach to language modeling and image generation, with performance stronger than or similar to that of PPO and DPO.
arXiv Detail & Related papers (2024-04-25T17:20:45Z)
- RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$ [12.111848705677142]
We propose RL$^3$, a hybrid approach that incorporates action-values, learned per task through traditional RL, in the inputs to meta-RL.
We show that RL$^3$ earns greater cumulative reward in the long term, compared to RL$^2$, while maintaining data-efficiency in the short term, and generalizes better to out-of-distribution tasks.
arXiv Detail & Related papers (2023-06-28T04:16:16Z)
- Semi-Markov Offline Reinforcement Learning for Healthcare [57.15307499843254]
We introduce three offline RL algorithms, namely, SDQN, SDDQN, and SBCQ.
We experimentally demonstrate that only these algorithms learn the optimal policy in variable-time environments.
We apply our new algorithms to a real-world offline dataset pertaining to warfarin dosing for stroke prevention.
arXiv Detail & Related papers (2022-03-17T14:51:21Z)
- Reinforcement Learning as One Big Sequence Modeling Problem [84.84564880157149]
Reinforcement learning (RL) is typically concerned with estimating single-step policies or single-step models.
We view RL as a sequence modeling problem, with the goal being to predict a sequence of actions that leads to a sequence of high rewards.
arXiv Detail & Related papers (2021-06-03T17:58:51Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify the existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics.
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
- MOReL: Model-Based Offline Reinforcement Learning [49.30091375141527]
In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment.
We present MOReL, an algorithmic framework for model-based offline RL.
We show that MOReL matches or exceeds state-of-the-art results in widely studied offline RL benchmarks.
arXiv Detail & Related papers (2020-05-12T17:52:43Z)